This repository was archived by the owner on Nov 9, 2017. It is now read-only.

better kafka protocol: evenly distribute, oversubscribe partitions, and minimize rebalance #80

@supershabam

The vulcan cachers need to keep a configurable window of metrics in memory for the partitions they are responsible for (e.g. 4 hours). Backfilling this window takes time, so when a vulcan cacher comes online (or goes offline) and the group membership changes, partitions are reshuffled and reassigned according to the kafka consumer group protocol.

We have tried a simple HashRing protocol so that reassignments are minimal when cachers come and go. However, the HashRing does little to ensure that partitions are evenly balanced across cachers.
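For reference, here is a minimal sketch of what a hash-ring style assignment looks like (the hashing choice and names like `cacher-a` are illustrative, not the actual Vulcan code): each cacher is hashed onto a ring, and a partition is assigned to the first cacher clockwise from the partition's hash point. Churn is small when membership changes, but nothing forces the per-cacher partition counts to be even.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	points  []uint32          // sorted hash points of cachers on the ring
	cachers map[uint32]string // hash point -> cacher ID
}

func hash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(cachers []string) *ring {
	r := &ring{cachers: map[uint32]string{}}
	for _, c := range cachers {
		p := hash(c)
		r.points = append(r.points, p)
		r.cachers[p] = c
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// assign returns the cacher responsible for a topic-partition: the first
// cacher at or after the partition's hash point, wrapping around the ring.
func (r *ring) assign(topicPartition string) string {
	p := hash(topicPartition)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= p })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.cachers[r.points[i]]
}

func main() {
	r := newRing([]string{"cacher-a", "cacher-b", "cacher-c"})
	for part := 0; part < 8; part++ {
		tp := fmt.Sprintf("metrics-%d", part)
		fmt.Println(tp, "->", r.assign(tp))
	}
}
```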

RoundRobin is better than HashRing right now: partitions are evenly distributed amongst online cachers, so each cacher operates with similar performance. However, when a cacher comes online (or goes offline) the topic partitions are reassigned with no regard for minimizing partition ownership changes.
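The sketch below (again illustrative, not the real code path) deals partitions across the sorted list of live cachers and shows how adding a single cacher can move a large fraction of partitions even though both assignments are perfectly balanced:

```go
package main

import (
	"fmt"
	"sort"
)

// roundRobin maps partition index -> cacher ID by dealing partitions
// across the sorted list of live cachers.
func roundRobin(cachers []string, numPartitions int) map[int]string {
	sorted := append([]string(nil), cachers...)
	sort.Strings(sorted)
	out := make(map[int]string, numPartitions)
	for p := 0; p < numPartitions; p++ {
		out[p] = sorted[p%len(sorted)]
	}
	return out
}

func main() {
	before := roundRobin([]string{"cacher-a", "cacher-b"}, 8)
	after := roundRobin([]string{"cacher-a", "cacher-b", "cacher-c"}, 8)
	moved := 0
	for p := 0; p < 8; p++ {
		if before[p] != after[p] {
			moved++
		}
	}
	fmt.Printf("partitions reassigned after adding one cacher: %d of 8\n", moved)
}
```

With two cachers growing to three, half of the eight partitions change owners, and each of those partitions has to backfill its window from scratch.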

With both RoundRobin and HashRing, we have no redundancy. If a cacher goes away, the partitions it owned are reassigned to the remaining live cachers, but it takes a while for those cachers to backfill the window of data they need before they can actually serve queries for those partitions.

Ideally, we would have a kafka protocol that ensures each partition is handled by more than one cacher and that, when a cacher comes online or goes offline, minimizes the cacher-partition assignment changes.
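One way this could look, as a hedged sketch rather than a spec (the `replicas` knob, the function names, and the cacher IDs are placeholders, and a final load-leveling pass that moves a bounded number of partitions is deliberately omitted): every partition is oversubscribed to `replicas` cachers, previous owners that are still alive are kept to minimize churn, and only the vacated slots are refilled from the least-loaded live cachers.

```go
package main

import (
	"fmt"
	"sort"
)

// assignSticky returns partition -> owning cachers.
// prev is the previous assignment (may be nil); live is the current cacher set.
func assignSticky(prev map[int][]string, live []string, numPartitions, replicas int) map[int][]string {
	if replicas > len(live) {
		replicas = len(live) // cannot oversubscribe beyond the number of live cachers
	}
	alive := map[string]bool{}
	load := map[string]int{}
	for _, c := range live {
		alive[c] = true
		load[c] = 0
	}
	next := make(map[int][]string, numPartitions)
	// Sticky pass: keep previous owners that are still alive to minimize churn.
	for p := 0; p < numPartitions; p++ {
		for _, c := range prev[p] {
			if alive[c] && len(next[p]) < replicas {
				next[p] = append(next[p], c)
				load[c]++
			}
		}
	}
	// Fill pass: top up each partition to `replicas` owners from the least-loaded cachers.
	for p := 0; p < numPartitions; p++ {
		for len(next[p]) < replicas {
			candidates := append([]string(nil), live...)
			sort.Slice(candidates, func(i, j int) bool { return load[candidates[i]] < load[candidates[j]] })
			for _, c := range candidates {
				if !contains(next[p], c) {
					next[p] = append(next[p], c)
					load[c]++
					break
				}
			}
		}
	}
	return next
}

func contains(s []string, v string) bool {
	for _, x := range s {
		if x == v {
			return true
		}
	}
	return false
}

func main() {
	// Initial assignment: 6 partitions, each owned by 2 of 3 cachers.
	prev := assignSticky(nil, []string{"cacher-a", "cacher-b", "cacher-c"}, 6, 2)
	// cacher-c goes offline: only its slots are refilled; other ownership is untouched,
	// and the surviving owner of each partition can keep serving queries immediately.
	next := assignSticky(prev, []string{"cacher-a", "cacher-b"}, 6, 2)
	fmt.Println("before:", prev)
	fmt.Println("after: ", next)
}
```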
