This repository was archived by the owner on Nov 9, 2017. It is now read-only.

better kafka protocol: evenly distribute, oversubscribe partitions, and minimize rebalance #80

@supershabam

The vulcan cachers need to keep a configurable window of metrics in memory for the partitions they are responsible for (e.g. 4 hours). Backfilling this window takes time, so when a vulcan cacher comes online (or goes offline) and the group membership changes, partitions are reshuffled and reassigned according to the kafka consumer group protocol.

We have tried a simple HashRing protocol so that reassignments are minimal when cachers come and go. However, the HashRing does little to ensure that partitions are evenly balanced across cachers.
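For reference, here is a minimal sketch of what a hash-ring style assignment looks like (the hashing choice and names like `cacher-a` are illustrative, not the actual Vulcan code): each cacher is hashed onto a ring, and a partition is assigned to the first cacher clockwise from the partition's hash point. Churn is small when membership changes, but nothing forces the per-cacher partition counts to be even.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	points  []uint32          // sorted hash points of cachers on the ring
	cachers map[uint32]string // hash point -> cacher ID
}

func hash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(cachers []string) *ring {
	r := &ring{cachers: map[uint32]string{}}
	for _, c := range cachers {
		p := hash(c)
		r.points = append(r.points, p)
		r.cachers[p] = c
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// assign returns the cacher responsible for a topic-partition: the first
// cacher at or after the partition's hash point, wrapping around the ring.
func (r *ring) assign(topicPartition string) string {
	p := hash(topicPartition)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= p })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.cachers[r.points[i]]
}

func main() {
	r := newRing([]string{"cacher-a", "cacher-b", "cacher-c"})
	for part := 0; part < 8; part++ {
		tp := fmt.Sprintf("metrics-%d", part)
		fmt.Println(tp, "->", r.assign(tp))
	}
}
```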

RoundRobin is better than HashRing right now: partitions are evenly distributed amongst online cachers, so each cacher operates with similar performance. However, when a cacher comes online (or goes offline) the topic partitions are reassigned with no regard for minimizing partition ownership changes.
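The sketch below (again illustrative, not the real code path) deals partitions across the sorted list of live cachers and shows how adding a single cacher can move a large fraction of partitions even though both assignments are perfectly balanced:

```go
package main

import (
	"fmt"
	"sort"
)

// roundRobin maps partition index -> cacher ID by dealing partitions
// across the sorted list of live cachers.
func roundRobin(cachers []string, numPartitions int) map[int]string {
	sorted := append([]string(nil), cachers...)
	sort.Strings(sorted)
	out := make(map[int]string, numPartitions)
	for p := 0; p < numPartitions; p++ {
		out[p] = sorted[p%len(sorted)]
	}
	return out
}

func main() {
	before := roundRobin([]string{"cacher-a", "cacher-b"}, 8)
	after := roundRobin([]string{"cacher-a", "cacher-b", "cacher-c"}, 8)
	moved := 0
	for p := 0; p < 8; p++ {
		if before[p] != after[p] {
			moved++
		}
	}
	fmt.Printf("partitions reassigned after adding one cacher: %d of 8\n", moved)
}
```

With two cachers growing to three, half of the eight partitions change owners, and each of those partitions has to backfill its window from scratch.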

With both RoundRobin and HashRing, we have no redundancy. If a cacher goes away, the partitions it owned are reassigned to the remaining live cachers, but it takes a while for those cachers to backfill the window of data they need before they can actually serve queries for those partitions.

Ideally, we would have a kafka protocol that ensures each partition is handled by more than one cacher and that, when a cacher comes online or goes offline, minimizes the cacher-partition assignment changes.
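One way this could look, as a hedged sketch rather than a spec (the `replicas` knob, the function names, and the cacher IDs are placeholders, and a final load-leveling pass that moves a bounded number of partitions is deliberately omitted): every partition is oversubscribed to `replicas` cachers, previous owners that are still alive are kept to minimize churn, and only the vacated slots are refilled from the least-loaded live cachers.

```go
package main

import (
	"fmt"
	"sort"
)

// assignSticky returns partition -> owning cachers.
// prev is the previous assignment (may be nil); live is the current cacher set.
func assignSticky(prev map[int][]string, live []string, numPartitions, replicas int) map[int][]string {
	if replicas > len(live) {
		replicas = len(live) // cannot oversubscribe beyond the number of live cachers
	}
	alive := map[string]bool{}
	load := map[string]int{}
	for _, c := range live {
		alive[c] = true
		load[c] = 0
	}
	next := make(map[int][]string, numPartitions)
	// Sticky pass: keep previous owners that are still alive to minimize churn.
	for p := 0; p < numPartitions; p++ {
		for _, c := range prev[p] {
			if alive[c] && len(next[p]) < replicas {
				next[p] = append(next[p], c)
				load[c]++
			}
		}
	}
	// Fill pass: top up each partition to `replicas` owners from the least-loaded cachers.
	for p := 0; p < numPartitions; p++ {
		for len(next[p]) < replicas {
			candidates := append([]string(nil), live...)
			sort.Slice(candidates, func(i, j int) bool { return load[candidates[i]] < load[candidates[j]] })
			for _, c := range candidates {
				if !contains(next[p], c) {
					next[p] = append(next[p], c)
					load[c]++
					break
				}
			}
		}
	}
	return next
}

func contains(s []string, v string) bool {
	for _, x := range s {
		if x == v {
			return true
		}
	}
	return false
}

func main() {
	// Initial assignment: 6 partitions, each owned by 2 of 3 cachers.
	prev := assignSticky(nil, []string{"cacher-a", "cacher-b", "cacher-c"}, 6, 2)
	// cacher-c goes offline: only its slots are refilled; other ownership is untouched,
	// and the surviving owner of each partition can keep serving queries immediately.
	next := assignSticky(prev, []string{"cacher-a", "cacher-b"}, 6, 2)
	fmt.Println("before:", prev)
	fmt.Println("after: ", next)
}
```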
