Skip to content

Quorum calculation issues #191

@davissp14

Description

@davissp14

Context
As of right now, quorum is calculated by evaluating each registered replica within the cluster and asking it who it thinks the current primary is.

We achieve two things by doing this:

  1. Confirm that the target members are reachable.
  2. We are building quorum by confirming that the majority of the cluster agree's on who the primary is.

The reason we evaluate every member within the cluster is because the primary member is unable to know how the PRIMARY_REGION value is configured per Machine. The PRIMARY_REGION environment variable represents the region holding the primary and dictates which members are eligible for promotion. This value should only be changed when a user needs to issue a regional failover, which requires the end-user to perform an update against every Machine in their cluster with the new PRIMARY_REGION value. There are a number of ways this process could fail, which presents opportunities for inconsistency.

Example case:

This issue can be expressed in a 6 node cluster with members split evenly across two different regions.

Nodes A,B,C residing in ord with PRIMARY_REGION set to ord
Nodes D,E,F residing in iad with PRIMARY_REGION set to iad

If quorum only considered "in-region" nodes, then it would be possible for a split-brain to occur across regions with the presence of a misconfigured PRIMARY_REGION environment variable.

The problem with considering every registered member
The problem with this approach is that we start to see an increase in connection timeouts when replicas are placed in regions considerably far away from the primary region. Connection timeouts are currently set to 5 seconds for each registered member. This issue is amplified if the replica is under load. When quorum cannot be reached, then the cluster will go read-only until quorum is restored. This can lead to a bad experience for many users who want to push many of their read-replicas into distant regions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingenhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions