ring: Fix pathological case when an entire zone leaves #672
Comparison against a commit from early January which predates my changes in #632:
I haven't looked at the ring code in a while, and at first I thought I'd found a bug in the PR. After staring at it for a while longer, I see I was wrong: the change seems legit and makes sense to me.
This change improves performance in the case where an entire zone is not ACTIVE and the replication set is meant to be extended. Previously, when an entire zone was unavailable, the ring kept searching for instances by looking at every single token trying to find an instance in the required zone that was ACTIVE. This meant thousands of iterations to find a host that would never work.

This change keeps track of the number of hosts that we have examined in each zone. It returns early once we have either found the hosts in each zone we need _OR_ we have examined all hosts in the zone and so know that we won't find one.

Signed-off-by: Nick Pillitteri <[email protected]>
67a03bf to 16780f3
It makes sense to me, not super familiar with ring code though
Specifically, this pulls in the following dskit PRs:

* grafana/dskit#672
* grafana/dskit#669
* grafana/dskit#668

Signed-off-by: Nick Pillitteri <[email protected]>
Update to the latest dskit commit to pull in grafana/dskit#672 which improves performance of ring operations in clients (queriers, distributors) when zone-awareness is enabled and an entire zone is not "ACTIVE". Signed-off-by: Nick Pillitteri <[email protected]>
* chore: update to latest dskit for ring performance fix

  Update to the latest dskit commit to pull in grafana/dskit#672 which improves performance of ring operations in clients (queriers, distributors) when zone-awareness is enabled and an entire zone is not "ACTIVE".

  Signed-off-by: Nick Pillitteri <[email protected]>

* Lint

  Signed-off-by: Nick Pillitteri <[email protected]>

---------

Signed-off-by: Nick Pillitteri <[email protected]>
What this PR does:
This change improves performance in the case where an entire zone is not ACTIVE and the replication set is meant to be extended. Previously, when an entire zone was unavailable, the ring kept searching for instances by looking at every single token trying to find an instance in the required zone that was ACTIVE. This meant thousands of iterations to find a host that would never work.
This change keeps track of the number of hosts that we have examined in each zone. It returns early once we have either found the hosts in each zone we need OR we have examined all hosts in the zone and so know that we won't find one.
Which issue(s) this PR fixes:
N/A
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]