-
Notifications
You must be signed in to change notification settings - Fork 4k
kvserver: track priority inversion in replicate queue metrics #152697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cdbe813 to
6bb7860
Compare
|
@sumeerbhola re: the original comments you had on #152624 (review). TFTR!
Agree that the high cardinality is unfortunate. But I'm not sure whether there is a better solution. We need to backport this to v24.3. It doesn’t look like it was included yet (#143536). I’m also not certain if this feature is supported by tsdump. We can consider improving this only on master in a follow up pr. For now, I'm leaning towards taking the loss on high cardinality unless we are also interested in something around tracking the priority distribution in the priority queue. Wdyt?
We only have metrics for measuring pending queue length for now cockroach/pkg/kv/kvserver/queue.go Line 333 in 3642319
|
|
I’m adding a new test, TestPriorityInversionRequeue, to try to force the suspected race condition. It fails under stress on CI. I checked the logs, and it was because if n1 removes the learner too quickly, n4 may miss it when enqueuing to the replicate queue. I’m not too concerned, since the goal is mainly to reproduce and confirm that the race could occur. Things pass under stress for me locally. We might need another testing knob. Do you think this test is worth adding? If so, I’ll continue exploring ways to delay n1’s learner removal until n4 has a chance to enqueue. |
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically looks good, but let's sync on whether we really meant to add the metrics tracking inversions on a per-type basis.
Thank you for adding the test - nice to see the mechanism confirmed 🔥
@tbg reviewed 1 of 1 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, 2 of 3 files at r4, 1 of 1 files at r7, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/replicate_queue_test.go line 2564 at r7 (raw file):
defer leaktest.AfterTest(t)() defer log.Scope(t).Close(t)
I'd suggest you skip this test under duress preemptively.
pkg/kv/kvserver/replicate_queue.go line 322 at r7 (raw file):
Unit: metric.Unit_COUNT, } metaReplicateQueuePriorityInversionForAddVoterCount = metric.Metadata{
I thought we discussed that given that we have (rate-limited) logging and total counters, we could do away with the "per-inversion-type" counters. Am I misremembering this?
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)
pkg/kv/kvserver/replicate_queue.go line 322 at r7 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I thought we discussed that given that we have (rate-limited) logging and total counters, we could do away with the "per-inversion-type" counters. Am I misremembering this?
😞 Don’t recall reaching a conclusion in that discussion. Might have missed some discussion points. I’ve removed it for now.
pkg/kv/kvserver/replicate_queue_test.go line 2564 at r7 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I'd suggest you skip this test under duress preemptively.
Done.
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not approving yet, but approved once unstacked. 👍
@tbg reviewed 3 of 3 files at r8, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
|
Rebased on top of master to pick up #152596. |
152829d to
3f06ed0
Compare
|
The last two pushes fixed a rebase mess-up and a linter warning. |
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg reviewed 7 of 7 files at r9, 2 of 5 files at r10, 3 of 3 files at r12, 3 of 3 files at r13, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
Previously, replicas could be enqueued at a high priority but end up processing a lower-priority actions, causing priority inversion and unfairness to other replicas behind them that needs a repair action. This commit adds metrics to track such cases. In addition, this commit also adds metrics to track when replicas are requeued in the replicate queue due to a priority inversion from a repair action to a rebalance action.
Previously, we added priority inversion requeuing mechanism. This commit adds a unit test that forces the race condition we suspected to be happening in escalations involving priority inversion and asserts that priority inversion occurs and that the replica is correctly requeued. Test set up: 1. range’s leaseholder replica is rebalanced from one store to another. 2. new leaseholder enqueues the replica for repair with high priority (e.g. to finalize the atomic replication change or remove a learner replica) 3. before processing, the old leaseholder completes the change (exits the joint config or removes the learner). 4. when the new leaseholder processes the replica, it computes a ConsiderRebalance action, resulting in a priority inversion and potentially blocking other high-priority work.
This commit removes per-action priority inversion metrics due to their high cardinality. We already have logging in place, which should provide sufficient observability. For now, we care about is priority inversion that leads to consider rebalance and requeuing the most.
|
Rebased on master to resolve merge conflicts caused by related prs. |
|
TFTR! bors r=tbg |
Part of: #151847
Resolves: #152022
Release note: none
kvserver: track priority inversion in replicate queue metrics
Previously, replicas could be enqueued at a high priority but end up processing
a lower-priority actions, causing priority inversion and unfairness to other
replicas behind them that needs a repair action. This commit adds metrics to
track such cases. In addition, this commit also adds metrics to track when
replicas are requeued in the replicate queue due to a priority inversion from a
repair action to a rebalance action.
kvserver: add TestPriorityInversionRequeue
Previously, we added priority inversion requeuing mechanism.
This commit adds a unit test that forces the race condition we suspected to be
happening in escalations involving priority inversion and asserts that priority
inversion occurs and that the replica is correctly requeued. Test set up:
finalize the atomic replication change or remove a learner replica)
config or removes the learner).
ConsiderRebalance action, resulting in a priority inversion and potentially
blocking other high-priority work.
kvserver: delete per action priority inversion metrics
This commit removes per-action priority inversion metrics due to their high
cardinality. We already have logging in place, which should provide sufficient
observability. For now, we care about is priority inversion that leads to
consider rebalance and requeuing the most.