-
Notifications
You must be signed in to change notification settings - Fork 4k
kvserver: requeue on priority inversion for replicate queue #152596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
60f2038 to
d75f145
Compare
|
@tbg @sumeerbhola Tagged you as reviewer since we discussed this and wanted to keep you in the loop. No pressure to actually review it. |
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumeerbhola reviewed 1 of 1 files at r1, 1 of 3 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @tbg)
pkg/kv/kvserver/replicate_queue.go line 107 at r1 (raw file):
// PriorityInversionRequeue is a setting that controls whether to requeue // replicas when their priority at enqueue time and processing time is inverted // too much (e.g. dropping from a repair cation to AllocatorConsiderRebalance).
nit: action
pkg/kv/kvserver/replicate_queue.go line 902 at r2 (raw file):
log.KvDistribution.Infof(ctx, "priority inversion during process: shouldRequeue = %t action=%s, priority=%v, enqueuePriority=%v", shouldRequeue, change.Action, change.Action.Priority(), priorityAtEnqueue)
Can this be spammy. Should it use log.EveryN?
pkg/kv/kvserver/replicate_queue.go line 904 at r2 (raw file):
shouldRequeue, change.Action, change.Action.Priority(), priorityAtEnqueue) } if PriorityInversionRequeue.Get(&rq.store.cfg.Settings.SV) {
This could be nested inside if inversion if the question about the invariant below is true.
And could structure this as:
if inversion {
...
if shouldRequeue && PriorityInversionRequeue.Get(&rq.store.cfg.Settings.SV) {
...
}
}
to avoid reading the cluster setting in the common case.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3262 at r2 (raw file):
// returns whether there was a priority inversion and whether the caller should // skip the processing of the range since the inversion is considered unfair. // Currently, we only consider the inversion as unfair if it has went from a
nit: s/went/gone/
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3264 at r2 (raw file):
// Currently, we only consider the inversion as unfair if it has went from a // repair action to lowest priority (AllocatorConsiderRebalance). We let // AllocatorRangeUnavailable, AllocatorNoop pass through since they are noop.
Is there an invariant: shouldRequeue => isInversion?
If yes, please document it.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3272 at r2 (raw file):
// a very high priority (1e5). In those cases, we do not want to requeue // these actions or count it as an inversion. if priorityAtEnqueue == -1 || !withinPriorityRange(priorityAtEnqueue) {
what is the -1 case?
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @sumeerbhola, and @tbg)
pkg/kv/kvserver/replicate_queue.go line 107 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: action
Fixed.
pkg/kv/kvserver/replicate_queue.go line 902 at r2 (raw file):
Previously, sumeerbhola wrote…
Can this be spammy. Should it use
log.EveryN?
Done for every 3 seconds.
pkg/kv/kvserver/replicate_queue.go line 904 at r2 (raw file):
Previously, sumeerbhola wrote…
This could be nested inside
if inversionif the question about the invariant below is true.
And could structure this as:if inversion { ... if shouldRequeue && PriorityInversionRequeue.Get(&rq.store.cfg.Settings.SV) { ... } }to avoid reading the cluster setting in the common case.
Good call, done.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3262 at r2 (raw file):
Previously, sumeerbhola wrote…
nit: s/went/gone/
Done.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3264 at r2 (raw file):
Previously, sumeerbhola wrote…
Is there an invariant: shouldRequeue => isInversion?
If yes, please document it.
Added a note.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3272 at r2 (raw file):
Previously, sumeerbhola wrote…
what is the -1 case?
Added a comment to explain this:
// priorityAtEnqueue of -1 is a special case reserved for processing logic to
// run even if there’s a priority inversion. If the priority is not -1, the
// range may be re-queued to be processed with the correct priority. It is
// used for things that call into baseQueue.process without going through the
// replicate queue. For example, s.ReplicateQueueDryRun or
// r.scatterRangeAndRandomizeLeases.
See more details in #152512 which plumbs these -1. This PR has been merged, but let me know if you have any comments and I can open another PR to address them.
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumeerbhola reviewed 1 of 1 files at r5, 1 of 1 files at r6.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @tbg, and @wenyihu6)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3272 at r2 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Added a comment to explain this:
// priorityAtEnqueue of -1 is a special case reserved for processing logic to // run even if there’s a priority inversion. If the priority is not -1, the // range may be re-queued to be processed with the correct priority. It is // used for things that call into baseQueue.process without going through the // replicate queue. For example, s.ReplicateQueueDryRun or // r.scatterRangeAndRandomizeLeases.See more details in #152512 which plumbs these -1. This PR has been merged, but let me know if you have any comments and I can open another PR to address them.
Ack
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3266 at r6 (raw file):
// AllocatorRangeUnavailable, AllocatorNoop pass through since they are noop. // // NB: If shouldRequeue is true, isInversion must be true.
nit: I think it is easier for the reader to say something less verbose like
// INVARIANT: shouldRequeue => isInversion
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing some comments. Generally looks good! The priority rounding is a bit of a rangle, but I don't see how to easily avoid that (except if we allow more inversions).
| // WithinPriorityRange checks if a priority is within the range of possible | ||
| // priorities for the allocator actions. | ||
| func withinPriorityRange(priority float64) bool { | ||
| return AllocatorNoop.Priority() <= priority && priority <= AllocatorFinalizeAtomicReplicationChange.Priority() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be cleaner to add a sentinel value to the enum and to check on it as the upper bound here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, consider inlining this function. It doesn't make much sense when looked at without context, and it's only used in one place. It also isn't being tested directly, nor would that make enough sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added AllocatorMaxPriority for the upper bound and added a NB on AllocatorNoop should have the lowest priority. Also put this function under CheckPriorityInversion.
| priorityAtEnqueue float64, actionAtProcessing AllocatorAction, | ||
| ) (isInversion bool, shouldRequeue bool) { | ||
| // priorityAtEnqueue of -1 is a special case reserved for processing logic to | ||
| // run even if there’s a priority inversion. If the priority is not -1, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This middle sentence ("If the priority is not -1, ...") is a bit of a non sequitur. The surrounding paragraphs explain the special case of -1. I'd just say something along the lines of
// NB: priorityAtEnqueue is -1 for callers such as scatter, dry runs, and manual queue runs.
// Priority inversion does not apply to these calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| // CheckPriorityInversion checks if the priority at enqueue time is higher than | ||
| // the priority corresponding to the action computed at processing time. It | ||
| // returns whether there was a priority inversion and whether the caller should | ||
| // skip the processing of the range since the inversion is considered unfair. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it "It returns whether there was a priority inversion (and the range should not be processed at this time, since doing so could starve higher-priority items), and whether the caller should re-add the range to the queue (presumably under its new priority)."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified the comment.
| // AllocatorRemoveLearner, and AllocatorReplaceDeadVoter will be rounded to | ||
| // the same priority. They are so close to each other, so we don't really | ||
| // count it as an inversion among them. | ||
| normPriorityAtEnqueue := roundToNearestPriorityCategory(priorityAtEnqueue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the priority enum with a hard-to-miss message that priorities should be "separated" by multiples of 100 (explaining why) and also update the places where we adjust the priority to explain why each adjustment is ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a NB comment and a regression unit test. I noticed that we are making some assumptions with this statement, but I think the assumptions should be true. (Update: After some thinking, I don't know if I can say the second assumption should be true. We can revisit them.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: After some thinking, I don't know if I can say the assumptions (especially the second one) should be true. I should revisit them if strong guarantees are required. It should be fine for now since priority inversion re-queuing only care about repair action v.s. consider rebalance. I added a comment to warn new uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're saying that with sufficiently high replication factors, some of the priorities may be adjusted by more than 50, right? I can live with this, add a comment about it and let's move on.
| "kv.priority_inversion_requeue_replicate_queue.enabled", | ||
| "whether the requeue replicas should requeue when enqueued for "+ | ||
| "repair action but ended up consider rebalancing during processing", | ||
| false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we land a separate PR on master that flips this to true (so that we get some exposure), and backport only this PR (keep default value false) if we backport?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, added a TODO to remind myself after this PR lands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put up #152749 which stacks on top of this PR.
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg dismissed @sumeerbhola from 3 discussions.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @sumeerbhola, and @wenyihu6)
pkg/kv/kvserver/replicate_queue.go line 910 at r6 (raw file):
} if shouldRequeue && PriorityInversionRequeue.Get(&rq.store.cfg.Settings.SV) {
It would be good to backport something that is a no-op unless opted into. But this code is not a no-op, the user can only control whether to requeue or skip. Should we make the cluster setting about enabling the inversion detection mechanism as a whole, and not have a way to toggle just the requeue? That way, we can backport the thing as off, our customer can turn it on, and we also turn it on on master so that it will ship with the next release for everyone else. If the requeuing becomes a problem, we'll find out and users can toggle off the entire inversion mechanism as a temporary work-around.
pkg/kv/kvserver/replicate_queue.go line 911 at r6 (raw file):
if shouldRequeue && PriorityInversionRequeue.Get(&rq.store.cfg.Settings.SV) { // Return true here to requeue the range. We can't return an error here
We can't return an error but we return an error? Something is off here. I agree (reading the code) that requeuing has no effect if an error is returned.
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani, @sumeerbhola, and @tbg)
pkg/kv/kvserver/replicate_queue.go line 910 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It would be good to backport something that is a no-op unless opted into. But this code is not a no-op, the user can only control whether to requeue or skip. Should we make the cluster setting about enabling the inversion detection mechanism as a whole, and not have a way to toggle just the requeue? That way, we can backport the thing as off, our customer can turn it on, and we also turn it on on master so that it will ship with the next release for everyone else. If the requeuing becomes a problem, we'll find out and users can toggle off the entire inversion mechanism as a temporary work-around.
Added a commit to make PriorityInversionRequeue guard both inversion check and requeuing.
pkg/kv/kvserver/replicate_queue.go line 911 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
We can't return an error but we return an error? Something is off here. I agree (reading the code) that requeuing has no effect if an error is returned.
Stale comment, updated for clarity. Originally, I considered returning requeue = true, err = nil to allow requeuing on error, but instead chose this approach to ensure correct logging and error metric tracking in processReplica and finishProcessingReplica.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 3266 at r6 (raw file):
Previously, sumeerbhola wrote…
nit: I think it is easier for the reader to say something less verbose like
// INVARIANT: shouldRequeue => isInversion
Done.
| "kv.priority_inversion_requeue_replicate_queue.enabled", | ||
| "whether the requeue replicas should requeue when enqueued for "+ | ||
| "repair action but ended up consider rebalancing during processing", | ||
| false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, added a TODO to remind myself after this PR lands.
| // WithinPriorityRange checks if a priority is within the range of possible | ||
| // priorities for the allocator actions. | ||
| func withinPriorityRange(priority float64) bool { | ||
| return AllocatorNoop.Priority() <= priority && priority <= AllocatorFinalizeAtomicReplicationChange.Priority() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added AllocatorMaxPriority for the upper bound and added a NB on AllocatorNoop should have the lowest priority. Also put this function under CheckPriorityInversion.
| // CheckPriorityInversion checks if the priority at enqueue time is higher than | ||
| // the priority corresponding to the action computed at processing time. It | ||
| // returns whether there was a priority inversion and whether the caller should | ||
| // skip the processing of the range since the inversion is considered unfair. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified the comment.
| priorityAtEnqueue float64, actionAtProcessing AllocatorAction, | ||
| ) (isInversion bool, shouldRequeue bool) { | ||
| // priorityAtEnqueue of -1 is a special case reserved for processing logic to | ||
| // run even if there’s a priority inversion. If the priority is not -1, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| // AllocatorRemoveLearner, and AllocatorReplaceDeadVoter will be rounded to | ||
| // the same priority. They are so close to each other, so we don't really | ||
| // count it as an inversion among them. | ||
| normPriorityAtEnqueue := roundToNearestPriorityCategory(priorityAtEnqueue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a NB comment and a regression unit test. I noticed that we are making some assumptions with this statement, but I think the assumptions should be true. (Update: After some thinking, I don't know if I can say the second assumption should be true. We can revisit them.)
| "kv.priority_inversion_requeue_replicate_queue.enabled", | ||
| "whether the requeue replicas should requeue when enqueued for "+ | ||
| "repair action but ended up consider rebalancing during processing", | ||
| false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put up #152749 which stacks on top of this PR.
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there!
@tbg reviewed 1 of 1 files at r1, 3 of 3 files at r2, 1 of 1 files at r3, 2 of 2 files at r4, 1 of 1 files at r5, 1 of 1 files at r6, 2 of 2 files at r7, 1 of 1 files at r8, 1 of 1 files at r9, 1 of 1 files at r10, 1 of 1 files at r11, 1 of 1 files at r12, 1 of 1 files at r13, all commit messages.
@tbg dismissed @sumeerbhola from 2 discussions.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @wenyihu6)
pkg/kv/kvserver/replicate_queue.go line 911 at r6 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Stale comment, updated for clarity. Originally, I considered returning
requeue = true, err = nilto allow requeuing on error, but instead chose this approach to ensure correct logging and error metric tracking inprocessReplicaandfinishProcessingReplica.
But won't requeue=true be ignored if there is an error? This gets returned up here
cockroach/pkg/kv/kvserver/replicate_queue.go
Lines 699 to 701 in 0a31197
| if err != nil { | |
| return false, err | |
| } |
and requeue is never acted upon. So what's the point of giving the impression that requeuing will happen when it isn't actually happening? IMO we shouldn't enter this branch on error at all, an error when computing a plan is not an inversion.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 263 at r7 (raw file):
// AllocatorRemoveDeadVoter, and AllocatorRemoveVoter, the priority may be // adjusted (see ComputeAction for details), but the adjustment is expected to // be small (<100).
<49, right?
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 995 at r7 (raw file):
// NB: The returned priority may include a small adjustment and therefore might // not exactly match action.Priority(). See AllocatorAddVoter, // AllocatorRemoveDeadVoter, AllocatorRemoveVoter below. The adjustment should be <100.
<49
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 995 at r7 (raw file):
// NB: The returned priority may include a small adjustment and therefore might // not exactly match action.Priority(). See AllocatorAddVoter, // AllocatorRemoveDeadVoter, AllocatorRemoveVoter below. The adjustment should be <100.
// The adjustment should be <49 (see AllocatorAction.Priority()).
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 997 at r7 (raw file):
// AllocatorRemoveDeadVoter, AllocatorRemoveVoter below. The adjustment should be <100. // // The claim that the adjustment is < 100 has two assumptions:
49 and the other numbers below also need touching up.
| // AllocatorRemoveLearner, and AllocatorReplaceDeadVoter will be rounded to | ||
| // the same priority. They are so close to each other, so we don't really | ||
| // count it as an inversion among them. | ||
| normPriorityAtEnqueue := roundToNearestPriorityCategory(priorityAtEnqueue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're saying that with sufficiently high replication factors, some of the priorities may be adjusted by more than 50, right? I can live with this, add a comment about it and let's move on.
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @tbg)
pkg/kv/kvserver/replicate_queue.go line 911 at r6 (raw file):
won't
requeue=truebe ignored if there is an error?
I changed it so that we may requeue on errors as well in this commit:
cockroach/pkg/kv/kvserver/replicate_queue.go
Lines 698 to 704 in d75f145
| if err != nil { | |
| if requeue { | |
| log.KvDistribution.VEventf(ctx, 1, "re-queuing on errors: %v", err) | |
| rq.maybeAdd(ctx, repl, rq.store.Clock().NowAsClockTimestamp()) | |
| } | |
| return false, err | |
| } |
requeue = true, err = nil instead. My intention was to treat priority inversion as an error (for logging and metrics in finishProcessingReplica) while still making it requeueable. Does this seem like a right decision to you?
Good call on we shouldn't enter this branch on error at all. I moved the priority inversion check down below so that we return early if there is a planning error to begin with.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 263 at r7 (raw file):
Previously, tbg (Tobias Grieger) wrote…
<49, right?
Good catch, done.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 995 at r7 (raw file):
Previously, tbg (Tobias Grieger) wrote…
<49
Fixed.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 995 at r7 (raw file):
Previously, tbg (Tobias Grieger) wrote…
// The adjustment should be <49 (see
AllocatorAction.Priority()).
Fixed.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 997 at r7 (raw file):
Previously, tbg (Tobias Grieger) wrote…
49 and the other numbers below also need touching up.
Fixed.
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg reviewed all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @wenyihu6)
pkg/kv/kvserver/replicate_queue.go line 911 at r6 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
won't
requeue=truebe ignored if there is an error?I changed it so that we may requeue on errors as well in this commit:
. Without this change, requeuing wouldn’t occur unless we returncockroach/pkg/kv/kvserver/replicate_queue.go
Lines 698 to 704 in d75f145
if err != nil { if requeue { log.KvDistribution.VEventf(ctx, 1, "re-queuing on errors: %v", err) rq.maybeAdd(ctx, repl, rq.store.Clock().NowAsClockTimestamp()) } return false, err } requeue = true, err = nilinstead. My intention was to treat priority inversion as an error (for logging and metrics infinishProcessingReplica) while still making it requeueable. Does this seem like a right decision to you?Good call on we shouldn't enter this branch on error at all. I moved the priority inversion check down below so that we return early if there is a planning error to begin with.
Oh, I missed that - thank you. The code change looks innocuous enough, so I'm okay with this semantic change as long as we are sure it's only getting hit for this new error return path. I'll resolve this but left a comment on the requeue logic.
pkg/kv/kvserver/replicate_queue.go line 700 at r15 (raw file):
if err != nil { if requeue {
Could this code be restructured as follows:
if requeue { ... }
if err != nil { return false, err }
if testingAggressive.... {}
// deleted the requeue branch that was here
return true, nilYou could also add a comment on why we're requeuing in both cases.
I assume you are confident we weren't requeuing on any other errors before, right? With an eye towards backporting that would be good to know for sure.
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg reviewed 1 of 1 files at r14, 1 of 1 files at r15.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @wenyihu6)
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @tbg)
pkg/kv/kvserver/replicate_queue.go line 700 at r15 (raw file):
Would you be okay if I addressed this in a follow-up PR? This one is getting long, and I assume we’d prefer not to backport the refactoring. I can add the comments now if you prefer.
I assume you are confident we weren't requeuing on any other errors before, right?
Yes, this is the only place before where we return requeue = trueand it's with a nil errorcockroach/pkg/kv/kvserver/replicate_queue.go
Lines 879 to 882 in ce69299
// Otherwise, requeue to see if there is more work to do. As the // operation succeeded and was planned for a repair action i.e. not // rebalancing. requeue = true .cockroach/pkg/kv/kvserver/replicate_queue.go
Line 972 in ce69299
return ShouldRequeue(ctx, change, conf), nil
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @tbg)
pkg/kv/kvserver/replicate_queue.go line 700 at r15 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Would you be okay if I addressed this in a follow-up PR? This one is getting long, and I assume we’d prefer not to backport the refactoring. I can add the comments now if you prefer.
I assume you are confident we weren't requeuing on any other errors before, right?
Yes, this is the only place before where we return requeue = trueand it's with a nil errorcockroach/pkg/kv/kvserver/replicate_queue.go
Lines 879 to 882 in ce69299
// Otherwise, requeue to see if there is more work to do. As the // operation succeeded and was planned for a repair action i.e. not // rebalancing. requeue = true .cockroach/pkg/kv/kvserver/replicate_queue.go
Line 972 in ce69299
return ShouldRequeue(ctx, change, conf), nil
I ended up just doing the refactoring in this PR. Doesn't look too complicated to backport.
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg reviewed 1 of 1 files at r16, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)
pkg/kv/kvserver/replicate_queue.go line 698 at r16 (raw file):
Just a style nit: rather than "claiming" that only a particular case hits this (which may just not be true forever and which might confuse the reader who may have no context on where or why that happens), I'd say something like
// Most requeuing operations return a nil error, but we intentionally also requeue on error, if this is requested. At the time of writing, priority inversions requeue while also returning an error.
This commit adds a new cluster setting PriorityInversionRequeue that controls whether the replicate queue should requeue replicas when their priority at enqueue time differs significantly from their priority at processing time (e.g. dropping from top 3 to the lowest priority).
Previously, a replica could enter the queue with high priority but, by the time it was processed, the action planned for this replica may have a low priority, causing us to perform low priority work. Specifically, we are mostly worried about cases where the priority changes from any of the repair actions to consider rebalance. Rebalancing could take a long time and block other ranges enqueued with actual repair action needed. This commit ensures that such replicas are requeued instead, avoiding priority inversions.
Previously, replicateQueue used V(2) to log info on priority inverted replicas because I wanted visibility into every case without missing any replicas. On reflection, the individual cases aren’t that interesting - it’s the overall volume that matters, which we can track through metrics. This commit changes it so that we just rate limit priority inversions every 3 seconds.
This commit improves the comments for PriorityInversionRequeue and clarifies the contracts around action.Priority().
This commit refactors CheckPriorityInversion.
This commit adds the TestAllocatorPriorityInvariance test, which acts as a regression safeguard when new actions are added to AllocatorAction, ensuring the contract is upheld. See action.Priority() and ComputeAction() for more details on the contract.
…equeue Previously, we introduced the PriorityInversionRequeue cluster setting, intended for backport, to handle cases where a range was enqueued with a high-priority repair action but, at processing time, a low-priority rebalance action was computed. In such cases, the caller re-adds the range to the queue under its updated priority. Although the cluster setting guards this requeue behavior, the inversion check always ran unconditionally, reducing backport safety. This commit updates the logic so that the cluster setting guard both the inversion check and the requeue behavior.
Previously, we checked for priority inversion before planning errors, which meant we could return requeue = true even when a planning error occurred. This commit changes it so that planning errors should take higher precedence over a priority inversion error. rq.processOneChange now returns early if there is a planning error and only check for priority inversion right before applying a change.
Previously, we checked for requeue right before returning for both nil and non-nil errors, making the code harder to follow. This commit refactors replicateQueue.process to requeue replicas before checking for errors.
1fcb9b8 to
b632b95
Compare
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @arulajmani and @tbg)
pkg/kv/kvserver/replicate_queue.go line 698 at r16 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Just a style nit: rather than "claiming" that only a particular case hits this (which may just not be true forever and which might confuse the reader who may have no context on where or why that happens), I'd say something like
// Most requeuing operations return a nil error, but we intentionally also requeue on error, if this is requested. At the time of writing, priority inversions requeue while also returning an error.
Good point, done.
|
TFTR! bors r=tbg |
|
👎 Rejected by code reviews |
Got Sumeer's approval to do so since Sumeer already gave his stamp but left the final approval to others.
|
TFTR! bors r=tbg |
Part of: #152022
Release note: none
kvserver: add PriorityInversionRequeue
This commit adds a new cluster setting PriorityInversionRequeue that controls
whether the replicate queue should requeue replicas when their priority at
enqueue time differs significantly from their priority at processing time
(e.g. dropping from top 3 to the lowest priority).
kvserver: requeue on priority inversion for replicate queue
Previously, a replica could enter the queue with high priority but, by the time
it was processed, the action planned for this replica may have a low priority,
causing us to perform low priority work. Specifically, we are mostly worried
about cases where the priority changes from any of the repair actions to
consider rebalance. Rebalancing could take a long time and block other ranges
enqueued with actual repair action needed. This commit ensures that such
replicas are requeued instead, avoiding priority inversions.
kvserver: use priorityInversionLogEveryN
Previously, replicateQueue used V(2) to log info on priority inverted replicas
because I wanted visibility into every case without missing any replicas. On
reflection, the individual cases aren’t that interesting - it’s the overall
volume that matters, which we can track through metrics. This commit changes it
so that we just rate limit priority inversions every 3 seconds.
kvserver: improve comments for PriorityInversionRequeue
This commit improves the comments for PriorityInversionRequeue and clarifies the
contracts around action.Priority().
allocator: small refactor for CheckPriorityInversion
This commit refactors CheckPriorityInversion.
allocator: add TestAllocatorPriorityInvariance
This commit adds the TestAllocatorPriorityInvariance test, which acts as a
regression safeguard when new actions are added to AllocatorAction, ensuring the
contract is upheld. See action.Priority() and ComputeAction() for more details
on the contract.
kvserver: guard inversion check and requeue behind PriorityInversionRequeue
Previously, we introduced the PriorityInversionRequeue cluster setting, intended
for backport, to handle cases where a range was enqueued with a high-priority
repair action but, at processing time, a low-priority rebalance action was
computed. In such cases, the caller re-adds the range to the queue under its
updated priority. Although the cluster setting guards this requeue behavior, the
inversion check always ran unconditionally, reducing backport safety. This
commit updates the logic so that the cluster setting guard both the inversion
check and the requeue behavior.
kvserver: move priority inversion check before applyChange
Previously, we checked for priority inversion before planning errors, which
meant we could return requeue = true even when a planning error occurred. This
commit changes it so that planning errors should take higher precedence over a
priority inversion error. rq.processOneChange now returns early if there is a
planning error and only check for priority inversion right before applying a
change.
kvserver: check for requeue before error checking in rq.process
Previously, we checked for requeue right before returning for both nil and
non-nil errors, making the code harder to follow. This commit refactors
replicateQueue.process to requeue replicas before checking for errors.