-
Notifications
You must be signed in to change notification settings - Fork 4k
mmaprototype: extract rebalance{Leases,Replicas} from rebalanceStore
#157761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
rebalanceLeases from rebalanceStorerebalance{Leases,Replicas} from rebalanceStore
7a0f2b4 to
4d15fdc
Compare
Wrap the lease rebalancing logic in an explicit block scope to prepare for extraction into a separate method. This is a pure whitespace change with no logic modifications. This change makes the code that will become `considerRebalanceLeases` clearly delineated, setting up for the subsequent refactoring steps.
Add a local counter to track lease transfers made during processing of a single store, and use it to fix the bug where the global counter was incorrectly used to determine if this store transferred leases. The local counter is incremented alongside the global counter, which is still needed for loop termination across all stores. The check at the end of lease rebalancing now correctly uses the local counter to decide whether to skip replica transfers for this specific store.
Wrap the lease rebalancing logic in an inline function that takes all necessary parameters and returns whether to skip replica moves. This prepares the code for extraction to a separate method. The function converts early returns to return statements with the skip flag, making the control flow explicit and ready for method extraction.
Extract the lease rebalancing logic into a separate method `(*rebalanceState).rebalanceLeases`. This improves code organization by separating the lease rebalancing concern from the main rebalance logic. The method takes all necessary parameters and returns whether to skip replica moves. The extraction was done mechanically by the author using IDE tooling.
Add a new code block (braces) around the replica rebalancing logic in rebalanceStore to prepare for extraction into a separate method. This is a mechanical whitespace change that will be followed by further refactoring steps.
…block Move the doneShedding variable declaration from the beginning of rebalanceStore into the code block that contains the replica rebalancing logic. This narrows the variable's scope to where it's actually used and prepares for extracting this code into a separate method.
Wrap the replica rebalancing logic in an immediately invoked function (IIF) and add a call to a new rebalanceReplicas method (currently a no-op). This sets up both the call site and method signature, making it straightforward to move the code into the method in the next step.
…s method Move the replica rebalancing logic from rebalanceStore into a new rebalanceReplicas method. The code was moved manually using IDE tooling to ensure accurate extraction. This completes the refactoring to separate replica rebalancing from the main rebalanceStore flow.
Remove the `now time.Time` parameter from `rebalanceLeases` and use `rs.now` instead. This completes the refactoring to store `now` in `rebalanceState` rather than passing it as a parameter. The method now uses `rs.now` for checking the delay after failed changes. The call site in `rebalanceStore` no longer passes the `now` parameter. This is the final step in removing the `now` parameter from all rebalance methods.
Move `ctx context.Context` to the first parameter position in `rebalanceStore` and `rebalanceLeases` to follow Go conventions. All rebalance methods in this file now consistently have `ctx` as the first parameter. Updated all call sites accordingly.
Wenyi correctly pointed out that (rs rebalanceState) is too reminiscent of (rs rangeState) (re rebalanceEnv) has no such problem. Epic: CRDB-55052
It's referenced a lot, so this declutters the code.
4d15fdc to
d236fa5
Compare
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed a self-review from this - would get my LGTM, but I'm (kinda sorta, really LLM wrote most of the code, I just told it what to do) the author, so appreciate an independent look.
Auto-borsing ok, but probably this will fail some lint or some such.
Reviewable status:
complete! 0 of 0 LGTMs obtained
-- commits line 77 at r9:
I had this carried out in multiple steps but it looks like it made just one commit but still somehow assumed the user could know about the intermediate steps. Sigh. Good enough I guess. Should've caught that during commit msg review.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 378 at r2 (raw file):
rs.cs.addPendingRangeChange(leaseChange) rs.changes = append(rs.changes, leaseChange) rs.leaseTransferCount++
We can probably remove these "global" counters (this one and the one for replicas). I'll leave this like this for now, just pointing it out.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 240 at r3 (raw file):
topKRanges := ss.adjusted.topKRanges[localStoreID] n := topKRanges.len() doneShedding := false
This is a bit tricky. There's a doneShedding var that this shadows (in line 200 above), that var was previously used for both lease and the replica rebalancing code (which is still below, near the end of the method). I think this is correct though but I'll check on the final state just to make sure.
edit: yeah this is fine. This local doneShedding moves into rebalanceLeases, so we have a doneShedding left here that's only used for replica transfers, and moves with it to its own method subsequently. r6 carries out that move.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 393 at r3 (raw file):
if rs.leaseTransferCount >= rs.maxLeaseTransferCount { log.KvDistribution.VInfof(ctx, 2, "reached max lease transfer count %d, returning", rs.maxLeaseTransferCount) break
this breaks for i := 0; ..., i.e. the topKranges iteration, so that we fall through to the end of the function and can still hit the logging.
I don't love it, but will do for now.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 412 at r3 (raw file):
log.KvDistribution.VInfof(ctx, 2, "skipping replica transfers for s%d: done shedding=%v, lease_transfers=%d", store.StoreID, doneShedding, localLeaseTransferCount) return true
// shouldSkipReplicaMoves
(haven't gotten the LLM to realize when such comments are helpful yet).
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the self-review comments helpful for confirming my understanding of the commit by commit changes.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
-- commits line 12 at r2:
Wanted to check my understanding - is it correct that this bug existed prior to these cleanups - we were incorrectly using this as a global counter
cockroach/pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go
Line 353 in f945c0d
| if doneShedding || leaseTransferCount > 0 { |
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know, will keep doing this then! Somehow I also get something out of reviewing it on reviewable, even though I already review it in the IDE. Somehow it's hard to review in the same detail in Cursor, I'm not exactly sure why.
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola and @wenyihu6)
Previously, wenyihu6 (Wenyi Hu) wrote…
Wanted to check my understanding - is it correct that this bug existed prior to these cleanups - we were incorrectly using this as a global counter
? I ran the tests with --rewrite and no asim changes. Were we just not hitting thiscockroach/pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go
Line 353 in f945c0d
if doneShedding || leaseTransferCount > 0 {
Yes, these were existing bugs (both for lease transfer count and replica transfer count). I'll start writing tests for rebalanceStores soon and am planning on looking at coverage.
|
Build succeeded: |
Leases:
This PR refactors
rebalanceStoreby extracting the lease rebalancing logic into a standalonerebalanceLeasesmethod onrebalanceState. The method encapsulates the logic for handling lease transfers when a local store is CPU overloaded, improving code organization and reducing the complexity ofrebalanceStore.As part of this refactor, we also fix a bug where the global
leaseTransferCountwas incorrectly used to determine if the current store transferred leases. The fix introduces a local counter that tracks lease transfers made during processing of a single store. It feels a bit sad to fix a bug without adding a test, but this felt too good to pass up, and the refactors are precursors to plenty of testing in the context of adding a notion of replica disposition.The refactoring was done incrementally in small, reviewable commits:
There should be zero behavior changes in this work (aside from the bug fix).
Replicas:
This PR refactors
rebalanceStoreby extracting the replica rebalancing logic into a standalonerebalanceReplicasmethod onrebalanceState. This separation improves code organization by isolating the replica movement logic from the lease rebalancing logic, making the codebase easier to understand and maintain.The refactoring was done incrementally in small, reviewable commits:
doneSheddingvariable declaration into the blockrebalanceReplicasmethod (done manually using IDE tooling)There should be zero behavior changes in this PR.
Informs #157757.
Epic: CRDB-55052