-
Notifications
You must be signed in to change notification settings - Fork 4k
mmaintegration: add mmaintegration datadriven tests #151510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This commit adds allocator_op.go and allocator_sync.go, introducing trackedAllocatorChange and basic coordination helpers for allocator sync. These helpers register changes before they are applied and post apply the effect on store pool after the change is applied. Note that allocator sync is not plumbed or used yet. Future commits would plumb it to replicate queue, lease queue, and mma store rebalancer to coordinate changes.
Previously, allocator sync files were added. This commit wires allocator sync into the lease queue, replicate queue, and mma store rebalancer. Each node has a single allocator sync instance responsible for coordinating with the corresponding mma allocator. Note that while the allocator_sync is now plumbed, they are not yet actively used.
Previously, allocator sync was plumbed into the replicate queue, lease queue, and mma store rebalancer. This commit updates the lease and replicate queues to call allocator sync to register changes before they are applied, and again post apply to reflect load changes in the local store pool. For now, allocator sync is only used for updating the store pool. Future commits will extend this to coordinate with mma.
Previously, allocator sync was called before and after applying changes, but only to update the store pool. This commit extends it to also register changes with the mma allocator. If mma is enabled at registration time, allocator sync tracks the change and informs mma during PostApply. If mma is disabled, it continues to only update the store pool.
Previously, allocator sync tracked external changes from the replicate and lease queues. This commit extends it to register mma driven changes as well. On PostApply, allocator sync now informs mma of the outcome. Note that it is not being used yet. Future commits will call into it from mma store rebalancer.
Previously, mma PreApply was introduced to register mma changes with allocator sync. This commit integrates it into the mma store rebalancer. To handle failures that occur before a change is even registered (such as when a replica doesn’t exist), this commit also adds, MarkChangesAsFailed, in allocator sync. This function bypasses the trackedChanges map since the change is not meant to be applied. It must only be called on failure, and allocator sync is responsible for notifying MMA of the failure.
This commit adds MakeStoreLoadMsg, moved from allocator_mma_integration.go in mmaprototypehelpers. Eventually, all functions from mmaprototypehelpers will be migrated to pkg/kv/kvserver/mmaintegration as part of productionizing the code. Note that no tests have been added yet - a TODO has been left in place to address this. Added to the spreadsheet as well.
This commit adds InvalidSyncChangeID to allocator sync, used only by asim to indicate that no change is currently in progress. This is needed because asim returns control to the tick loop rather than waiting for changes to complete during simulations. So it needs to track the current change id in progress.
Previously, all helper functions from pkg/kv/kvserver/allocator/mmaprototypehelpers were migrated to pkg/kv/kvserver/mmaintegration. This commit completes the transition by removing all usages of mmaprototypehelpers to pkg/kv/kvserver/mmaintegration.
This commit address minor code review comments.
Previously, the replicate queue could skip updating allocator sync when rlm.AdminTransferLease failed, due to early returns on error. This commit ensures allocator sync is always updated before exiting the function, regardless of success or failure.
This commit integrates the store pool and MMA allocator into allocator sync via an interface to facilitate testing within this package.
This commit improves the String methods for several types in mmaprototype to make datadriven test output more readable. It also exports a MakePendingRangeChangeForTesting helper to facilitate constructing pendingReplicaChanges in future datadriven tests.
This commit adds a datadriven test for the mmaintegration package, verifying the basic functionality of allocator sync. It tests how replica changes are converted to mma replica changes and how the store pool is checked, using mocks for components like the store pool and mma allocator.
This commit adds a datadriven test for MakeStoreLoadMsg, covering a basic store load scenario and one with near-zero cpu and byte size utilization.
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
remove unused utility
|
This is the operations I mentioned on slack: cockroach/pkg/kv/kvserver/mmaintegration/datadriven_mmaintegration_test.go Lines 359 to 370 in db61f25
cockroach/pkg/kv/kvserver/allocator/plan/replicate.go Lines 473 to 488 in 5c2a22f
cockroach/pkg/kv/kvserver/allocator/plan/util.go Lines 55 to 56 in c83c57d
|
Regenerates the datadriven output since I changed the string methods.
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not be properly understanding the question.
IMO, we need to ensure that the tests cover conversion to/from the MMA and SMA formats quite exhaustively, i.e., mmaprototype.PendingRangeChange to/from {changeReplicasOp,leaseTransferOp}. That may mean standalone (non-datadriven) tests for convertReplicaChangeToMMA and lifting out the code in MMAPreApply that converts from mmaprototype.PendingRangeChang to what SMA needs and testing that function separately.
Alternatively, one could just test those conversions using the AllocatorSync datadriven test, if that is as simple as standalone tests. I don't have an opinion either way.
Flushed some comments. I didn't look at everything, since these comments may change the tests.
Reviewed 2 of 2 files at r13, 4 of 10 files at r14, 3 of 4 files at r15, 1 of 1 files at r16.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/kv/kvserver/mmaintegration/datadriven_mmaintegration_test.go line 62 at r14 (raw file):
// mockMMAAllocator implements the mmaAllocator interface for testing. type mockMMAAllocator struct {
I think mockMMAllocator and mockStorePool are doing more than what they need to, which is making this test code more complicated and making it harder at a glance to check that the testdata output is correct. We don't really care about what their internal state is, when testing AllocatorSync. All we care about is that AllocatorSync is calling their interface methods as expected, with the expected parameters. So one could remove all this internal state and just print those calls. As an example of this pattern, see TestProcessorBasic in replica_rac2. Similar to AllocatorSync, the processorImpl there is mostly glue code, and that is what is being tested, while the things it calls are abstracted via test interfaces (testRangeController being the main one)
cockroach/pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor_test.go
Lines 284 to 287 in e36054b
| var sched testRaftScheduler | |
| var piggybacker testAdmittedPiggybacker | |
| var q testACWorkQueue | |
| var rcFactory testRangeControllerFactory |
And as a nit, they are not really mocks (which assert on calls), so they could be renamed with a test prefix instead of mock.
pkg/kv/kvserver/mmaintegration/testdata/mmaintegration/add_nonvoter.txt line 10 at r14 (raw file):
---- sync_change_id:1 mma_change:
I was initially confused by the mma_change phrasing here. Looking at the code, it seems this is just the change, formatted in a way that MMA likes to receive, but may not necessarily be passed to MMA (when the change originates in MMA itself). If my understanding is correct, how about just renaming to the change:.
pkg/kv/kvserver/mmaintegration/testdata/mmaintegration/add_nonvoter.txt line 27 at r14 (raw file):
post-apply id=1 success=true ---- applied change 1 with success=true
I was expecting to see two lines of output, corresponding to the call to storePool.UpdateLocalStoreAfterRebalance and mmaState.AdjustPendingChangesDisposition.
To figure out whether the behavior is correct, one needs to look at the state printed below. Which is definitely a reasonable pattern if the AllocatorSync were glued to real things, say A and B. One would need to print the state of A and B since one can't intercept the AllocatorSync calls to them. But it is glued to test implementations. See more on this below.
pkg/kv/kvserver/mmaintegration/testdata/storeload/almost_zero_util.txt line 7 at r17 (raw file):
# write_bytes_per_second=1024 logical_bytes=2048 capacity=1000 available=200 # lease_count=500 range_count=1000 node_cpu_rate_capacity=2000 # node_cpu_rate_usage=1000 stores_cpu_rate=1500 num_stores=2 timestamp=1234567890
nit: let's not repeat the command below in this comment. It just adds more work for the reader, who is left wondering whether there is some subtle difference between the two.
pkg/kv/kvserver/mmaintegration/testdata/storeload/almost_zero_util.txt line 12 at r17 (raw file):
# - desc.NodeCapacity.NodeCPURateCapacity = 200000000000 # If capacity == available, byte size capacity is Available(200). # - desc.Capacity.Capacity = desc.Capacity.Available
I don't understand desc.Capacity.Capacity = desc.Capacity.Available. Is the LHS supposed to be the capacity we compute? If so, it is StoreLoadMsg.Capacity[ByteSize] and not the descriptor capacity.
pkg/kv/kvserver/mmaintegration/testdata/storeload/almost_zero_util.txt line 13 at r17 (raw file):
# If capacity == available, byte size capacity is Available(200). # - desc.Capacity.Capacity = desc.Capacity.Available make-store-load-msg store_id=1 cpu_per_second=1000 write_bytes_per_second=1024 logical_bytes=2048 capacity=200 available=200 lease_count=500 range_count=1000 node_cpu_rate_capacity=200000000000 node_cpu_rate_usage=1 stores_cpu_rate=1500 num_stores=2 timestamp=1234567890
Don't change this now, since you've already made it a datadriven test, but mostly a test with a single command could be more simply written using a bunch of test cases with a testCase struct containing the input and the expected output.
pkg/kv/kvserver/mmaintegration/testdata/mmaintegration/add_nonvoter_mma.txt line 7 at r14 (raw file):
init mma_enabled=true ---- range_id=1 [replicas:{(r1*:n1,s1,voter),(r2:n2,s2,non-voter)} usage=(cpu:150,raft-cpu:50,write-band:1024,byte-size:2048)]
These tests fiddle with a single range. That means one needs a different test file for each small test case. Also, it doesn't test multiple changes in-flight at the same time. One could construct say 3 ranges, where the usage is scaled up, range_id=2 has 2x the usage, and range_id=3 has 3x the usage, and fiddle with all of them in a single test. Of course, what I am saying is subjective, so up to you how to structure this -- but I do think we need a test with multiple in-flight changes and some of them failing.
|
Unassigning for now, but ready to return to this once your current project is wrapped up @wenyihu6 |
|
I’ll revisit this after the branch cut since it’s test-only unless either of you feel this should be prioritized. Added an item on our spreadsheet and assigned to me. |
Stacked on top of #151369.
Epic: CRDB-25222
Release note: none
mmaprototype: add MakePendingRangeChangeForTesting
This commit improves the String methods for several types in mmaprototype to
make datadriven test output more readable. It also exports a
MakePendingRangeChangeForTesting helper to facilitate constructing
pendingReplicaChanges in future datadriven tests.
mmaintegration: add mmaintegration datadriven tests
This commit adds a datadriven test for the mmaintegration package, verifying
the basic functionality of allocator sync. It tests how replica changes are
converted to mma replica changes and how the store pool is checked, using mocks
for components like the store pool and mma allocator.
mmaintegration: add store load datadriven tests
This commit adds a datadriven test for MakeStoreLoadMsg, covering a basic store
load scenario and one with near-zero cpu and byte size utilization.