Description
Is your feature request related to a problem? Please describe.
Two cluster settings block replica rebalancing and lease transfers to a store, depending on its IO overload score:
rebalancing: kv.allocator.replica_io_overload_threshold
lease transfers: kv.allocator.lease_io_overload_threshold
Both settings take the mean into account as well as the threshold defined in the setting: a store is blocked only when its score is above both 110% of the mean and the threshold:
cockroach/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go
Lines 2387 to 2391 in c8df48c
    func ioOverloadCheck(
        score, avg, absThreshold, meanThreshold float64,
        enforcement IOOverloadEnforcementLevel,
        disallowed ...IOOverloadEnforcementLevel,
    ) (ok bool, reason string) {
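The excerpt above shows only the function signature, so the rule described in this issue can be illustrated with a minimal sketch. This is not the CockroachDB implementation: `blocksTarget` and its parameter names are hypothetical, the strict `>` comparison is an assumption, and the enforcement levels from the real signature are ignored. The thresholds are the defaults quoted in this issue (0.8 and 0.5) and the 110% mean factor mentioned above.

```go
package main

import "fmt"

// blocksTarget is a hypothetical helper, not the real ioOverloadCheck.
// It reports whether a store would be excluded as a target: its score must
// be above BOTH the absolute threshold and meanFactor times the mean score.
// The strict comparison is an assumption made for this sketch.
func blocksTarget(score, mean, absThreshold, meanFactor float64) bool {
	aboveAbs := score > absThreshold
	aboveMean := score > mean*meanFactor
	return aboveAbs && aboveMean
}

func main() {
	const (
		replicaThreshold = 0.8 // default quoted for kv.allocator.replica_io_overload_threshold
		leaseThreshold   = 0.5 // default quoted for kv.allocator.lease_io_overload_threshold
		meanFactor       = 1.1 // "110% of the mean"
	)

	// Score 0.6 with mean 0.3: well above the mean, but below the 0.8
	// replica threshold, so rebalancing to the store is not blocked.
	fmt.Println(blocksTarget(0.6, 0.3, replicaThreshold, meanFactor)) // false

	// The same store exceeds the 0.5 lease threshold, so lease transfers are blocked.
	fmt.Println(blocksTarget(0.6, 0.3, leaseThreshold, meanFactor)) // true

	// Score 0.9 with mean 0.9: above the absolute threshold but not above
	// 110% of the mean, so the store is not blocked.
	fmt.Println(blocksTarget(0.9, 0.9, replicaThreshold, meanFactor)) // false
}
```

Under these assumptions, a store whose score sits between the lease threshold and the replica threshold still accepts rebalancing snapshots, which is the range a lower default would affect.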
We have seen instances where too many concurrent recovery and rebalancing snapshots themselves cause IO overload by saturating disk resources.
Describe the solution you'd like
Re-evaluate the default values of kv.allocator.replica_io_overload_threshold (0.8) [1] and kv.allocator.lease_io_overload_threshold (0.5) [2], and consider lowering them.
Additional context
See the scale testing for 23.2 discussion on Slack (internal).
Jira issue: CRDB-32448
Footnotes
[1] https://github.com/cockroachdb/cockroach/blob/c8df48cb85f40c10940dfbc3efa6281dfe5c9701/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L74-L77
[2] https://github.com/cockroachdb/cockroach/blob/c8df48cb85f40c10940dfbc3efa6281dfe5c9701/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L82-L82