allocator: evaluate lower IO overload thresholds for blocking rebalancing / lease transfers #112497

@kvoli

Is your feature request related to a problem? Please describe.
Two settings control whether replica rebalancing and lease transfers to a store are blocked, depending on the store's IO overload score:

rebalancing: kv.allocator.replica_io_overload_threshold
lease transfers: kv.allocator.lease_io_overload_threshold

Both settings take the mean into account as well as the threshold defined in the setting: a store is only blocked when its score is above both 110% of the mean and the threshold:

func ioOverloadCheck(
	score, avg, absThreshold, meanThreshold float64,
	enforcement IOOverloadEnforcementLevel,
	disallowed ...IOOverloadEnforcementLevel,
) (ok bool, reason string) {
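
The combined condition described above can be sketched roughly as follows. This is an illustration only, not the body of ioOverloadCheck: the helper name shouldBlock is hypothetical, the enforcement-level arguments are left out, and the use of strict comparisons is an assumption.

```go
package main

import "fmt"

// shouldBlock is a hypothetical helper, NOT the real ioOverloadCheck.
// It reports whether a store's IO overload score would block an action
// under the rule described in this issue: the score must exceed both the
// absolute threshold and meanThreshold times the average score.
// Strict ">" comparisons are an assumption made for this sketch.
func shouldBlock(score, avg, absThreshold, meanThreshold float64) bool {
	return score > absThreshold && score > avg*meanThreshold
}

func main() {
	const meanThreshold = 1.1 // "110% of the mean", as stated in the issue

	// Above the 0.8 absolute threshold, but the mean is also high,
	// so the store is not above 110% of the mean: not blocked.
	fmt.Println(shouldBlock(0.85, 0.80, 0.8, meanThreshold))

	// Above both the absolute threshold and 110% of the mean: blocked.
	fmt.Println(shouldBlock(0.85, 0.40, 0.8, meanThreshold))

	// Well above the mean but below the 0.8 threshold: not blocked.
	fmt.Println(shouldBlock(0.60, 0.40, 0.8, meanThreshold))

	// The same store is blocked once the threshold is lowered to 0.5.
	fmt.Println(shouldBlock(0.60, 0.40, 0.5, meanThreshold))
}
```

The last two calls show why the absolute threshold matters: a store scoring 0.60 against a mean of 0.40 passes a 0.8 threshold but is blocked at 0.5.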

We have seen instances where too many recovery and rebalancing snapshots can themselves cause IO overload, as disk resources become saturated.

Describe the solution you'd like
Re-evaluate the default values for kv.allocator.replica_io_overload_threshold (0.8) [1] and kv.allocator.lease_io_overload_threshold (0.5) [2], considering lower values.

Additional context
See the (internal) Slack discussion from scale testing for 23.2.

Jira issue: CRDB-32448

Footnotes

  1. https://github.com/cockroachdb/cockroach/blob/c8df48cb85f40c10940dfbc3efa6281dfe5c9701/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L74-L77

  2. https://github.com/cockroachdb/cockroach/blob/c8df48cb85f40c10940dfbc3efa6281dfe5c9701/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L82-L82

Labels

A-kv-distribution: Relating to rebalancing and leasing.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-quick-win: Likely to be a quick win for someone experienced.
GA-blocker
O-23.2-scale-testing: issues found during 23.2 scale testing
O-testcluster: Issues found or occurred on a test cluster, i.e. a long-running internal cluster
T-kv: KV Team
branch-release-23.2: Used to mark GA and release blockers, technical advisories, and bugs for 23.2
