server: consider removing the decommission nudger

The recently added decommission nudger (https://github.com/cockroachdb/cockroach/pull/130117) periodically enqueues ranges with decommissioning replicas into the replicate queue. However, recent escalations have shown that when the replicate queue has many pending actions, such replicas still block decommissioning; they stop blocking only after being manually enqueued into the replicate queue via the DB console.

We have an issue (https://github.com/cockroachdb/cockroach/issues/148090) to improve visibility into whether the nudger is doing its job, but assuming it is, the way it enqueues replicas still differs from manual enqueuing:

- The nudger [enqueues the replica](https://github.com/cockroachdb/cockroach/blob/f17cf577363c6a128ce5bf922da5691483ff61e6/pkg/kv/kvserver/replica.go#L2937) via [AddAsync](https://github.com/cockroachdb/cockroach/blob/32f7bad014fe3d72f54ab9c2296d34ff47cfbecc/pkg/kv/kvserver/queue.go#L633), with a mid-level priority corresponding to `AllocatorReplaceDecommissioningVoter`. Then the replica waits for its turn to be processed.
- The manual enqueuing (with or without `skipShouldQueue`) [calls](https://github.com/cockroachdb/cockroach/blob/7ac157f074482a1523f18eaa097bec7132bc69de/pkg/server/admin.go#L3257) `store.Enqueue` with `async=false`, and the replica is [processed directly](https://github.com/cockroachdb/cockroach/blob/a2e2f92c764f2bb5b5cf556b266df753bdb06866/pkg/kv/kvserver/store.go#L4097).

We should consider either changing the priority associated with `AllocatorReplaceDecommissioningVoter`, or allowing the nudger to process replicas (more) directly.

Companion issue: https://github.com/cockroachdb/cockroach/issues/148090.


Jira issue: CRDB-52839
