
Conversation

@wenyihu6
Contributor

@wenyihu6 wenyihu6 commented Aug 29, 2025

Informs: #151775
Release note: none


kvserver: add ReplicateQueueDroppedDueToSize

Previously, we had limited observability into when queues drop replicas due to
reaching their maximum size. This commit adds a metric to track and observe such
events.


kvserver: add BaseQueueMaxSize and ReplicateQueueMaxSize

Previously, the max base queue size was hardcoded to defaultQueueMaxSize
(10000). Since replica item structs in the priority queue are small, we don't
see a strong reason for this fixed limit. As an incremental and backport-friendly
change, this commit makes the queue size configurable via a cluster
setting. The replicate queue uses its own setting, allowing its size to be
increased independently while leaving other queues unchanged.


kvserver: add TestBaseQueueMaxSize

This commit adds tests to (1) verify metric updates when replicas are dropped
from the queue, and (2) ensure the cluster setting for configuring the base
queue’s max size works correctly.


Revert "kvserver: add TestBaseQueueMaxSize"

This reverts commit a5f01e5.


Revert "kvserver: add BaseQueueMaxSize and ReplicateQueueMaxSize"

This reverts commit d89eaa7.


kvserver: add ReplicateQueueMaxSize

Previously, the maximum base queue size was hardcoded to defaultQueueMaxSize
(10000). Since replica item structs are small, there's little reason to enforce
a fixed limit. This commit makes the replicate queue size configurable via a
cluster setting ReplicateQueueMaxSize, allowing incremental and backport-friendly
adjustments. Note that reducing the setting does not drop replicas appropriately;
future commits will address this behavior.


kvserver: add TestReplicateQueueMaxSize

This commit adds tests to (1) verify metric updates when replicas are dropped
from the queue, and (2) ensure the cluster setting for ReplicateQueueMaxSize
works correctly.


kvserver: drop excess replicas when lowering ReplicateQueueMaxSize

Previously, the ReplicateQueueMaxSize cluster setting allowed dynamic adjustment
of the replicate queue’s maximum size. However, decreasing this setting did not
properly drop excess replicas. This commit fixes that by removing replicas when
the queue’s max size is lowered.


kvserver: rename ReplicateQueueDroppedDueToSize to ReplicateQueueFull

This commit clarifies the metric's name and description.

@blathers-crl

blathers-crl bot commented Aug 29, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@wenyihu6 wenyihu6 force-pushed the maxsize branch 8 times, most recently from f6a4ef4 to a5f01e5 Compare August 29, 2025 03:14
@wenyihu6 wenyihu6 marked this pull request as ready for review August 29, 2025 03:18
@wenyihu6 wenyihu6 requested a review from a team as a code owner August 29, 2025 03:18
@wenyihu6 wenyihu6 requested review from sumeerbhola and tbg August 29, 2025 03:18
@wenyihu6 wenyihu6 changed the title kvserver: make maxsize a cluster setting kvserver: add BaseQueueMaxSize and ReplicateQueueMaxSize Aug 29, 2025
Member

@tbg tbg left a comment


Coarse review. Looks good generally!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/kvserverbase/base.go line 127 at r3 (raw file):

// except for the replicate queue. Base queue starts dropping replicas when
// exceeding the max size. Use kv.replicate_queue.max_size for that.
var BaseQueueMaxSize = settings.RegisterIntSetting(

I think it's confusing to have a base queue and specific queue max size setting due to questions on how they interact. I'd remove this setting and keep just the one for the replicate queue. I would also make the replicate queue setting slated for deprecation:

  • merge it now, with 10k default
  • on master, bump the default to ~infinity (some very large number)
  • backport only the 10k default
  • in a future version, after we've run with ~infinity most of the time and never heard about issues, remove the setting.

pkg/kv/kvserver/lease_queue.go line 89 at r3 (raw file):

	lq.baseQueue = newBaseQueue("lease", lq, store,
		queueConfig{
			maxSize:              kvserverbase.BaseQueueMaxSize,

this rename causes quite a bit of churn, which I'd maybe avoid in the interest of backporting.


pkg/kv/kvserver/metrics.go line 2196 at r3 (raw file):

	}
	metaReplicateQueueDroppedDueToSize = metric.Metadata{
		Name:        "queue.replicate.dropped_due_to_size",

this could be misunderstood as "replicas dropped due to replica size", which is not a thing for the replicate queue. Maybe queue.replicate.queue_full ("Number of times a replica was dropped from the queue due to queue fullness")


pkg/kv/kvserver/queue.go line 791 at r3 (raw file):

	// scan.
	if pqLen := bq.mu.priorityQ.Len(); int64(pqLen) > bq.mu.maxSize {
		replicaItemToDrop := bq.mu.priorityQ.sl[pqLen-1]

when the cluster setting gets dropped, the queue might be overfull by more than one, so here you may need to drop more items.

Contributor Author

@wenyihu6 wenyihu6 left a comment


Thanks for the suggestion to use only one cluster setting - this makes the backport much cleaner. I reverted the last 2 commits you reviewed and started fresh, as the new changes are much smaller. Opened this new PR #152763 if you prefer a new PR with those two commits kept out of the current history entirely.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)


pkg/kv/kvserver/metrics.go line 2196 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

this could be misunderstood as "replicas dropped due to replica size", which is not a thing for the replicate queue. Maybe queue.replicate.queue_full ("Number of times a replica was dropped from the queue due to queue fullness")

Renamed.


pkg/kv/kvserver/queue.go line 791 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

when the cluster setting gets dropped, the queue might be overfull by more than one, so here you may need to drop more items.

Good catch - fixed in a follow up commit.


pkg/kv/kvserver/lease_queue.go line 89 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

this rename causes quite a bit of churn, which I'd maybe avoid in the interest of backporting.

Good call, changed it so that only the replicate queue has a *settings.IntSetting.


pkg/kv/kvserver/kvserverbase/base.go line 127 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I think it's confusing to have a base queue and specific queue max size setting due to questions on how they interact. I'd remove this setting and keep just the one for the replicate queue. I would also make the replicate queue setting slated for deprecation:

  • merge it now, with 10k default
  • on master, bump the default to ~infinity (some very large number)
  • backport only the 10k default
  • in a future version, after we've run with ~infinity most of the time and never heard about issues, remove the setting.

Removed the base queue cluster setting.

@wenyihu6 wenyihu6 requested a review from tbg August 30, 2025 00:33
@wenyihu6 wenyihu6 changed the title kvserver: add BaseQueueMaxSize and ReplicateQueueMaxSize kvserver: add ReplicateQueueMaxSize Aug 31, 2025
Member

@tbg tbg left a comment


:lgtm:

@tbg reviewed 4 of 4 files at r1, 14 of 15 files at r2, 1 of 1 files at r3, 15 of 15 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, 2 of 2 files at r8, 5 of 5 files at r9, 1 of 1 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/replicate_queue.go line 112 at r6 (raw file):

		"when this limit is exceeded, lower priority (not guaranteed to be the lowest) "+
		"replicas are dropped from the queue",
	defaultQueueMaxSize,

Here, too, do you want to make a reminder to bump this to math.MaxInt64 (or something like that) on master soon after this merges? The goal is that we "see that this is fine" and that we feel free to dispense with the configurability again in the future.


pkg/kv/kvserver/store_rebalancer_test.go line 1871 at r7 (raw file):

// verifies that replicas are dropped when the max size is exceeded. It also
// checks that the metric ReplicateQueueDroppedDueToSize is updated correctly.
func TestReplicateQueueMaxSize(t *testing.T) {

Thank you for writing this test, I know it's a drag.

Contributor Author

@wenyihu6 wenyihu6 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola and @tbg)


pkg/kv/kvserver/replicate_queue.go line 112 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Here, too, do you want to make a reminder to bump this to math.MaxInt64 (or something like that) on master soon after this merges? The goal is that we "see that this is fine" and that we feel free to dispense with the configurability again in the future.

Put up #152825.

@wenyihu6
Contributor Author

wenyihu6 commented Sep 1, 2025

TFTR!

bors r=tbg

@cockroachdb cockroachdb deleted a comment from craig bot Sep 1, 2025
@craig
Contributor

craig bot commented Sep 1, 2025
