
Conversation

@wenyihu6
Contributor

@wenyihu6 wenyihu6 commented Aug 29, 2025

Informs: #151775
Release note: none


kvserver: add ReplicateQueueDroppedDueToSize

Previously, we had limited observability into when queues drop replicas due to
reaching their maximum size. This commit adds a metric to track and observe such
events.


kvserver: add BaseQueueMaxSize and ReplicateQueueMaxSize

Previously, the max base queue size was hardcoded to defaultQueueMaxSize
(10000). Since replica item structs in the priority queue are small, we don't
see a strong reason for this fixed limit. As an incremental and backport-friendly
change, this commit makes the queue size configurable via a cluster
setting. The replicate queue uses its own setting, allowing its size to be
increased independently while leaving other queues unchanged.


kvserver: add TestBaseQueueMaxSize

This commit adds tests to (1) verify metric updates when replicas are dropped
from the queue, and (2) ensure the cluster setting for configuring the base
queue’s max size works correctly.


Revert "kvserver: add TestBaseQueueMaxSize"

This reverts commit a5f01e5.


Revert "kvserver: add BaseQueueMaxSize and ReplicateQueueMaxSize"

This reverts commit d89eaa7.


kvserver: add ReplicateQueueMaxSize

Previously, the maximum base queue size was hardcoded to defaultQueueMaxSize
(10000). Since replica item structs are small, there's little reason to enforce
a fixed limit. This commit makes the replicate queue size configurable via a
cluster setting ReplicateQueueMaxSize, allowing incremental and backport-friendly
adjustments. Note that reducing the setting does not drop replicas appropriately;
future commits will address this behavior.


kvserver: add TestReplicateQueueMaxSize

This commit adds tests to (1) verify metric updates when replicas are dropped
from the queue, and (2) ensure the cluster setting for ReplicateQueueMaxSize
works correctly.


kvserver: drop excess replicas when lowering ReplicateQueueMaxSize

Previously, the ReplicateQueueMaxSize cluster setting allowed dynamic adjustment
of the replicate queue’s maximum size. However, decreasing this setting did not
properly drop excess replicas. This commit fixes that by removing replicas when
the queue’s max size is lowered.


kvserver: rename ReplicateQueueDroppedDueToSize to ReplicateQueueFull

This commit clarifies the metric's name and description.

@blathers-crl

blathers-crl bot commented Aug 29, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@wenyihu6 wenyihu6 force-pushed the maxsize branch 8 times, most recently from f6a4ef4 to a5f01e5 Compare August 29, 2025 03:14
@wenyihu6 wenyihu6 marked this pull request as ready for review August 29, 2025 03:18
@wenyihu6 wenyihu6 requested a review from a team as a code owner August 29, 2025 03:18
@wenyihu6 wenyihu6 requested review from sumeerbhola and tbg August 29, 2025 03:18
@wenyihu6 wenyihu6 changed the title kvserver: make maxsize a cluster setting kvserver: add BaseQueueMaxSize and ReplicateQueueMaxSize Aug 29, 2025
Member

@tbg tbg left a comment


Coarse review. Looks good generally!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/kvserverbase/base.go line 127 at r3 (raw file):

// except for the replicate queue. Base queue starts dropping replicas when
// exceeding the max size. Use kv.replicate_queue.max_size for that.
var BaseQueueMaxSize = settings.RegisterIntSetting(

I think it's confusing to have a base queue and specific queue max size setting due to questions on how they interact. I'd remove this setting and keep just the one for the replicate queue. I would also make the replicate queue setting slated for deprecation:

  • merge it now, with 10k default
  • on master, bump the default to ~infinity (some very large number)
  • backport only the 10k default
  • in a future version, after we've run with ~infinity most of the time and never heard about issues, remove the setting.

pkg/kv/kvserver/lease_queue.go line 89 at r3 (raw file):

	lq.baseQueue = newBaseQueue("lease", lq, store,
		queueConfig{
			maxSize:              kvserverbase.BaseQueueMaxSize,

this rename causes quite a bit of churn, which I'd maybe avoid in the interest of backporting.


pkg/kv/kvserver/metrics.go line 2196 at r3 (raw file):

	}
	metaReplicateQueueDroppedDueToSize = metric.Metadata{
		Name:        "queue.replicate.dropped_due_to_size",

this could be misunderstood as "replicas dropped due to replica size", which is not a thing for the replicate queue. Maybe queue.replicate.queue_full ("Number of times a replica was dropped from the queue due to queue fullness")


pkg/kv/kvserver/queue.go line 791 at r3 (raw file):

	// scan.
	if pqLen := bq.mu.priorityQ.Len(); int64(pqLen) > bq.mu.maxSize {
		replicaItemToDrop := bq.mu.priorityQ.sl[pqLen-1]

when the cluster setting gets dropped, the queue might be overfull by more than one, so here you may need to drop more items.

Contributor Author

@wenyihu6 wenyihu6 left a comment


Thanks for the suggestion to use only one cluster setting - this makes the backport much cleaner. I reverted the last 2 commits you reviewed and started fresh, as the new changes are much smaller. Opened this new PR #152763 if you prefer a new PR with those two commits kept out of the current history entirely.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)


pkg/kv/kvserver/metrics.go line 2196 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

this could be misunderstood as "replicas dropped due to replica size", which is not a thing for the replicate queue. Maybe queue.replicate.queue_full ("Number of times a replica was dropped from the queue due to queue fullness")

Renamed.


pkg/kv/kvserver/queue.go line 791 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

when the cluster setting gets dropped, the queue might be overfull by more than one, so here you may need to drop more items.

Good catch - fixed in a follow up commit.


pkg/kv/kvserver/lease_queue.go line 89 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

this rename causes quite a bit of churn, which I'd maybe avoid in the interest of backporting.

Good call, changed it so that only the replicate queue has a *settings.IntSetting.


pkg/kv/kvserver/kvserverbase/base.go line 127 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I think it's confusing to have a base queue and specific queue max size setting due to questions on how they interact. I'd remove this setting and keep just the one for the replicate queue. I would also make the replicate queue setting slated for deprecation:

  • merge it now, with 10k default
  • on master, bump the default to ~infinity (some very large number)
  • backport only the 10k default
  • in a future version, after we've run with ~infinity most of the time and never heard about issues, remove the setting.

Removed the base queue cluster setting.

@wenyihu6 wenyihu6 requested a review from tbg August 30, 2025 00:33
@wenyihu6 wenyihu6 changed the title kvserver: add BaseQueueMaxSize and ReplicateQueueMaxSize kvserver: add ReplicateQueueMaxSize Aug 31, 2025
Member

@tbg tbg left a comment


:lgtm:

@tbg reviewed 4 of 4 files at r1, 14 of 15 files at r2, 1 of 1 files at r3, 15 of 15 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, 2 of 2 files at r8, 5 of 5 files at r9, 1 of 1 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/replicate_queue.go line 112 at r6 (raw file):

		"when this limit is exceeded, lower priority (not guaranteed to be the lowest) "+
		"replicas are dropped from the queue",
	defaultQueueMaxSize,

Here, too, do you want to make a reminder to bump this to math.MaxInt64 (or something like that) on master soon after this merges? The goal is that we "see that this is fine" and that we feel free to dispense with the configurability again in the future.


pkg/kv/kvserver/store_rebalancer_test.go line 1871 at r7 (raw file):

// verifies that replicas are dropped when the max size is exceeded. It also
// checks that the metric ReplicateQueueDroppedDueToSize is updated correctly.
func TestReplicateQueueMaxSize(t *testing.T) {

Thank you for writing this test, I know it's a drag.

Contributor Author

@wenyihu6 wenyihu6 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola and @tbg)


pkg/kv/kvserver/replicate_queue.go line 112 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Here, too, do you want to make a reminder to bump this to math.MaxInt64 (or something like that) on master soon after this merges? The goal is that we "see that this is fine" and that we feel free to dispense with the configurability again in the future.

Put up #152825.

@wenyihu6
Contributor Author

wenyihu6 commented Sep 1, 2025

TFTR!

bors r=tbg

@cockroachdb cockroachdb deleted a comment from craig bot Sep 1, 2025
@craig
Contributor

craig bot commented Sep 1, 2025
