Skip to content

Conversation

@pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented Nov 15, 2025

This commit makes TestCluster by default assign a unique cluster name in the TestServerArgs. The cluster name is shared by all participating or added nodes, unless overridden in the per-node args.

This helps preventing accidental message exchange between TestClusters in the same environment that use the same TCP port in close proximity from each other.

Addresses #157838

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pav-kv pav-kv force-pushed the generate-cluster-name branch 6 times, most recently from cda91e1 to 8e50d43 Compare November 15, 2025 13:34
@pav-kv pav-kv marked this pull request as ready for review November 15, 2025 13:34
@pav-kv pav-kv requested review from a team as code owners November 15, 2025 13:34
@pav-kv pav-kv requested review from Abhinav1299, aa-joshi, alyshanjahani-crl, arjunmahishi and tbg and removed request for a team November 15, 2025 13:34
@pav-kv pav-kv force-pushed the generate-cluster-name branch from 8e50d43 to 148d2f1 Compare November 15, 2025 13:48
@pav-kv pav-kv force-pushed the generate-cluster-name branch from 148d2f1 to 4595bb4 Compare November 17, 2025 17:55
@pav-kv
Copy link
Collaborator Author

pav-kv commented Nov 17, 2025

Fixed all the pkg/cli tests, but TestPartialZip/partial2 still resists. It decommissions node 2 and expects this node to not pop up in the debug.zip, but for some reason it still does. So far I don't understand how the change in this PR makes it behave differently.

Here, this node is visible as NODE_STATUS_DEAD, and I take it the test waits until it becomes NODE_STATUS_DECOMMISSIONED and doesn't pop up in the debug.zip.

if liveness == livenesspb.NodeLivenessStatus_DECOMMISSIONED {

Which must eventually happen because of the decommission command:

// Now mark the stopped node as decommissioned, and check that zip
// skips over it automatically. We specifically use --wait=none because
// we're decommissioning a node in a 3-node cluster, so there's no node to
// up-replicate the under-replicated ranges to.
{
_, err := c.RunWithCapture(fmt.Sprintf("node decommission --checks=skip --wait=none %d", 2))
if err != nil {
t.Fatal(err)
}
}

@pav-kv
Copy link
Collaborator Author

pav-kv commented Nov 17, 2025

Ah, probably need to set --cluster-name on the decommission command too.
Yes, fixed.

This commit makes TestCluster by default assign unique cluster name in
the TestServerArgs. The cluster name is shared by all participating or
added nodes, unless overridden.

Epic: none
Release note: none
@pav-kv pav-kv force-pushed the generate-cluster-name branch from 4595bb4 to 1757367 Compare November 17, 2025 19:19
@pav-kv pav-kv added the backport-all Flags PRs that need to be backported to all supported release branches label Nov 17, 2025
@pav-kv
Copy link
Collaborator Author

pav-kv commented Nov 17, 2025

TFTRs!

bors r=RaduBerinde,stevendanna

@craig
Copy link
Contributor

craig bot commented Nov 17, 2025

@craig craig bot merged commit e837db7 into cockroachdb:master Nov 17, 2025
29 of 31 checks passed
@blathers-crl
Copy link

blathers-crl bot commented Nov 17, 2025

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 1757367 to blathers/backport-release-24.3-157868: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch release-24.3 failed. See errors above.


💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 1757367 to blathers/backport-release-25.2-157868: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch release-25.2 failed. See errors above.


💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@pav-kv pav-kv deleted the generate-cluster-name branch November 17, 2025 21:32
if clusterArgs.ServerArgs.ClusterName == "" {
// Use a cluster name that is sufficiently unique (within the CI env) but is
// concise and recognizable.
clusterArgs.ServerArgs.ClusterName = fmt.Sprintf("TestCluster-%d", rand.Uint32())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is recent evidence to suggest rand.Uint32 isn't sufficiently unique; collision probability is high after thousands of iterations [1].

[1] #157919 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll fix up.

craig bot pushed a commit that referenced this pull request Nov 18, 2025
157995: testcluster: reduce ClusterName collisions r=stevendanna,srosenberg a=pav-kv

As suggested in #157868 (comment), `rand.Uint32()` has a pretty high chance of collision.
This PR increases the space from `2^32` to `62^10`.

Addresses #157838

Co-authored-by: Pavel Kalinnikov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-all Flags PRs that need to be backported to all supported release branches backport-failed target-release-26.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants