@shubhamdhama shubhamdhama commented Nov 12, 2025

TestDemoLocality was failing with "no certificates found; does certs dir exist?" errors, which caused connection failures when nodes attempted to establish RPC connections.

Root cause: The demo cluster stores both TLS certificates and Unix socket files (e.g., .s.PGSQL.26267) in the same directory. When loading certificates, readDir() lists all directory entries and then calls entry.Info() to stat each one. Between these two operations, transient socket lock files (e.g., .s.PGSQL.26267.lock.887590299) can be deleted, causing lstat() to fail with ENOENT. That single error aborted certificate loading entirely, even though the actual certificate files existed and were valid.

Fix: readDir() now skips files that disappear between the directory listing and the stat call (a standard pattern for handling concurrent file-system modifications).

Fixes #155255
Epic: none
Release note: None

@shubhamdhama shubhamdhama requested review from a team as code owners November 12, 2025 19:09

blathers-crl bot commented Nov 12, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

shubhamdhama commented Nov 12, 2025

For more context: this is a regression from #155196.

Maybe we should move this pattern into a utility function and use it in the other places where we list files (those in the above PR).

@rafiss rafiss left a comment

great find! thanks for the fix

@nicktrav

Thanks for chasing down the bug I left you, @shubhamdhama!

@cthumuluru-crdb cthumuluru-crdb self-requested a review November 13, 2025 03:54

@cthumuluru-crdb cthumuluru-crdb left a comment

Great find!

@shubhamdhama

TFTRs!

bors r=rafiss,cthumuluru-crdb

craig bot commented Nov 13, 2025

@craig craig bot merged commit f6b733e into cockroachdb:master Nov 13, 2025
32 of 34 checks passed
shubhamdhama added a commit to shubhamdhama/cockroach that referenced this pull request Nov 13, 2025
Multiple packages duplicated the pattern of calling os.ReadDir followed
by entry.Info() on each entry. In cockroachdb#157232, we fixed this logic for
security, where files may disappear between the listing and stat
operations. The same fix applies in other places, so this change moves the
pattern into a shared utility.

Fixes: none
Epic: none
Release note: none

Linked issue: cli: TestDemoLocality failed [failed connection attempt; no certificates found]