-
Notifications
You must be signed in to change notification settings - Fork 4k
security: handle transient files in certificate directory loading #157232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The 'TestDemoLocality' was failing with "no certificates found; does certs dir exist?" errors. This resulted in connection failures when nodes attempted to establish RPC connections. Root cause: The demo cluster stores both TLS certificates and Unix socket files (e.g., .s.PGSQL.26267) in the same directory. When loading certificates, readDir() lists all directory entries and then calls entry.Info() to stat each file. Between these operations, transient socket lock files (e.g., .s.PGSQL.26267.lock.887590299) can be deleted, causing lstat() to fail with ENOENT. This caused the entire certificate loading to fail, even though the actual certificate files existed and were valid. Fix: this change modified the readDir() to skip files that disappear between directory listing and stat operations (a standard pattern for handling concurrent file-system modifications). Fixes cockroachdb#155255 Epic: none Release note: None
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
For more context: it's a regression from #155196 Maybe we should put this pattern into a utility function and use it for other places where we list files (in above PR). |
rafiss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great find! thanks for the fix
|
Thanks for chasing down the bug I left you, @shubhamdhama! |
cthumuluru-crdb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find!
|
TFTRs! bors r=rafiss,cthumuluru-crdb |
Multiple packages had duplicated the pattern of calling os.ReadDir followed by entry.Info() on each entry. In cockroachdb#157232, we fixed this logic for security where files may disappear between listing and stat operations. This fix can be extended to other places. For this reason we are moving this pattern to a shared utility. Fixes: none Epic: none Release note: none
Multiple packages had duplicated the pattern of calling os.ReadDir followed by entry.Info() on each entry. In cockroachdb#157232, we fixed this logic for security where files may disappear between listing and stat operations. This fix can be extended to other places. For this reason we are moving this pattern to a shared utility. Fixes: none Epic: none Release note: none
The 'TestDemoLocality' was failing with "no certificates found; does certs dir exist?" errors. This resulted in connection failures when nodes attempted to establish RPC connections.
Root cause: The demo cluster stores both TLS certificates and Unix socket files (e.g.,
.s.PGSQL.26267) in the same directory. When loading certificates,readDir()lists all directory entries and then callsentry.Info()to stat each file. Between these operations, transient socket lock files (e.g.,.s.PGSQL.26267.lock.887590299) can be deleted, causinglstat()to fail with ENOENT. This caused the entire certificate loading to fail, even though the actual certificate files existed and were valid.Fix: this change modified the
readDir()to skip files that disappear between directory listing and stat operations (a standard pattern for handling concurrent file-system modifications).Fixes #155255
Epic: none
Release note: None