-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry sending signals to pid_file_name before logging errors #252
base: main
Are you sure you want to change the base?
Conversation
4f81e68
to
630fb75
Compare
Added some test cover for command exec. Will add some for pid signalling next. |
7d742a6
to
b3fda05
Compare
ea321a3
to
8eb8181
Compare
I see some linter complaints here. I'll address them, and probably split the PR. |
Fix compilation on GOOS=windows for the main binary and tests, both of which were broken. Fix a leaked open file handle which caused config_test.go to fail on windows with "Sharing violation" errors when cleaning up the tempdir. Add a 'test-wine' Makefile target. Explain how to run the tests for windows via wine in the readme. Signed-off-by: Craig Ringer <[email protected]>
Add test cover for: * long-running and short-lived cmd and cmd_args invocations * signalling via pid_file_name and renew_signal * windows cmd and cmd_args execution If rebased onto the 'fix-windows' branch for windows test fixes, these pass for windows test runs too. To make these tests possible the following changes are made to the sidecar code: * Sidecar.certReadyChan yields *workloadapi.X509Context not struct{} so the context can be observed. This could also be useful for code extending spiffe-helper. * A new Sidecar.cmdExitChan channel reports whenever a process managed by spiffe-helper exits, and the exit state. This allows tests to wait for process exits and see the exit code, etc. * A new Sidecar.pidFileSignalledChan channel reports whenever signalling the file in pid_file_name is attempted, the pid signalled, and the outcome. This allows signal based tests to see what spiffe-helper tried to signal. Channels with values are used to ensure that there's no race between what might be present in the Sidecar struct state when the test observes it and what was there when the event of interest occurred. It might be handy to have a channel for observing when the managed process is started too, but the tests just busy-wait polling Sidecar.processRunning to detect this for now. I considered combining these channels into some kind of unified observer channel that emits a union struct depending on the event type, but it seemed pointless to do this when we can just select{} on multiple channels of interest and keep each channel simple. Signed-off-by: Craig Ringer <[email protected]>
… into retry-signal-pid-file These are tracked by PRs * spiffe#258 for test suite * spiffe#257 for windows Once they are merged the upcoming commit can be merged in its own PR
92a60c4
to
cb78cab
Compare
Windows fixes split off to simpler individual PR: #257 Added test cover split off to a PR that focuses on adding the test cover, #258 I've rebased this PR onto a merge-commit created from the two listed above and fixed the linter complaints. Github doesn't seem to have any way to say "this PR depends on these other two", and only appears to understand linear histories, so I'll keep this PR marked as a draft to prevent it being inadvertently squash-merged until the other two are reviewed and merged. Then this will be a trivial The only commit unique to this PR is cb78cab so that's the only one to focus on for review. |
There is a race where the process pointed to by pid_file_name could be restarted and read the old certificates but not have its new pid written into pid_file_name yet. In this case, we would try to signal the old process and fail, then give up, so the new process would never get its certs refreshed. On initial startup, unavoidable errors may also be logged when the process to be launched has not yet been started (because its certs don't exist yet) so pid_file_name is missing or invalid. These shouldn't be logged as errors, but we don't really want to suppress all errors for all pid file signalling since that would hide misconfigurations. Handle both issues by adding a retry-backoff loop for signalling pid_file_name. On failure, messages are logged at Info level until the retry limit is exceeded, then an Error is logged. During startup the controlling process is expected to be watching for the creation of the certificates, and to launch the controlled process and create the pid file in a timely manner once the certs appear. So it becomes possible to start up without "normal" errors appearing. For the restart-race case, we will initially fail to signal the stale pid in the pid file, so we will retry until the new pid is written, so the new process that loaded stale certs will be properly signalled to reload. To enable better testing for this, the retry count is exposed in the test channel for process signalling. Fixes spiffe#250, spiffe#251 Signed-off-by: Craig Ringer <[email protected]>
cb78cab
to
f3c7cec
Compare
There is a race where the process pointed to by pid_file_name could be restarted and read the old certificates but not have its new pid written into pid_file_name yet. In this case, we would try to signal the old process and fail, then give up, so the new process would never get its certs refreshed.
On initial startup, unavoidable errors may also be logged when the process to be launched has not yet been started (because its certs don't exist yet) so pid_file_name is missing or invalid. These shouldn't be logged as errors, but we don't really want to suppress all errors for all pid file signalling since that would hide misconfigurations.
Handle both issues by adding a retry-backoff loop for signalling pid_file_name. On failure, messages are logged at Info level until the retry limit is exceeded, then an Error is logged.
During startup the controlling process is expected to be watching for the creation of the certificates, and to launch the controlled process and create the pid file in a timely manner once the certs appear. So it becomes possible to start up without "normal" errors appearing.
For the restart-race case, we will initially fail to signal the stale pid in the pid file, so we will retry until the new pid is written, so the new process that loaded stale certs will be properly signalled to reload.
Fixes: #250, #251
I added an extensive test suite as part of this, covering command execution, signal handling, etc. If you want some of that split out from this PR then this PR rebased on top I can probably do that.
I also fixed Windows builds and ensured the existing and added tests run on Windows - or at least, under wine when cross-compiled
I strongly suggest merging this via fast-forward merge or with a merge commit, not a squash merge. Otherwise the separation of commits will be lost and it'll be a pain to understand in the history.