Conversation
|
In my test this works, using the same values.yml file: Then journald shows: and the system immediately does a clean reboot. However note that "SIGRTMIN+5" was sent but it was interpreted by host as SIGRTMIN+6. I believe that is related to the different output of Anyway this was running on a v1.23 k8s cluster with the old PSP admission controller disabled. The nodes are Almalinux 8.8. I'm afraid I could not reproduce the issue; it works fine for me. Looking at the capabilities of the kured process it also appears fine: Also when I touched the sentinel file on the host, kured worked as expected and the node rebooted: I confirmed with |
|
Thanks for your tests and the positive feedback! Yes, the specific signal value may indeed be a problem. We would need to add a good guidance for the user in our docs, to determine which signal value is correct for their host OS. Most of the users won't know as this is expert knowlege 😆 |
|
I made it work on my cluster with |
38dfbb2 to
ae027af
Compare
|
Ah of course, I'm not familiar with AppArmor at all but that is the SELinux equivalent on some OSes. |
|
This PR was automatically considered stale due to lack of activity. Please refresh it and/or join our slack channels to highlight it, before it automatically closes (in 7 days). |
|
Is anyone available to review this? |
|
@rptaylor thanks for your patience, added some review comments for @ckotzbauer, I think this generally looks like a great evolution, thanks this work! |
|
Sorry for the very long delay, this is ready for another review @jackfrancis @rptaylor |
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2e2a527 to
544a287
Compare
|
@ckotzbauer very exciting! I can test it on a dev cluster, is there a new image available in ghcr.io/ckotzbauer/ ? |
|
Thanks for the hint: |
|
Okay refreshing my memory on what this is even about 4 months later .... ;p On a dev cluster I temporarily disabled PSP admission control for testing purposes. Then on a node I touched /var/run/reboot-required. pod logs: Eventually .... node rebooted: Does that cover everything needed? Is it good to merge now? |
|
Actually wait ... the node powered off instead of rebooting, that's not good... |
|
I tried again and the node rebooted normally. Sometimes our cloud infrastructure is a bit flakey... Anyway as long as systemd on the node reports "Received SIGRTMIN+5" it should be good. |
| // MethodSignal is used as "--reboot-method" value when rebooting with a SIGRTMIN+5 signal. | ||
| MethodSignal = "signal" | ||
|
|
||
| sigTrminPlus5 = 34 + 5 |
There was a problem hiding this comment.
Looks like the typo is still there? But otherwise this should be finally good to go!
There was a problem hiding this comment.
Sorry, I didn't catch that. Let's fix the typo. If we're going to use camel case I'd vote for:
sigRTMin
Maybe:
const (
...
sigRTMin = 34
defaultSignal = sigRTMin + 5
...
)And we should probably change the MethodSignal comment so that it doesn't imply that SIGRTMIN+5 is static (it's just a default, but the user can use whatever number within the min-max range that their OS allows). Maybe:
// MethodSignal is used as "--reboot-method" value when rebooting according to a signal in the allowed range between SIGRTMIN and SIGRTMAX as defined on the OS platform running kured
|
Thanks for testing @rptaylor, this should cover all needed changes, so this should be ready for merge. |
jackfrancis
left a comment
There was a problem hiding this comment.
lgtm
great work, thank you!
Based on #813
close #416
close #722
This PR adds a
--reboot-methodflag with "command" (default) and "signal" option. The "command" option uses the--reboot-commandon the host withnsenteras before. The new "signal" mode uses aSIGRTMIN+5signal by default against PID 1 (systemd) to reboot the node. The signal can be changed via--reboot-signalflag.With this, the kured pod runs without privileged mode.
This PR is published as docker-image (amd64 and arm64):
ghcr.io/ckotzbauer/kured:1.14.0-alpha.2Usage (based on the latest helm-chart) - Helm values.yaml: