daemon: selectively reboot based on diffs of applied MCs #2259
Conversation
Today the MCO defaults to rebooting the node in order to successfully apply changes. This is the safest way to apply a change, since we don't have to worry about questions like which changes are safe to apply without a reboot. In certain environments, however, rebooting is expensive, and with complex hardware setups it can sometimes cause problems where the node won't boot back up. This change avoids rebooting nodes in certain cases where the MCO knows it is safe to do so. See openshift/enhancements#159
Add functionality to calculate the set of diffs between the existing and new rendered configs when an update happens, and take action based on the diff. Currently supported actions are:
- reboot: the default behaviour
- reload crio: when registries.conf is changed
- none: when the pull secret or SSH key is changed

Also rename rebootAction to postConfigChangeActionNone, although that name is up for debate.
Force-pushed 326df48 to 81476d4
pkg/daemon/update.go (outdated)
```go
dn.logSystem("Starting update from %s to %s: %+v", oldConfigName, newConfigName, diff)

// TODO: consider how we should honor "force" flag here. Maybe if force, always reboot?
// TODO: consider if we should not cordon if no action needs to be taken
```
We do multiple things when we call updateFiles(), irrespective of which rebootAction we perform, like deleting stale files and writing units from the new Ignition config. There is a chance of errors in these steps. Cordoning and draining ensure that we don't end up with a degraded node that still has running workloads.
Hmm, ok. My consideration was that draining some workloads (e.g. CI) takes quite a while, so skipping it would be a nice value add. And if we were doing an update that does not require a reboot, it was unlikely to break the state of the actual node (at worst it would, e.g., fail a file write, but it would not commit any changes to disk, and the node itself should still be working even as we degrade the pool).
For safety purposes, though, perhaps we should keep this drain method for now and consider improving it in the future?
You are right; I am also tempted to skip the drain, which would work fine on the happy path. This is probably an extra precaution to avoid corner-case bugs that bite us later on.
+1 on revisiting drain optimization later (probably in 4.8?)
deleted accidentally while splitting checkStateOnFirstRun()
Tested all scenarios we are handling in this PR, including node scaleup; working perfectly fine.

/retest

Fixed the above error case and added some initial unit tests; will expand. Manual testing looks good so far.
Force-pushed aa60564 to 8f4a38e
Add unit tests for different changes in the MachineConfig, to verify that the calculated post-config-change action is as expected.
Force-pushed 8f4a38e to e443af7
Added an e2e test.
Force-pushed 8886dfe to aa995ba
pkg/daemon/update.go (outdated)
What change would result in no action?
SSH keys + pull secret. I'll add some comments here.
"No action" is perhaps not the right way to frame it; we still cordon, drain, and write files/units/etc. We just don't reload any unit and don't reboot.
maybe postConfigChangeActionSkipReboot? Though I think none also works.
I think e.g. reloading crio also "skips reboot", so I defaulted to "none" to mean "no extra action needed"
Good point, I'm fine with none 👍
pkg/daemon/update.go (outdated)
One of the massive benefits of systemd over baseline Unix is that by using cgroups, it applies a rigorous structure to the prior total Wild West of Unix processes you could identify by grepping their name or maybe racy PID files.
In particular, this invocation will send SIGHUP to e.g. a process running in a regular user pod if it happens to be named "crio".
What you want instead is systemctl reload crio - systemd is actively tracking the PID that it launched and can send the signal in a race-free way to exactly the correct process.
Ah for some reason I thought crio didn't have it defined. I can confirm that it indeed should be doing the same thing:
```
# systemctl cat crio.service
...
ExecReload=/bin/kill -s HUP $MAINPID
...
```
And running a reload:
```
Nov 30 23:23:17 ip-10-0-131-15 systemd[1]: Reloading Open Container Initiative Daemon.
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.304012464Z" level=info msg="Reloading configuration"
Nov 30 23:23:17 ip-10-0-131-15 systemd[1]: Reloaded Open Container Initiative Daemon.
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.309633738Z" level=info msg="Updating config from file /etc/crio/crio.conf.d/0-default"
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.309845337Z" level=info msg="Updating config from path /etc/crio/crio.conf.d"
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.310093463Z" level=info msg="Applied new registry configuration: &{Registries:[] UnqualifiedSearchRegistries:[registry.access.redhat.com docker.io]}"
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.310131904Z" level=info msg="No seccomp profile specified, using the internal default"
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.310144510Z" level=info msg="Set config seccomp_profile to \"\""
```
Seems like it's doing what we want. I'll defer to @sinnykumari since she is more knowledgeable on whether that is correct.
Ah, I didn't know crio reload already does the same thing; definitely a nicer way to go.
Useful learning and something I will keep in mind :)
/retest

yeah, I don't know why it is failing in CI and works fine when running

/retest
Force-pushed aa995ba to 9223c47
/retest
…esent Also, added event for crio service failure and updated event reasoning from Reboot to SkipReboot
Flake due to VpcLimitExceeded.

Needs a final round of review and approval to get it merged.
I gave this a medium-level review and I think it looks good, but there are a lot of subtleties here, and it touches code paths for which we don't necessarily have CI coverage.
On the flip side, I think this is pretty safe to land because it will only kick in in very specialized circumstances. We're very unlikely to break any default installs or CI.
That said, I do have one request (I wouldn't call this a blocker but a very nice to have): Let's add a new machineconfiguration.openshift.io/bootedConfig annotation that is set the first time we do a "live" update.
If you look at the recent rpm-ostree livefs work which is quite analogous to this, if you type rpm-ostree status you can clearly see that a "live" update was applied, as distinct from the booted commit.
Now currently nothing in rpm-ostree tries to capture the full "live history", and I don't think we need to do that here either. The goal is that it should be immediately obvious looking at a node if it's in a "live applied" state. That will help us debug problems in the future.
To clarify...anyone else should feel free to add a LGTM as is, but hopefully you agree with the above and we can try to add a separate bootedConfig as a followup soon.
If you haven't tried
Then when a livefs is applied, there's a new "livefs commit" (which is the same as staged, but note that staged could change later without live changing, so they need to be tracked separately).

I think @cgwalters' suggestion could be a good followup for debuggability (is that a word?) @runcom any final thoughts?
```go
// move to desired state without additional validation. We will reboot the node in
// this case regardless of what the MachineConfig diff is.
if _, err := os.Stat(constants.MachineConfigDaemonForceFile); err == nil {
	if err := os.Remove(constants.MachineConfigDaemonForceFile); err != nil {
```
So this does change the behaviour of how the force file works slightly, but I guess it's probably for the better, since this way it stays on the system until an update happens. Off the top of my head I don't think this should cause a problem. So +1.
Ok, I think this is good to go! Let's aim to improve documentation and debuggability in follow-up PRs. Thanks everyone for the reviews! /retest

let's do this. /lgtm

/lgtm

/skip

/test e2e-aws

/test e2e-aws-serial
/test e2e-aws-serial

/test e2e-aws
@yuqi-zhang: The following tests failed.
/test e2e-aws

everything passed! but now... tide is gonna... retest 😐
That is indeed, as @kikisdeliveryservice commented, a great thing to have for debuggability. We'll still need a must-gather in the cases where we're investigating our usual bugs, but it'll definitely help direct the investigation the right way. We can definitely follow up! /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cgwalters, kikisdeliveryservice, runcom, yuqi-zhang.
Nice and useful suggestion; we will definitely discuss it and work on adding it.
Adds to/supersedes #2254
This is the overall PR for the reboot epic work, which aims to allow "rebootless updates" when certain changes happen.
Currently supported:
- No action (drain, update files, uncordon)
- Reload crio
- Reboot

Also adds the corresponding unit and e2e tests.