
Conversation

@yuqi-zhang
Contributor

@yuqi-zhang yuqi-zhang commented Nov 26, 2020

Adds to/supersedes #2254

This is the overall PR for the reboot epic work, which aims to allow "rebootless updates" when certain changes happen.

Currently supported:
No action (drain, update files, uncordon)

  • ssh keys
  • pull secret

Reload crio

  • registries.conf changes

Reboot

  • all other scenarios

Also add corresponding unit and e2e tests

Today the MCO defaults to rebooting the node in order to successfully
apply changes. This is the safest way to apply a change on the
node, since we don't have to reason about which changes are safe
to apply without a reboot.

In certain environments rebooting is expensive, and with complex
hardware setups a reboot can occasionally fail, leaving the node unable to boot.

This change avoids rebooting nodes in those cases where the
MCO knows it is safe to skip the reboot.

See - openshift/enhancements#159
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2020
Add functionality to calculate the set of diffs between the existing
and new rendered config when an update happens, and take action
based on the diff. Currently supported actions are:

reboot
 - default behaviour
reload crio
 - when registries.conf is changed
none
 - when pull secret or ssh key is changed

Also rename rebootAction to postConfigChangeActionNone, although
that name is up for debate.
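The diff→action mapping described in the commit message above can be sketched roughly as follows. This is a simplified, hypothetical illustration: the type name configDiff, the action constants, and the function name are ours, not the MCO's actual identifiers.

```go
package main

import "fmt"

// Illustrative action names modeled on the PR's postConfigChangeAction* naming.
const (
	actionNone       = "postConfigChangeActionNone"
	actionReloadCrio = "postConfigChangeActionReloadCrio"
	actionReboot     = "postConfigChangeActionReboot"
)

// configDiff is a hypothetical summary of the differences between the
// existing and new rendered configs.
type configDiff struct {
	sshKeysChanged    bool
	pullSecretChanged bool
	changedFiles      []string
	otherChanges      bool // kernel args, OS image, units, etc.
}

// calculateAction picks the least disruptive action that is still safe:
// any unrecognized change falls through to the default, a reboot.
func calculateAction(d configDiff) string {
	if d.otherChanges {
		return actionReboot
	}
	for _, f := range d.changedFiles {
		switch f {
		case "/etc/containers/registries.conf":
			// picked up by reloading crio, no reboot needed
		default:
			return actionReboot
		}
	}
	if len(d.changedFiles) > 0 {
		return actionReloadCrio
	}
	// Only ssh keys and/or pull secret changed: files are written,
	// but no unit reload and no reboot are required.
	return actionNone
}

func main() {
	fmt.Println(calculateAction(configDiff{sshKeysChanged: true}))
	fmt.Println(calculateAction(configDiff{changedFiles: []string{"/etc/containers/registries.conf"}}))
	fmt.Println(calculateAction(configDiff{otherChanges: true}))
}
```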
dn.logSystem("Starting update from %s to %s: %+v", oldConfigName, newConfigName, diff)

// TODO: consider how we should honor "force" flag here. Maybe if force, always reboot?
// TODO: consider if we should not cordon if no action needs to be taken
Contributor

We do multiple things when we call updateFiles() irrespective of which rebootAction we perform, like deleting stale files, writing units from newIgnConfig, etc. Any of these steps can fail. Cordoning and draining ensure that we don't end up with a degraded node that still has running workloads.

Contributor Author

Hmm, ok. My consideration was that a drain on some workloads (e.g. CI) takes quite a while, so skipping it would be a nice value add. And if we were doing an update that does not require a reboot, it was unlikely to break the state of the actual node (at worst it would, e.g., fail a file write, but it would not commit any changes to disk, and the node itself should still be working even as we're degrading the pool).
For safety purposes, though, perhaps we should keep the drain for now and consider improving this in the future?

Contributor

You are right; I am also tempted to skip the drain, which would work fine on the happy path. This is probably an extra precaution to avoid corner-case bugs that bite us later on.
+1 on revisiting the drain optimization later on (probably in 4.8?)

deleted accidentally while splitting checkStateOnFirstRun()
@sinnykumari
Contributor

Tested all scenarios we are handling in this PR, including node scale-up; everything works fine.

@sinnykumari sinnykumari requested a review from runcom November 26, 2020 17:05
@sinnykumari
Contributor

/retest

@yuqi-zhang
Contributor Author

Fixed the above error case and added some initial unit tests. Will expand.

Manual testing looks good so far.

@yuqi-zhang yuqi-zhang force-pushed the selective-reboot branch 2 times, most recently from aa60564 to 8f4a38e, on November 27, 2020 21:17
Add unit tests for different changes in the MachineConfig, to
test if the calculated post config change action is expected.
@sinnykumari
Contributor

Added e2e test

@sinnykumari sinnykumari force-pushed the selective-reboot branch 2 times, most recently from 8886dfe to aa995ba, on November 30, 2020 19:44
Member

What change would result in no action?

Contributor Author

SSH keys + pull secret. I'll add some comments here.

"No action" is perhaps not the right way to frame it: we still cordon, drain, and write files/units/etc. We just don't reload any unit and don't reboot.
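To illustrate the ordering discussed here, a hypothetical sketch of the update flow; the function name and step strings are ours, and the real MCO code is considerably more involved (e.g. after a reboot, uncordon only happens once the node rejoins):

```go
package main

import "fmt"

// stepsFor lists the steps taken for a given post-config-change action.
// Even the "none" action still cordons, drains, and writes files/units;
// only the final step varies.
func stepsFor(action string) []string {
	steps := []string{"cordon", "drain", "update files/units"}
	switch action {
	case "reloadCrio":
		steps = append(steps, "reload crio")
	case "reboot":
		// uncordon happens after the node comes back up, so it is
		// not part of this pre-reboot sequence
		return append(steps, "reboot")
	}
	return append(steps, "uncordon")
}

func main() {
	fmt.Println(stepsFor("none"))
	fmt.Println(stepsFor("reloadCrio"))
}
```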

Contributor

maybe postConfigChangeActionSkipReboot? Though I think none also works.

Contributor Author

I think e.g. reloading crio also "skips reboot", so I defaulted to "none" to mean "no extra action needed"

Contributor

good point, I'm fine with none 👍

Member

One of the massive benefits of systemd over baseline Unix is that by using cgroups, it applies a rigorous structure to the prior total Wild West of Unix processes you could identify by grepping their name or maybe racy PID files.

In particular, this invocation will send SIGHUP to e.g. a process running in a regular user pod if it happens to be named "crio".

What you want instead is systemctl reload crio - systemd is actively tracking the PID that it launched and can send the signal in a race-free way to exactly the correct process.
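From Go, the race-free reload described here would be invoked via systemctl rather than by signaling a PID found by name. A minimal sketch, assuming the helper name reloadUnit (the MCO's actual wrapper differs):

```go
package main

import (
	"fmt"
	"os/exec"
)

// reloadUnit builds a command asking systemd to reload a unit. systemd
// tracks the PID it launched, so the reload signal reaches exactly the
// right process, never a lookalike in a user pod.
func reloadUnit(unit string) *exec.Cmd {
	return exec.Command("systemctl", "reload", unit)
}

func main() {
	cmd := reloadUnit("crio.service")
	fmt.Println(cmd.Args) // the argv that would be executed
	// To actually reload (requires root and a running systemd):
	//   if err := cmd.Run(); err != nil { /* handle failure, emit event */ }
}
```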

Contributor Author

Ah, for some reason I thought crio didn't have it defined. I can confirm that it indeed does the same thing:

# systemctl cat crio.service
...
ExecReload=/bin/kill -s HUP $MAINPID
...

And running a reload:

Nov 30 23:23:17 ip-10-0-131-15 systemd[1]: Reloading Open Container Initiative Daemon.
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.304012464Z" level=info msg="Reloading configuration"
Nov 30 23:23:17 ip-10-0-131-15 systemd[1]: Reloaded Open Container Initiative Daemon.
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.309633738Z" level=info msg="Updating config from file /etc/crio/crio.conf.d/0-default"
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.309845337Z" level=info msg="Updating config from path /etc/crio/crio.conf.d"
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.310093463Z" level=info msg="Applied new registry configuration: &{Registries:[] UnqualifiedSearchRegistries:[registry.access.redhat.com docker.io]}"
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.310131904Z" level=info msg="No seccomp profile specified, using the internal default"
Nov 30 23:23:17 ip-10-0-131-15 crio[1507]: time="2020-11-30 23:23:17.310144510Z" level=info msg="Set config seccomp_profile to \"\""

Seems like it's doing what we want. I'll defer to @sinnykumari since she is more knowledgeable on whether that is correct.

Contributor

Ah, I didn't know crio reload already does the same thing; definitely a nicer way to go.
Useful learning and something I will keep in mind :)

@yuqi-zhang
Contributor Author

/retest

uptimeOld, err := strconv.ParseFloat(oldTime, 64) seems to have failed in the run but works locally, trying again just in case

@sinnykumari
Contributor

/retest

uptimeOld, err := strconv.ParseFloat(oldTime, 64) seems to have failed in the run but works locally, trying again just in case

yeah, I don't know why it is failing in CI and works fine when running make test-e2e on a local cluster. This is making debugging difficult. From one of the CI runs where I added a debug message, it printed:

 mcd_test.go:193: DEBUG: Uptime file content: I1130 18:29:05.151521    8238 request.go:645] Throttling request took 1.019506334s, request: GET:https://api.ci-op-fpxq6893-1354f.origin-ci-int-gce.dev.openshift.com:6443/apis/storage.k8s.io/v1beta1?timeout=32s
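The debug output above suggests client-side log lines leaked into the captured uptime content, which would make strconv.ParseFloat fail. A defensive sketch of parsing /proc/uptime-style content; the function name parseUptime is ours, not the test's actual helper:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUptime extracts the first whitespace-separated field of
// /proc/uptime-style content ("12345.67 45678.90") as seconds of uptime.
// If unexpected text (e.g. a klog throttling message) pollutes the
// captured output, ParseFloat returns an error instead of a bogus value.
func parseUptime(content string) (float64, error) {
	fields := strings.Fields(content)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty uptime content")
	}
	return strconv.ParseFloat(fields[0], 64)
}

func main() {
	v, err := parseUptime("12345.67 45678.90")
	fmt.Println(v, err)
}
```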

@sinnykumari
Contributor

/retest

@sinnykumari
Contributor

/retest

…esent

Also, added an event for crio service failure and updated the event
reason from Reboot to SkipReboot
@sinnykumari
Contributor

flake due to VpcLimitExceeded
/retest

@sinnykumari
Contributor

Need a final round of review and approval to get it merged.
/assign @cgwalters @runcom

Member

@cgwalters cgwalters left a comment

I gave this a medium-level review and I think it looks good, but there are a lot of subtleties here and it touches code paths for which we don't necessarily have CI coverage.

On the flip side, I think this is pretty safe to land because it will only kick in under very specialized circumstances. We're very unlikely to break any default installs or CI.

That said, I do have one request (I wouldn't call this a blocker but a very nice to have): Let's add a new machineconfiguration.openshift.io/bootedConfig annotation that is set the first time we do a "live" update.

If you look at the recent rpm-ostree livefs work which is quite analogous to this, if you type rpm-ostree status you can clearly see that a "live" update was applied, as distinct from the booted commit.

Now currently nothing in rpm-ostree tries to capture the full "live history", and I don't think we need to do that here either. The goal is that it should be immediately obvious looking at a node if it's in a "live applied" state. That will help us debug problems in the future.

To clarify...anyone else should feel free to add a LGTM as is, but hopefully you agree with the above and we can try to add a separate bootedConfig as a followup soon.

@cgwalters
Member

cgwalters commented Dec 2, 2020

If you haven't tried rpm-ostree ex livefs, here's a demo in current Fedora CoreOS. There's a strong analogy between:

  • "booted deployment" == "currentConfig"
  • "staged deployment" == "desiredConfig"

Then when a livefs is applied, there's a new "livefs commit" (which is the same as staged, but note that staged could change without live changing later, so they need to be separately tracked).

walters@toolbox /v/s/w/b/fcos> cosa run
...
[root@cosa-devsh ~]# rpm-ostree install strace
...
Will download: 1 package (1.2 MB)
...
Added:
  strace-5.9-1.fc33.x86_64
Run "systemctl reboot" to start a reboot
[root@cosa-devsh ~]# rpm-ostree status
State: idle
Deployments:
  ostree://fedora:fedora/x86_64/coreos/testing-devel
                   Version: 33.20201119.dev.0 (2020-11-19T22:04:26Z)
                BaseCommit: 6d5b62bd02745cffbc4f3b470196eed8207fffc674c6602c428f632c226e5f2e
              GPGSignature: (unsigned)
                      Diff: 1 added
           LayeredPackages: strace

* ostree://fedora:fedora/x86_64/coreos/testing-devel
                   Version: 33.20201119.dev.0 (2020-11-19T22:04:26Z)
                    Commit: 6d5b62bd02745cffbc4f3b470196eed8207fffc674c6602c428f632c226e5f2e
              GPGSignature: (unsigned)
[root@cosa-devsh ~]# rpm-ostree ex livefs
[root@cosa-devsh ~]# rpm-ostree status
State: idle
Deployments:
  ostree://fedora:fedora/x86_64/coreos/testing-devel
                   Version: 33.20201119.dev.0 (2020-11-19T22:04:26Z)
                BaseCommit: 6d5b62bd02745cffbc4f3b470196eed8207fffc674c6602c428f632c226e5f2e
                    Commit: 0c398101141bfbf6c3661dd0d237622b865e2b85966007960e05a71cc7fcf1df
              GPGSignature: (unsigned)
                      Diff: 1 added
           LayeredPackages: strace

* ostree://fedora:fedora/x86_64/coreos/testing-devel
                   Version: 33.20201119.dev.0 (2020-11-19T22:04:26Z)
              BootedCommit: 6d5b62bd02745cffbc4f3b470196eed8207fffc674c6602c428f632c226e5f2e
                LiveCommit: 0c398101141bfbf6c3661dd0d237622b865e2b85966007960e05a71cc7fcf1df
                  LiveDiff: 1 added
              GPGSignature: (unsigned)
                  Unlocked: transient
[root@cosa-devsh ~]# 

@kikisdeliveryservice
Contributor

I think @cgwalters suggestion could be a good followup for debugability (is that a word?)

@runcom any final thoughts?

// move to desired state without additional validation. We will reboot the node in
// this case regardless of what the MachineConfig diff is.
if _, err := os.Stat(constants.MachineConfigDaemonForceFile); err == nil {
	if err := os.Remove(constants.MachineConfigDaemonForceFile); err != nil {
Contributor Author

So this does slightly change how the forcefile works, but I guess it's probably for the better, since this way it stays on the system until an update happens. Off the top of my head I don't think this should cause a problem, so +1.

@yuqi-zhang
Contributor Author

Ok, I think this is good to go! Let's aim to improve documentation and debuggability in follow-up PRs. Thanks everyone for the reviews!

/retest

@kikisdeliveryservice
Contributor

let's do this.

/lgtm

@cgwalters
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2020
@kikisdeliveryservice
Contributor

/skip

@kikisdeliveryservice
Contributor

/test e2e-aws

@kikisdeliveryservice
Contributor

/test e2e-aws-serial

1 similar comment
@kikisdeliveryservice
Contributor

/test e2e-aws-serial

@kikisdeliveryservice
Contributor

/test e2e-aws

@openshift-merge-robot
Contributor

openshift-merge-robot commented Dec 3, 2020

@yuqi-zhang: The following tests failed, say /retest to rerun all failed tests:

  • ci/prow/okd-e2e-aws (commit e25a613): /test okd-e2e-aws
  • ci/prow/e2e-aws-workers-rhel7 (commit e25a613): /test e2e-aws-workers-rhel7

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kikisdeliveryservice
Contributor

/test e2e-aws

@kikisdeliveryservice
Contributor

everything passed! but now... tide is gonna... retest 😐

@runcom
Member

runcom commented Dec 3, 2020

To clarify...anyone else should feel free to add a LGTM as is, but hopefully you agree with the above and we can try to add a separate bootedConfig as a followup soon.

That is indeed, as @kikisdeliveryservice commented, a great thing to have for debuggability - we'll still need must-gather in the cases where we're investigating our usual bugs, but it'll definitely help direct the investigation the right way. We can definitely follow up!

/lgtm

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, kikisdeliveryservice, runcom, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,kikisdeliveryservice,runcom,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 42f98e6 into openshift:master Dec 3, 2020
@sinnykumari
Contributor

hopefully you agree with the above and we can try to add a separate bootedConfig as a followup soon.

Nice and useful suggestion; we will definitely discuss and work on adding it.


Labels

approved, lgtm, team-mco
