daemon: Add a m-c-d-firstboot.service that handles encapsulated MC #904
dd694d3 to 636ff14
|
Looks like a vertical pod autoscaler flake. |
|
/retest |
|
This looks right to me and loving the diff methods as well /approve |
|
Further progress here blocks on #662 (comment) I think since we need to handle the "4.1.0 bootimage" case no matter what. But, I also think this PR is pretty safe to land as is; there's just some prep refactoring, the end commit doesn't do anything since as noted above the |
|
Would it help to split off a separate PR with the prep commits here? |
it would 🙏 |
|
Done in #935 |
This PR does nice work in the MCD to read the encapsulated MC available on nodes. It looks to me like we will need these changes in the machine-config-daemon package shipped with RHCOS. It seems that this PR is going to make it into 4.3+. Since we don't have a way to update bootimages, are we going to need some way to get the latest machine-config-daemon binary for 4.2 cluster upgrades? Something similar to what we have been discussing for 4.1 cluster updates for Day 1 kargs in #798 (comment) and the further comment? |
Yeah, encoding the mcd binary into Ignition is the best thing I can think of. |
|
@cgwalters Can we get this PR rebased to get it tested with latest changes? |
636ff14 to ba7d45d
|
So #935 had some prep cleanup but a bit surprisingly this PR seems to rebase cleanly and build at least without it. It might be there was some semantic stuff necessary there, or maybe I just jumped the gun and churned the code unnecessarily. Unfortunately now I've forgotten the full context. Regardless...I think before we can really use this PR we need to land support for "MCD injection" i.e. injecting the MCD binary via Ignition. |
Split off a function to diff two MCs without reconciling; it is just cleaner. Then add an API to ask whether the diff is empty, as well as a wrapper that logs. Change the once-from path to use `nil` to mean "use a canonical empty MC" so we don't need to handle `nil` elsewhere. This will be used by future work on "early kargs" where we want to reboot only if the user provided non-default kargs.
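As a rough illustration of the shape this commit message describes (not the actual code in the MCD), a diff-without-reconcile API could look like the sketch below. The `machineConfig` struct, its two fields, and every function name here are hypothetical simplifications, and `nosmt` is only an example kernel argument.

```go
package main

import "fmt"

// machineConfig is a hypothetical, stripped-down stand-in for the real
// MachineConfig type; only two fields are kept for illustration.
type machineConfig struct {
	OSImageURL      string
	KernelArguments []string
}

// machineConfigDiff records which sections differ between two configs.
type machineConfigDiff struct {
	osUpdate bool
	kargs    bool
}

// isEmpty reports whether no section differs.
func (d machineConfigDiff) isEmpty() bool {
	return !d.osUpdate && !d.kargs
}

// sameStrings compares by length and content, so nil and empty slices match.
func sameStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// newMachineConfigDiff only computes the difference; it reconciles nothing.
// A nil argument means "use a canonical empty config", so callers never
// have to special-case nil themselves.
func newMachineConfigDiff(oldConfig, newConfig *machineConfig) machineConfigDiff {
	if oldConfig == nil {
		oldConfig = &machineConfig{}
	}
	if newConfig == nil {
		newConfig = &machineConfig{}
	}
	return machineConfigDiff{
		osUpdate: oldConfig.OSImageURL != newConfig.OSImageURL,
		kargs:    !sameStrings(oldConfig.KernelArguments, newConfig.KernelArguments),
	}
}

// loggedDiff is the logging wrapper: it computes the diff and prints it.
func loggedDiff(oldConfig, newConfig *machineConfig) machineConfigDiff {
	d := newMachineConfigDiff(oldConfig, newConfig)
	fmt.Printf("diff: osUpdate=%v kargs=%v\n", d.osUpdate, d.kargs)
	return d
}

func main() {
	userConfig := &machineConfig{KernelArguments: []string{"nosmt"}}
	// Reboot only if the user supplied something beyond the empty default.
	if d := loggedDiff(nil, userConfig); !d.isEmpty() {
		fmt.Println("non-default kargs requested; a reboot would be needed")
	}
}
```

Keeping the diff separate from reconciliation means the "early kargs" path can ask "is anything different from the default?" without triggering any of the update machinery.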
I am getting spurious "changed" in the diff due to the difference between a `nil` array and the empty array. Someone please tell me there's a more elegant way to do this in Go...
The saga of adding "firstboot MachineConfig" in openshift#798 is getting closer. This is a small amount of code that builds on a lot of prep work that landed to add a systemd service + entrypoint in the MCD that reads the `/etc/ignition-machine-config-encapsulated.json` file. You could think of this a lot like a variant of "once-from". However, it doesn't run by default right now because the MCS still serves `/etc/pivot/image-pullspec`, and this gives us a way to "ratchet" the change as we need this new MCD code to land in RHCOS before we can use it.
ba7d45d to 40d8225
# we can land this code and then get it built into the host.
ConditionPathExists=!/etc/pivot/image-pullspec
ConditionPathExists=/etc/ignition-machine-config-encapsulated.json
After=ignition-firstboot-complete.service
While I was testing this locally, I noticed that machine-config-daemon-firstboot.service runs when a newly created cluster reboots after applying Ignition configs and updates, which leads to another reboot. This is because /etc/pivot/image-pullspec gets deleted once the cluster is updated to the latest machine-os-content during first boot. Should we also add something like BindsTo=ignition-firstboot-complete.service here to avoid running it on the next reboot, or is this acceptable behavior?
Hmm...we should be deleting /etc/pivot/image-pullspec during firstboot.
Yes, we delete /etc/pivot/image-pullspec during firstboot, which happens after the OS is upgraded to the latest machine-os-content, but machine-config-daemon-firstboot.service runs early enough that the image-pullspec file still exists at that time. Once the system reboots after updating the OS, /etc/pivot/image-pullspec doesn't exist and /etc/ignition-machine-config-encapsulated.json does, which leads machine-config-daemon-firstboot.service to run. That's probably because After=ignition-firstboot-complete.service alone is not enough to keep machine-config-daemon-firstboot.service from running if ignition-firstboot-complete.service failed.
OK, I see, though I think we should probably be deleting the /etc/ignition-machine-config-encapsulated.json file too.
But this BindsTo= approach is also fine.
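To make the double-reboot scenario from this thread concrete, here is a minimal sketch that models the two `ConditionPathExists=` lines as a pure function over "which files exist at this boot". The `bootState` type and `firstbootWouldStart` name are hypothetical; the sketch models only the path conditions (systemd requires all of them to hold) and ignores `After=` ordering and any `BindsTo=` dependency.

```go
package main

import "fmt"

const (
	pullspecPath     = "/etc/pivot/image-pullspec"
	encapsulatedPath = "/etc/ignition-machine-config-encapsulated.json"
)

// bootState is a hypothetical model of which files exist at a given boot.
type bootState map[string]bool

// firstbootWouldStart mirrors the two ConditionPathExists= lines:
// the pullspec must be absent and the encapsulated MC must be present.
func firstbootWouldStart(files bootState) bool {
	return !files[pullspecPath] && files[encapsulatedPath]
}

func main() {
	// First boot: both files are still on disk, so the unit is skipped.
	boot1 := bootState{pullspecPath: true, encapsulatedPath: true}
	fmt.Println("first boot:", firstbootWouldStart(boot1))

	// After the OS update and reboot, the pullspec is gone but the
	// encapsulated MC remains, so the unit starts and reboots again.
	boot2 := bootState{encapsulatedPath: true}
	fmt.Println("second boot:", firstbootWouldStart(boot2))

	// If the encapsulated file were deleted as well, the unit would stay off.
	boot3 := bootState{}
	fmt.Println("second boot, both deleted:", firstbootWouldStart(boot3))
}
```

The three cases print `false`, `true`, and `false` respectively, which matches the behavior reported above and shows why deleting /etc/ignition-machine-config-encapsulated.json would also close the gap.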
Thanks for rebasing the PR!
Are we talking about this in the context of testing this PR by injecting the machine-config-daemon binary (containing this change) through the Ignition config during cluster install, or is this something else? I tried to test this PR locally by generating manifests and editing the existing 99_openshift-machineconfig_worker.yaml and 99_openshift-machineconfig_master.yaml files to inject machine-config-daemon into /usr/local/bin/ along with the machine-config-daemon-firstboot.service file. Both get added to the nodes successfully, and the service even runs and applies kargs during initial boot if I edit the service file accordingly. Is this the right way to test it? One issue I am seeing with my test approach is that the MCD fails to apply the rendered config with error |
Yep! I think that's a hard prerequisite/requirement for this.
Sounds reasonable to me!
Ah right; in general in the MCO I don't think we can easily support remote data sources in Ignition, because it breaks the "immutability" of the checksum in That said, I think I'd test this out via changing the MCS to serve the binary as that should be the final plan - we only need the MCD binary during firstboot - after that we've already updated to the target |
got it!
Hmm, to start with, we were thinking of making the Day 1 kargs feature available to 4.3 and later clusters, where we can have the latest machine-config-daemon binary available in the installer bootimage (by updating the machine-config-daemon package which ships with RHCOS). Injecting the MCD binary through Ignition served by the MCS was the next step (which I believe will also help us support Day 1 kargs in 4.1/4.2-based clusters).
Right! |
|
/retest e2e-aws-scaleup-rhel7 |
|
/retest |
…boot We first want to make sure that the updated machine-config-daemon package gets into the bootimage and things run as expected. Once verified, we will update the m-c-d-firstboot service to run during firstboot
f839c39 to a465088
kikisdeliveryservice
left a comment
some fixup suggestions to simplify this code.
pkg/daemon/update.go
Outdated
yes perf! simple and clean.
Both empty and nil slices have length zero and contain nothing. Consider them equal when comparing kargs in the old and new MachineConfigs
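For readers unfamiliar with the pitfall behind this commit: `reflect.DeepEqual` distinguishes a nil slice from an empty, non-nil one, which is what produces a spurious "changed" result. The sketch below shows one way to treat them as equal; `kargsEqual` is a hypothetical helper name, not necessarily what the PR uses.

```go
package main

import (
	"fmt"
	"reflect"
)

// kargsEqual treats nil and empty slices as equal and otherwise falls back
// to an element-wise comparison via reflect.DeepEqual.
func kargsEqual(a, b []string) bool {
	if len(a) == 0 && len(b) == 0 {
		return true
	}
	return reflect.DeepEqual(a, b)
}

func main() {
	var unset []string  // nil slice, e.g. a field that was never populated
	empty := []string{} // empty but non-nil slice

	// DeepEqual reports a difference even though both hold no arguments.
	fmt.Println("DeepEqual:", reflect.DeepEqual(unset, empty))
	// The helper considers them the same set of kernel arguments.
	fmt.Println("kargsEqual:", kargsEqual(unset, empty))
}
```

Checking `len` first works because `len` of a nil slice is zero in Go, so no explicit nil test is needed.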
cb4b39d to aa2e045
kikisdeliveryservice
left a comment
Thanks for all of the updates @sinnykumari ! They look good. =D
|
Yay! All tests are passing. Thanks everyone for the review! |
|
I have done another round of cluster runs with the MCD binary and a custom payload with the latest changes; looks good to me. @cgwalters Please take a look at it; I want to make sure we didn't miss anything. |
|
awesome I'll leave to others on the team to lgtm but looks good otherwise! |
|
/lgtm |
|
@cgwalters: you cannot LGTM your own PR. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, ericavonb, runcom |
|
/retest |
|
@cgwalters: The following tests failed, say |
|
/retest |