Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated reconciliation of boot settings #2375

Merged

Conversation

markusboehme
Copy link
Member

@markusboehme markusboehme commented Aug 24, 2022

Issue number:

Closes #2155

Description of changes:

This PR implements what has been proposed in #2155, the addition of a new settings.boot.reboot-to-reconcile setting to automatically reboot Bottlerocket if either the kernel or systemd command line were changed during boot. At a high level, prairiedog now compares any newly written bootconfig initrd against the one already present and flags the need for a reboot. Before entering the configured systemd target, a new service unit then triggers the reboot if needed.

To reviewers, I recommend going commit by commit. The commits are fairly detailed and hopefully tell a story if followed in order.

Testing done:

  • No boot settings specified: No automatic reboot; host can be rebooted
  • Boot settings specified with reboot-to-reconcile unset: No automatic reboot; host can be rebooted and comes up with specified settings
  • Boot settings specified with reboot-to-reconcile set to false: No automatic reboot; host can be rebooted and comes up with specified settings
  • Boot settings specified with reboot-to-reconcile set to true: Automatic reboot; host comes up with specified settings
  • Boot settings specified with reboot-to-reconcile set to true and bootstrap containers: Automatic reboot after bootstrap containers have exited; host comes up with specified settings
  • Tested forward and backward migration from 1.9.2 to a faux 1.10.0 with this PR on builds of the aws-k8s-1.23 variant, both with and without reboot-to-reconcile set

Notes:

  • While this implements a new setting, I don't think a migration is needed as the setting is persisted and restored by prairiedog in the bootconfig initrd. Older versions of prairiedog will just ignore the setting. Update: A migration is needed as pointed out by @etungsten: While prairiedog may ignore the unknown setting, it will bite when trying to deserialize the data store. The current version comes with a migration.
  • If you're taking this on a test ride yourself, please see Prairiedog and the kernel may disagree about parsing boot config keys without values #2359: If you configure a boot setting that doesn't require a value, please specify the empty string for now.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@markusboehme markusboehme changed the title Feature/reboot to reconcile Automated reconciliation of boot settings Aug 24, 2022
@etungsten
Copy link
Contributor

etungsten commented Aug 24, 2022

I don't think a migration is needed as the setting is persisted and restored by prairiedog in the bootconfig initrd. Older versions of prairiedog will just ignore the setting.

We would still need a migration as we're introducing a new setting. If this new setting is not properly removed from the settings data-store on downgrade then the host will fail to boot because the settings model will not anticipate this new setting when apiserver tries to de-serialize the existing settings data-store.

@markusboehme
Copy link
Member Author

I don't think a migration is needed as the setting is persisted and restored by prairiedog in the bootconfig initrd. Older versions of prairiedog will just ignore the setting.

We would still need a migration as we're introducing a new setting. If this new setting is not properly removed from the settings data-store on downgrade then the host will fail to boot because the settings model will not anticipate this new setting when apiserver tries to de-serialize the existing settings data-store.

I see, thanks! I was under the impression all serialization/deserialization for BootSettings would be handled by prairiedog, but clearly the settings are persisted in the data store as well.

While anything under and including settings.boot would be scrubbed by the migration that introduced that setting as a prefix in v1.8.0, this doesn't take care of any rollbacks to v1.8.0, v1.9.0, etc. I will add a new AddSettingsMigration for the settings.boot.reboot-to-reconcile key.

Copy link
Contributor

@bcressey bcressey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the need to persist the reboot-to-reconcile flag within the actual bootconfig data.

Let me back up a step. As a rule I tend to find marker files distasteful, and one of my stronger architectural preferences is to favor API queries over ad-hoc files, since those files tend to create their own mini-API surfaces that aren't consistent.

(This isn't entirely binding - we use markers for early-boot-config and cfnsignal, and those can make sense if the alternative would be creating weird marker-file-like settings, like settings.marker-files.early-boot-config-ran = true, which I think would be taking the above principle too far.)

Two patterns that tend to avoid some of the downsides of applying this architectural preference:

  • if a program would be overly complicated by querying the API, or if the API isn't always available to query, it might be better to render a config file for it based on settings and just have it load its own config (see: updog, netdog).
  • if a program is writing a marker file so some other, easier to use utility can avoid querying the API, it might be better to add a sub-command to that program and wrap the execution of the utility instead (prairiedog sort of does this in spirit already by wrapping the mount command for the boot partition)

Applying that second pattern would lead me to expect a command like prairiedog reboot-if-required where we'd look at the bootconfig file we'd previously rendered, compare that with the kernel settings, and query the API to see if we need to reboot after all, and then run systemctl reboot ourselves. (I'm not saying this is right or better, just sharing where I'm coming from.)

Encoding the reconcile boolean in the bootconfig data could make sense if we want to optimize away that final API query - after all, we've already seen all the settings before and can record that fact somewhere - but then I'm not sure why we don't just write the marker file instead, if we find a delta after creating the config, or remove it if there's no longer a delta.

It's really that nested sequence of state that's not clear to me - why we have API settings and bootconfig and the marker file, and if we can do away with one or both of the latter bits of state.

packages/os/reboot-to-reconcile.service Outdated Show resolved Hide resolved
packages/os/reboot-to-reconcile.service Outdated Show resolved Hide resolved
@markusboehme
Copy link
Member Author

I don't quite understand the need to persist the reboot-to-reconcile flag within the actual bootconfig data.

Originally, this is mainly a result of misunderstanding the serialization of BootSettings, i.e. assuming they would only be persisted by prairiedog. Erikson clarified this in his earlier comment. There should be nothing in the way of dropping that setting from the bootconfig data again. When invoked by sundog, prairiedog generate-boot-settings will have to query the API for the current data and merge the kernel and init settings read from the bootconfig data into it. I will try this next--I expect it will get rid of quite a bit of extra code.

Let me back up a step. As a rule I tend to find marker files distasteful, and one of my stronger architectural preferences is to favor API queries over ad-hoc files, since those files tend to create their own mini-API surfaces that aren't consistent. [...]

It's really that nested sequence of state that's not clear to me - why we have API settings and bootconfig and the marker file, and if we can do away with one or both of the latter bits of state.

Thanks for taking the time to share your thoughts on the overall design! I think a part of the confusion stems from trying to see deeper reasons in having reboot-to-reconcile in the bootconfig datat when I thought of it as just a necessity. 😊 Not doing that anymore will likely make things clearer.

I had considered, but not implemented, the idea of a new prairiedog command. Decoupling the decision to reboot and actually rebooting via a marker file was motivated by the idea a bootstrap container might find other reasons to request a reboot before the host enters its normal operational state. There is no such request for now, so perhaps it makes sense to drop this and go with the specific solution that keeps all logic inside prairiedog.

@bcressey
Copy link
Contributor

Decoupling the decision to reboot and actually rebooting via a marker file was motivated by the idea a bootstrap container might find other reasons to request a reboot before the host enters its normal operational state. There is no such request for now, so perhaps it makes sense to drop this and go with the specific solution that keeps all logic inside prairiedog.

I was thinking that an alternative using helper commands might be a oneshot unit with multiple potential restart commands, e.g.:

ExecStart=/usr/bin/prairiedog reboot-if-required
ExecStart=/usr/bin/netdog reboot-if-required
ExecStart=/usr/bin/updog reboot-if-required

Where each program gets to encapsulate its own state checks and decision-making.

But your use case - allowing bootstrap containers to "schedule" a reboot after all other bootstrap containers have run, by writing a marker file in a well-known place - makes sense too, and wouldn't be possible with the sequenced helper commands.

I'm happy to keep the /run/reboot-required approach.

@markusboehme
Copy link
Member Author

@bcressey I see what you mean now. This definitely looks and feels more as if made from one piece, and doesn't preclude bootstrap containers from signalling a need to reboot either: There likewise could be a /usr/bin/bootstrap-containers reboot-if-required, checking e.g. a marker file that is local to a particular bootstrap container. This even would provide more flexibility (e.g. accountability, or honoring the request only for certain containers). I'm thus moving away from the global marker file in the next iteration of the PR.

@bcressey
Copy link
Contributor

There likewise could be a /usr/bin/bootstrap-containers reboot-if-required, checking e.g. a marker file that is local to a particular bootstrap container.

Nice! I really like this idea.

@markusboehme
Copy link
Member Author

The force push changes the approach as discussed above:

  • The reboot now happens after reaching the configured.target, sidestepping the need to explicitly consider bootstrap containers and whether they're essential or not.
  • The initrd boot config now only contains the kernel and init command line arguments. The new reboot-to-reconcile setting is only persisted in the data store.
  • prairiedog grew a new reboot-if-required command to compare the initrd boot config against the boot config the host booted with, issuing a reboot if that is required and settings.boot.reboot-to-reconcile is set to true. Other first-party components can likewise introduce reboot-if-required commands of their own and hook into the existing reboot-if-required.service.
  • The reboot-if-required setting now comes with the migration needed for a rollback and has been tested.

There's merge conflicts to sort out in Cargo.toml and Release.toml needs merging as well. I refrained from rebasing for now to allow for easy diffing against the older revision. Will rebase to unblock the test process once folks have signaled they're in agreement with the current approach.

Copy link
Contributor

@bcressey bcressey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment on lines 278 to 283
if new_boot_settings.reboot_to_reconcile.unwrap_or(false) {
/// Determines whether two optional hash maps changed. Treats empty and missing hash maps as equal.
fn has_changed_materially(
old_params: &Option<HashMap<BootConfigKey, Vec<BootConfigValue>>>,
new_params: &Option<HashMap<BootConfigKey, Vec<BootConfigValue>>>,
) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth splitting this out into a helper function that takes two BootSettings and returns a bool, so that it can be unit-tested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point! I've done so now, and cleaned up a couple existing tests in the process (in a separate commit, of course).

Comment on lines 284 to 289
match (old_params, new_params) {
(None, None) => false,
(None, Some(new)) => !new.is_empty(),
(Some(old), None) => !old.is_empty(),
(Some(old), Some(new)) => old != new,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two middle cases were what triggered my "oh, I hadn't thought of that" reaction and prompted me to look for unit tests. 😀

{
if is_reboot_required(socket_path).await? {
info!("Boot settings changed and require a reboot to take effect. Initiating reboot...");
command("/usr/bin/systemctl", &["reboot"])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we lose the property from the previous revision where the system would never continue starting other units?

The manual for systemctl reboot says it's mostly equivalent to:

systemctl start reboot.target --job-mode=replace-irreversibly --no-block

It might be good to somehow confirm that activate-multi-user.target never gets a chance to run. I'd guess that since it has DefaultDependencies=yes (implied) then it will conflict with shutdown.target and never actually start because the stop job will run first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did, for the reason that Arnaldo noticed below: I accidentally changed the systemd service type to one that wouldn't wait for anything but a successful fork. Once systemctl reboot returns, from looking at the code (and later on logs as well) the system would be working on the shutdown already. I realize this is all a bit subtle, though, so I also added an explicit loop that'll wait for systemd to send prairiedog a SIGTERM and adds a second line of defense together with the Type=oneshot.

Copy link
Contributor

@arnaldo2792 arnaldo2792 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

packages/os/os.spec Show resolved Hide resolved
@@ -0,0 +1,10 @@
[Unit]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still want this unit to be a oneshot and configured with RemainAfterExit, right? According to the docs, since the unit has ExecStart it won't be considered oneshot by default, and the behavior of a simple service doesn't quite fit for reboot-if-required:

the service manager will consider the unit started immediately after the main service process has been forked off.
...
the service manager will immediately proceed starting follow-up units, right after creating the main service process, and before executing the service's binary. Note that this means systemctl start command lines for simple services will report success even if the service's binary cannot be invoked successfully

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good catch, thanks! Previously, there was a brief window of time in which the boot would proceed, which practically speaking probably wouldn't be problematic, but still is super undesirable. I set Type=oneshot explicitly now, and the debug logs confirm the mechanism is working as intended:

reboot-if-required.service: starting held back, waiting for: configured.target
activate-multi-user.service: starting held back, waiting for: reboot-if-required.service
reboot-if-required.service: About to execute /usr/bin/prairiedog reboot-if-required
reboot-if-required.service: Changed dead -> start
[   13.305648] prairiedog[1163]: 19:35:00 [INFO] Boot settings changed and require a reboot to take effect. Initiating reboot...
Unit reboot.target has alias ctrl-alt-del.target.
reboot.target: Trying to enqueue job reboot.target/start/replace-irreversibly
shutdown.target: Installed new job shutdown.target/start as 472
reboot-if-required.service: Job 425 reboot-if-required.service/start finished, result=canceled
reboot-if-required.service: Installed new job reboot-if-required.service/stop as 510
activate-multi-user.service: Job 424 activate-multi-user.service/start finished, result=canceled
activate-multi-user.service: Installed new job activate-multi-user.service/stop as 452
reboot.target: Installed new job reboot.target/start as 426

@markusboehme
Copy link
Member Author

The latest force push contains these changes:

  • reboot-if-required.service now is Type=oneshot as it should have been, waiting for prairiedog to issue the reboot
  • Extracted a function to actually compare the boot settings for reboot-worthy changes, now coming with dedicated unit tests

I'll rebase onto develop and resolve merge conflicts once there's no need to diff against the previous revisions anymore. This will also unblock the automated testing for this PR.

Copy link
Contributor

@bcressey bcressey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Add a new boolean settings.boot.reboot-to-reconcile to govern whether
Bottlerocket should automatically reboot if kernel or systemd command
line parameters are reconfigured during boot. This could happen either
via user-data or via a bootstrap container. In either case, command line
changes for the kernel or systemd will not take effect until the next
reboot.

This change only introduces the new setting to the model and threads it
through in all places touching BootSettings. The flag is not yet armed,
i.e. no reboot action is taken.

Signed-off-by: Markus Boehme <[email protected]>
A previous commit introduced the new "settings.boot.reboot-to-reconcile"
setting to the BootSettings model. Add a migration to allow rolling back
to a version of Bottlerocket without the new setting on a host that has
it set.

Signed-off-by: Markus Boehme <[email protected]>
Refactoring, no change in behavior. The code to read /proc/bootconfig
will be useful later to determine whether the system needs to be
rebooted to reconcile boot settings and the current boot config.

Signed-off-by: Markus Boehme <[email protected]>
The unit tests have needs for instances of the BootSettings model.
Some tests already use repetitive and involved code to convert regular
hash maps into the modeled types needed for creating BootSettings.
Extract that code into its dedicated helper function so that the tests
can remain focused on the actual test data and logic.

Signed-off-by: Markus Boehme <[email protected]>
When boot settings change, "prairiedog generate-boot-config" is invoked
to regenerate the initrd containing the boot config. However, this does
not mean that any changes to e.g. the kernel or systemd command lines
take effect yet. A reboot is needed for this.

Add a new "prairiedog reboot-if-required" command that compares the
current boot settings (and hence the currently persisted boot config in
the initrd) against the one the host booted with to detect any changes
requiring a reboot. If such changes are detected and the
"settings.boot.reboot-to-reconcile" flag is set, reboot the host.

For now, only changes to the kernel and systemd command lines require a
reboot. Bottlerocket-specific boot settings like the newly introduced
"reboot-to-reconcile" may change freely and will not be flagged by
prairiedog.

Signed-off-by: Markus Boehme <[email protected]>
Reboot the system towards the end of the configuration phase during boot
if any of the first-party components deem this necessary. The reboot is
enacted after reaching the configured.target and before reaching the
multi-user.target, so e.g. user data has been applied and bootstrap
containers have already run, but the host is not yet fully operational.

Components can hook into the new reboot-if-required.service to be able
to run at this point. They should implement a "reboot-if-required"
command to trigger a reboot if it is needed.

Currently, prairiedog is the only component hook into this service and
implement a "reboot-if-required" command. It will trigger a reboot if
boot settings have changed, i.e. if the kernel or systemd command line
arguments were changed via the user data mechanism or a bootstrap
container.

Signed-off-by: Markus Boehme <[email protected]>
@markusboehme
Copy link
Member Author

Thanks for the reviews! I force-pushed now to resolve the conflicts in Cargo.toml and Release.toml, unblocking the automated tests.

@markusboehme markusboehme added this to the 1.10.0 milestone Sep 21, 2022
@markusboehme markusboehme merged commit aa0087f into bottlerocket-os:develop Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support automated reconciliation of kernel command line settings
4 participants