packages: change systemd boot sequence #1423

Merged 1 commit on Apr 5, 2021

@arnaldo2792 arnaldo2792 commented Mar 26, 2021

Issue number:
#1392

Description of changes:

packages: change systemd boot sequence

The current systemd boot sequence is error-prone in isolated scenarios, such as sending invalid configuration through user data. These scenarios can leave the system in an inconsistent state at the end of the boot process, which affects the mechanism that determines whether a boot was successful after applying new configurations or updates.

With the current boot sequence, it is difficult to implement features that require a strict service initialization order (a.k.a. run levels).

To fix the problems presented above, this commit defines a new boot sequence with three main systemd targets: preconfigured, configured and multi-user. The "host-containers" crate was modified to enable the host-container service units only when multi-user is the current default target.

The "mark-successful-boot" service unit was moved from the "multi-user" target to the "preconfigured" target, and it only depends on the "migrator" oneshot service unit, since that is the only service required (for now) to determine whether a boot was successful. The "mark-successful-boot" service unit used to send metrics after it was executed. However, doing so indirectly declared a strong dependency on "settings-applier", since the unit was set up to use the "proxy.env" file. The "metricdog-successful-boot" oneshot service unit was created to remove that dependency from "mark-successful-boot".

Preconfigured

This target starts the boot process (i.e., it is the default target). It is based on the "multi-user" target provided by systemd so that other basic systemd units are started during this stage. A failure in any required "oneshot" service unit causes the target to fail, stopping the boot process. The boot is marked as successful during the execution of this target, and only if the migrator "oneshot" service exits successfully. Services initialized during this phase include:

  • acpid
  • chrony
  • dbus-broker
  • audit-rules
  • migrator (required)
  • metricdog-successful-boot
  • storewolf (required)
  • early-boot-config (required)
  • sundog (required)
  • settings-applier (required)

Once the target is reached (completed), the "activate-configured.service" unit will set the configured target as the default target and start it.

Configured

This target should be used to set up additional configuration on the host before services like kubernetes/ecs start. Services initialized during this phase include:

  • chronyd
  • host-containerd

Once the target is reached (completed), the "activate-multi-user.service" unit will set the multi-user target as the default target and start it.
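
For illustration, each hand-off between targets described above can be thought of as two systemctl calls: one to set the new default target and one to switch to it. The following is a minimal sketch under that assumption, not the contents of the actual activation units. The function name activate_target and the SYSTEMCTL override are hypothetical and exist only so the steps can be dry-run.

```shell
#!/bin/sh
# Sketch only: activate_target and the SYSTEMCTL override are hypothetical
# names used for illustration; they do not come from this PR.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

activate_target() {
    target="$1"
    # Record the target as the default target.
    "$SYSTEMCTL" set-default "$target" || return 1
    # Switch the running system over to that target.
    "$SYSTEMCTL" isolate "$target"
}
```

Setting SYSTEMCTL=echo before calling activate_target configured.target prints the two commands instead of running them, which is a safe way to read through the sequence on a machine that is not a Bottlerocket host.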

Multi-user

This is the final target enabled in the boot sequence. Services initialized during this phase include:

  • host-containers@*
  • docker
  • ecs
  • kubernetes

Testing done:

  • systemctl status didn't show any failures on successful boots
  • Ran an nginx pod/task/container
  • Admin/Control containers were enabled once the multi-user target was reached
  • Custom host container sent as user data was enabled once the multi-user target was reached
  • Custom host container created after boot was enabled after the new settings were applied
apiclient set --json \
    '{"host-containers": { "test": { "source": "docker.io/arnaldo2792/blocker:latest", "superpowered": false, "enabled": true}}}'
ps ax -o pid,args | grep host-ctr
479 /usr/bin/host-ctr run --container-id=admin --source=328549459982.dkr.ecr.us-west-2.amazonaws.com/bottlerocket-admin:v0.6.0 --superpowered=true
480 /usr/bin/host-ctr run --container-id=control --source=328549459982.dkr.ecr.us-west-2.amazonaws.com/bottlerocket-control:v0.4.2 --superpowered=false
481 /usr/bin/host-ctr run --container-id=custom-at-boot --source=docker.io/arnaldo2792/blocker:latest --superpowered=false
16546 /usr/bin/host-ctr run --container-id=test --source=docker.io/arnaldo2792/blocker:latest --superpowered=false
  • Failed boot with bad configurations
[settings.host-containers.admin]
enabled = true

[settings.bad]
bad = "data"
[   11.365797] early-boot-config[366]: Error PATCHing '/settings?tx=bottlerocket-launch': Status 400 when PATCHing /settings?tx=bottlerocket-launch: Json deserialize error: unknown field `bad`, expected one of `source`, `enabled`, `superpowered`, `user-data` at line 1 column 85
[FAILED] Failed to start Bottlerocket userdata configuration system.
See 'systemctl status early-boot-config.service' for details.
[DEPEND] Dependency failed for User-specified setting generators.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Bottlerocket dynamic pre-configuration.
[DEPEND] Dependency failed for Isolates configured.target.
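
The host-container checks listed under "Testing done" rely on reading the ps output by eye. A small filter along the following lines could list just the container IDs. This is a sketch: host_ctr_ids is a hypothetical helper name, and it assumes the "PID /usr/bin/host-ctr run --container-id=..." line shape shown in the output above.

```shell
#!/bin/sh
# Sketch only: host_ctr_ids is a hypothetical helper, not part of this PR.
# Reads `ps ax -o pid,args` style lines on stdin and prints the value of
# --container-id for every /usr/bin/host-ctr process.
host_ctr_ids() {
    # Anchoring on "<pid> /usr/bin/host-ctr run" keeps the filter from
    # matching its own sed command line in the ps output.
    sed -n 's|^ *[0-9][0-9]* /usr/bin/host-ctr run --container-id=\([^ ]*\).*|\1|p'
}
```

Usage: ps ax -o pid,args | host_ctr_ids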

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@bcressey
Contributor

Re: preconfigured section - apiserver is not a one-shot.

One question is whether systemctl restart apiserver or a stop & start causes the system to fail catastrophically because all the one-shots in "preconfigured" require it. If so, we might want to relax those dependencies to "Wants" instead.
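
One way to see which of these dependencies are strong before trying the restart is to ask systemd directly. A rough sketch follows; show_deps and the SYSTEMCTL override are made-up names, and the units passed to it are whichever one-shots are of interest.

```shell
#!/bin/sh
# Sketch only: show_deps and the SYSTEMCTL override are hypothetical names.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Print the Requires= and Wants= properties of each unit given as an argument.
show_deps() {
    for unit in "$@"; do
        "$SYSTEMCTL" show --property=Requires --property=Wants "$unit"
    done
}
```

Usage: show_deps early-boot-config.service sundog.service settings-applier.service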

Contributor

@bcressey bcressey left a comment

Nice! Just a handful of wording and consistency fixes.

@arnaldo2792
Contributor Author

In the last force push, I addressed the comments left by @bcressey:

  • Fixed the typo "stoping" in multiple units
  • Fixed os.spec to include "send-boot-success.service"
  • Fixed "send-boot-success.service" dependencies
  • Fixed "sundog.service" dependencies
  • Fixed "preconfigured.target" and "configured.target" descriptions

@arnaldo2792
Contributor Author

arnaldo2792 commented Mar 30, 2021

In the last force push, I addressed the comments left by @bcressey:

  • Typo in "early-boot-config.service"
  • Improved documentation on why we switched from Requires to Wants in "early-boot-config.service"
  • Improved documentation on why we added the new settings in the "oneshot" units

@bcressey
Contributor

I'm happy with the code changes, but I'm planning to pull down the branch and try some contrived scenarios to see if the system behaves as expected.

@arnaldo2792
Contributor Author

  • Updated commit message
  • Added comments in mark-successful-boot and send-boot-success

@arnaldo2792
Contributor Author

  • Rebase upstream
  • Add Requires=storewolf.service in early-boot-config.service

@arnaldo2792 arnaldo2792 removed the request for review from zmrow April 2, 2021 00:01
@arnaldo2792
Contributor Author

  • Added storewolf.service in After for early-boot-config

@arnaldo2792 arnaldo2792 removed the request for review from samuelkarp April 2, 2021 01:28
@arnaldo2792 arnaldo2792 requested review from jahkeup and webern and removed request for tjkirch April 2, 2021 01:28
Contributor

@webern webern left a comment

/usr/bin/metricdog send-boot-success will return zero even if the network is not ready. In that case you would see an error in the journal. If it succeeds, there is no output in the journal unless you run it with debug logging: /usr/bin/metricdog --log-level debug send-boot-success. Other than making sure send-boot-success works, this looks good to me. Nice work!

@arnaldo2792
Contributor Author

  • Added --no-block to the enable_and_start command
  • Added --log-level debug to send-boot-success

The current systemd boot sequence is error-prone in isolated scenarios
like sending invalid configurations through user data. These scenarios
can cause inconsistent states at the end of the boot process, which
affects the mechanism that determines whether a boot was successful
after applying new configurations or updates. It is also difficult to
implement features that require a strict service initialization order.

In order to fix the problems presented above, this commit defines a new
boot sequence, with three main systemd targets: preconfigured, configured
and multi-user.

Preconfigured

This target is used to start the boot process. Failures in any of the
required service units will cause the target to fail, stopping the boot
process. The boot is marked as successful during the execution of this
target only if the migrator "oneshot" service exits successfully.
Services initialized during this phase include:

* migrator
* mark-successful-boot
* send-boot-success
* storewolf
* apiserver
* early-boot-config
* sundog
* settings-applier

Once the target is reached (completed), the "activate-configured.service"
unit will set the configured target as the default target and start it.

Configured

This target should be used to set up additional configurations in the
host before services like kubernetes/ecs start. Services initialized
during this phase include:

* chronyd
* host-containerd

Once the target is reached (completed) the "activate-multi-user.service"
unit will set the multi-user target as the default target, and start it.

Multi-user

This is the final target enabled in the boot sequence. Services
initialized during this phase include:

* host-containers@*
* docker
* ecs
* kubernetes
@arnaldo2792
Contributor Author

  • @webern Removed --log-level debug from send-boot-success, since we don't want the entire debug logs here. I verified that send-boot-success works.

Member

@jahkeup jahkeup left a comment

I'm okay moving forward with this - though I'd like us to use another approach (if that's possible with what systemd provides) if we run into significant issues with systemctl isolate down the road.

:shipit:
