packages: change systemd boot sequence #1423

arnaldo2792 · 2021-03-26T03:51:13Z

Issue number:
#1392

Description of changes:

packages: change systemd boot sequence

The current systemd boot sequence is error-prone in isolated scenarios, such as sending invalid configurations through user data. These scenarios can leave the system in an inconsistent state at the end of the boot process, which affects the mechanism that determines whether a boot was successful after applying new configurations or updates.

With the current boot sequence, it is also difficult to implement features that require a strict service initialization order (a.k.a. run levels).

To fix these problems, this commit defines a new boot sequence with three main systemd targets: preconfigured, configured and multi-user. The "host-containers" crate was modified to enable the host-container service units only when multi-user is the current default target.
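One way to read the "host-containers" change is sketched below. This is not the crate's actual code: the function name, the "host-containers@<name>.service" unit naming and the exact systemctl arguments are assumptions. It also assumes the template unit is installed with WantedBy=multi-user.target, so enabling it is enough for it to start once multi-user is reached, and that the caller passes in the output of "systemctl get-default".

```rust
/// Final target of the boot sequence; host containers run under it.
const MULTI_USER_TARGET: &str = "multi-user.target";

/// Hypothetical helper: returns the `systemctl` argument lists to run for
/// the host container `name`, given the output of `systemctl get-default`.
fn host_container_commands(name: &str, default_target: &str) -> Vec<Vec<String>> {
    let unit = format!("host-containers@{}.service", name);

    // Always enable the unit. Assuming its [Install] section has
    // WantedBy=multi-user.target, it then starts when that target is reached.
    let mut commands = vec![vec!["enable".to_string(), unit.clone()]];

    // If the boot has already reached multi-user (e.g. the setting changed at
    // runtime), start it now; --no-block queues the job without waiting on it.
    if default_target.trim() == MULTI_USER_TARGET {
        commands.push(vec!["start".to_string(), "--no-block".to_string(), unit]);
    }
    commands
}
```

During early boot the default target is still preconfigured or configured, so only "enable" is issued; once multi-user is the default (for example, a host container added later with apiclient set, as in the testing notes below), the unit is also started without making the caller wait for it.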

The "mark-successful-boot" service unit was moved from the "multi-user" target to the "preconfigured" target, and it now depends only on the "migrator" oneshot service unit, since that is the only service required (for now) to determine whether a boot was successful. "mark-successful-boot" used to send metrics after it ran. However, doing so indirectly declared a strong dependency on "settings-applier", since the unit was set up to use the "proxy.env" file. The "metricdog-successful-boot" oneshot service unit was created to remove that dependency from "mark-successful-boot".

Preconfigured

This target is used to start the boot process (a.k.a. the default target). It is based on the "multi-user" target provided by systemd, so that other basic systemd units are started during this stage. Failures in the required "oneshot" service units will cause the target to fail, stopping the boot process. The boot is marked as successful during the execution of this target only if the "migrator" oneshot service exits successfully. Services initialized during this phase include:

acpid
chrony
dbus-broker
audit-rules
migrator (required)
metricdog-successful-boot
storewolf (required)
early-boot-config (required)
sundog (required)
settings-applier (required)
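The two rules above (a failed required unit stops the target, while the success mark depends only on migrator) can be captured in a deliberately simplified model. It assumes each "(required)" unit is tied to the target with Requires=-style semantics and that "mark-successful-boot" looks only at "migrator"; unit ordering and systemd's job engine are ignored, and the Unit/Outcome types are hypothetical.

```rust
/// Result of running a oneshot unit (simplified).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    Failed,
}

/// A unit started while the target is being activated (hypothetical model).
struct Unit {
    name: &'static str,
    /// true for the units marked "(required)" in the list above
    required: bool,
    outcome: Outcome,
}

fn unit(name: &'static str, required: bool, outcome: Outcome) -> Unit {
    Unit { name, required, outcome }
}

/// The target is reached only if every required unit succeeded; otherwise
/// the boot stops and the first failing required unit is reported.
/// Failures in non-required units are tolerated.
fn reach_target(units: &[Unit]) -> Result<(), &'static str> {
    match units
        .iter()
        .find(|u| u.required && u.outcome == Outcome::Failed)
    {
        Some(failed) => Err(failed.name),
        None => Ok(()),
    }
}

/// "mark-successful-boot" depends only on "migrator", so in this model the
/// boot counts as successful whenever migrator exited successfully.
fn boot_marked_successful(units: &[Unit]) -> bool {
    units
        .iter()
        .any(|u| u.name == "migrator" && u.outcome == Outcome::Success)
}
```

With early-boot-config failing, as in the failed-boot log further below, the model does not reach the target, yet it still marks the boot successful because only migrator feeds that decision.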

Once the target is reached (completed), the "activate-configured.service" unit will set the configured target as the default target and start it.

Configured

This target should be used to set up additional configurations on the host before services like kubernetes/ecs start. Services initialized during this phase include:

chronyd
host-containerd

Once the target is reached (completed), the "activate-multi-user.service" unit will set the multi-user target as the default target and start it.

Multi-user

This is the final target enabled in the boot sequence. Services initialized during this phase include:

host-containers@*
docker
ecs
kubernetes
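The activation chain across the three stages can be sketched as a small table. The exact commands run by "activate-configured.service" and "activate-multi-user.service" are not shown in this PR; the set-default plus isolate pair below is an assumption, suggested by the "Isolates configured.target" unit description in the failed-boot log and the later review note about systemctl isolate. The CHAIN table and activation_commands function are hypothetical.

```rust
/// Hypothetical table of the activation chain: after `reached` completes,
/// `activator` switches the system to `next`.
static CHAIN: [(&str, &str, &str); 2] = [
    ("preconfigured.target", "activate-configured.service", "configured.target"),
    ("configured.target", "activate-multi-user.service", "multi-user.target"),
];

/// Returns the activation unit and the `systemctl` argument lists it is
/// assumed to run once `reached` completes, or None for the final target.
fn activation_commands(reached: &str) -> Option<(&'static str, Vec<Vec<String>>)> {
    CHAIN
        .iter()
        .find(|(from, _, _)| *from == reached)
        .map(|(_, activator, next)| {
            let commands = vec![
                // Persist the next stage as the default target...
                vec!["set-default".to_string(), next.to_string()],
                // ...then switch to it (assumed to be done with `isolate`).
                vec!["isolate".to_string(), next.to_string()],
            ];
            (*activator, commands)
        })
}
```

Keeping the chain in one table makes the stage ordering explicit: each activation unit only runs after the previous target is reached, and multi-user, the final target, has no successor.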

Testing done:

systemctl status showed no failures on successful boots
Ran an nginx pod/task/container
Admin/Control containers were enabled once the multi-user target was reached
Custom host container sent as user data was enabled once the multi-user target was reached
Custom host container created after boot was enabled once the new settings were applied

apiclient set --json \
    '{"host-containers": { "test": { "source": "docker.io/arnaldo2792/blocker:latest", "superpowered": false, "enabled": true}}}'

ps ax -o pid,args | grep host-ctr
479 /usr/bin/host-ctr run --container-id=admin --source=328549459982.dkr.ecr.us-west-2.amazonaws.com/bottlerocket-admin:v0.6.0 --superpowered=true
480 /usr/bin/host-ctr run --container-id=control --source=328549459982.dkr.ecr.us-west-2.amazonaws.com/bottlerocket-control:v0.4.2 --superpowered=false
481 /usr/bin/host-ctr run --container-id=custom-at-boot --source=docker.io/arnaldo2792/blocker:latest --superpowered=false
16546 /usr/bin/host-ctr run --container-id=test --source=docker.io/arnaldo2792/blocker:latest --superpowered=false

Failed boot with bad configurations

[settings.host-containers.admin]
enabled = true

[settings.bad]
bad = "data"

[   11.365797] early-boot-config[366]: Error PATCHing '/settings?tx=bottlerocket-launch': Status 400 when PATCHing /settings?tx=bottlerocket-launch: Json deserialize error: unknown field `bad`, expected one of `source`, `enabled`, `superpowered`, `user-data` at line 1 column 85
[FAILED] Failed to start Bottlerocket userdata configuration system.
See 'systemctl status early-boot-config.service' for details.
[DEPEND] Dependency failed for User-specified setting generators.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Bottlerocket dynamic pre-configuration.
[DEPEND] Dependency failed for Isolates configured.target.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

packages/chrony/chronyd.service

packages/libaudit/audit-rules.service

packages/os/[email protected]

packages/os/migrator.service

packages/os/sundog.service

sources/api/host-containers/src/main.rs

packages/docker-engine/docker.service

packages/release/preconfigured.target

packages/os/settings-applier.service

bcressey · 2021-03-26T23:06:32Z

Re: preconfigured section - apiserver is not a one-shot.

One question is whether systemctl restart apiserver or a stop & start causes the system to fail catastrophically because all the one-shots in "preconfigured" require it. If so, we might want to relax those dependencies to "Wants" instead.
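To make the trade-off in this question concrete, here is a small model of how an explicit stop or restart propagates. Per the systemd.unit documentation, a unit with Requires= on another unit is stopped (or restarted) when that unit is explicitly stopped (or restarted), while Wants= does not propagate this. The dependency edges used with it are hypothetical, chosen only for illustration, and the model ignores unit state (for example, oneshot units that have already exited).

```rust
/// Kind of dependency a unit declares on another unit.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dep {
    Requires,
    Wants,
}

/// `unit` declares `kind` on `on` (e.g. Requires=apiserver.service).
struct Edge {
    unit: &'static str,
    on: &'static str,
    kind: Dep,
}

/// Units that would also be stopped or restarted when `start` is explicitly
/// stopped or restarted. Only Requires= edges propagate, and propagation
/// continues from each affected unit to the units that require it.
fn propagated_to(start: &str, edges: &[Edge]) -> Vec<&'static str> {
    let mut affected: Vec<&'static str> = Vec::new();
    let mut pending = vec![start];
    while let Some(current) = pending.pop() {
        for edge in edges {
            if edge.on == current && edge.kind == Dep::Requires && !affected.contains(&edge.unit) {
                affected.push(edge.unit);
                pending.push(edge.unit);
            }
        }
    }
    affected
}
```

Switching an edge from Requires to Wants removes that unit, and anything that only reaches apiserver through it, from the set that restarts with apiserver. The cost is the other half of Wants= semantics: the dependent unit is still started even if the wanted unit fails to start.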

packages/chrony/chronyd.service

packages/os/metricdog-successful-boot.service

packages/os/mark-successful-boot.service

bcressey

Nice! Just a handful of wording and consistency fixes.

packages/os/early-boot-config.service

packages/os/send-boot-success.service

packages/os/early-boot-config.service

packages/os/os.spec

packages/os/send-boot-success.service

packages/os/sundog.service

packages/release/preconfigured.target

packages/release/configured.target

arnaldo2792 · 2021-03-30T21:59:31Z

In the last force push, I addressed the comments left by @bcressey:

Fixed the "stoping" typo in multiple units
Fixed os.spec to include "send-boot-success.service"
Fixed "send-boot-success.service" dependencies
Fixed "sundog.service" dependencies
Fixed "preconfigured.target" and "configured.target" descriptions

packages/os/early-boot-config.service

arnaldo2792 · 2021-03-30T23:13:23Z

In the last force push, I addressed the comments left by @bcressey:

Fixed a typo in "early-boot-config.service"
Improved documentation on why we switched from Requires to Wants in "early-boot-config.service"
Improved documentation on why we added the new settings in the "oneshot" units

bcressey · 2021-03-30T23:17:30Z

I'm happy with the code changes, but I'm planning to pull down the branch and try some contrived scenarios to see if the system behaves as expected.

arnaldo2792 · 2021-04-01T20:55:34Z

Updated commit message
Added comments in mark-successful-boot and send-boot-success

arnaldo2792 · 2021-04-01T21:49:32Z

Rebased on upstream
Added Requires=storewolf.service to early-boot-config.service

packages/os/early-boot-config.service

arnaldo2792 · 2021-04-02T00:10:16Z

Added storewolf.service to After= in early-boot-config.service

webern

/usr/bin/metricdog send-boot-success will return zero even if the network is not ready. If that is the case, you would see an error in the journal. If it succeeds, there would be no output in the journal unless you run it like this: /usr/bin/metricdog --log-level debug send-boot-success. Other than making sure send-boot-success works, this looks good to me. Nice work!

sources/api/host-containers/src/main.rs

arnaldo2792 · 2021-04-02T16:24:54Z

Added --no-block to the enable_and_start command
Added --log-level debug to send-boot-success

The current systemd boot sequence is error-prone in isolated scenarios, such as sending invalid configurations through user data. These scenarios could cause inconsistent states at the end of the boot process, which affect the mechanism that determines whether a boot was successful after applying new configurations or updates. It is also difficult to implement features that require a specific service initialization order.

To fix the problems presented above, this commit defines a new boot sequence with three main systemd targets: preconfigured, configured and multi-user.

Preconfigured

This target is used to start the boot process. Failures in any of the required service units will cause the target to fail, stopping the boot process. The boot is marked as successful during the execution of this target only if the migrator "oneshot" service exits successfully. Services initialized during this phase include:

* migrator
* mark-successful-boot
* send-boot-success
* storewolf
* apiserver
* early-boot-config
* sundog
* settings-applier

Once the target is reached (completed), the "activate-configured.service" unit will set the configured target as the default target and start it.

Configured

This target should be used to set up additional configurations on the host before services like kubernetes/ecs start. Services initialized during this phase include:

* chronyd
* host-containerd

Once the target is reached (completed), the "activate-multi-user.service" unit will set the multi-user target as the default target and start it.

Multi-user

This is the final target enabled in the boot sequence. Services initialized during this phase include:

* host-containers@*
* docker
* ecs
* kubernetes

arnaldo2792 · 2021-04-02T22:27:09Z

@webern I removed --log-level debug from send-boot-success, since we don't want the entire debug log there. I verified that send-boot-success works.

sources/api/host-containers/src/main.rs

packages/release/activate-configured.service

jahkeup

I'm okay moving forward with this, though if we run into significant issues with systemctl isolate down the road, I'd like us to use another approach (if systemd provides one).

arnaldo2792 requested review from bcressey, samuelkarp, zmrow and tjkirch March 26, 2021 03:51

bcressey requested changes Mar 26, 2021

arnaldo2792 force-pushed the new-boot-sequence branch from e4e4624 to ed170a3 March 30, 2021 02:12

bcressey reviewed Mar 30, 2021

arnaldo2792 force-pushed the new-boot-sequence branch from ed170a3 to 9e0ed5e March 30, 2021 19:04

bcressey self-requested a review March 30, 2021 19:53

arnaldo2792 force-pushed the new-boot-sequence branch from 9e0ed5e to 3ef4fa2 March 30, 2021 20:36

bcressey reviewed Mar 30, 2021

arnaldo2792 force-pushed the new-boot-sequence branch from 3ef4fa2 to 1834d0e March 30, 2021 21:47

arnaldo2792 requested a review from bcressey March 30, 2021 22:02

bcressey reviewed Mar 30, 2021

packages/os/early-boot-config.service (outdated)

packages/os/early-boot-config.service (outdated)

arnaldo2792 force-pushed the new-boot-sequence branch from 1834d0e to 104a38a March 30, 2021 23:11

arnaldo2792 requested a review from bcressey March 30, 2021 23:15

arnaldo2792 force-pushed the new-boot-sequence branch from 104a38a to fd4e0d6 April 1, 2021 20:53

arnaldo2792 force-pushed the new-boot-sequence branch from fd4e0d6 to 95c752a April 1, 2021 21:47

arnaldo2792 mentioned this pull request Apr 1, 2021

Add support for bootstrap containers via API settings #1387

Merged

bcressey approved these changes Apr 1, 2021

packages/os/early-boot-config.service (outdated)

arnaldo2792 removed the request for review from zmrow April 2, 2021 00:01

arnaldo2792 force-pushed the new-boot-sequence branch from 95c752a to cad9380 April 2, 2021 00:09

arnaldo2792 removed the request for review from samuelkarp April 2, 2021 01:28

arnaldo2792 requested review from jahkeup and webern and removed request for tjkirch April 2, 2021 01:28

webern approved these changes Apr 2, 2021

sources/api/host-containers/src/main.rs (outdated)

arnaldo2792 force-pushed the new-boot-sequence branch from cad9380 to 3874edb April 2, 2021 16:22

arnaldo2792 force-pushed the new-boot-sequence branch from 3874edb to 0e74737 April 2, 2021 22:25

jahkeup reviewed Apr 5, 2021

sources/api/host-containers/src/main.rs

packages/release/activate-configured.service

jahkeup approved these changes Apr 5, 2021

arnaldo2792 merged commit fd897d4 into bottlerocket-os:develop Apr 5, 2021

arnaldo2792 mentioned this pull request Sep 7, 2023

Add documentation for systemd units architecture bottlerocket-os/bottlerocket-project-website#260

Open

This was referenced Apr 5, 2021

Add aws-k8s-1.20 variant with Kubernetes 1.20 support #1437

Merged

Package updates #1460

Merged

arnaldo2792 deleted the new-boot-sequence branch April 15, 2021 17:17

arnaldo2792 mentioned this pull request Aug 24, 2021

Using Wants not Requires on configured.target #98

Closed
