add fixes to improve boot speed #1809

bcressey · 2021-11-09T02:54:50Z

Issue number:
N/A

Description of changes:
This is a collection of fixes that improve boot speed and reduce the time to a usable node by at least 5 seconds and at most 8 seconds.

Building kubelet with the "dockerless" tag saves 5 seconds during service startup; otherwise cadvisor spends five seconds trying to connect to the Docker daemon before printing an error.

The fix for the defer timeout in the wicked DHCPv6 client saves 1 second on around half of launches: those where the timer fires a little early and would otherwise trigger another 1-second wait.
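The following minimal Python sketch shows why an early timer costs a whole extra second. It is not wicked's code (wicked is written in C), and the tolerance-based check is only one possible way to avoid the extra wait. The function names and the 50 millisecond tolerance are hypothetical; only the one-second defer timeout comes from this PR.

```python
# Hypothetical model of a DHCPv6 "defer" timeout check; not wicked's code.
DEFER_TIMEOUT = 1.0  # seconds, the defer timeout described in this PR
TOLERANCE = 0.05     # assumed slack for timers that fire slightly early


def naive_expired(elapsed: float) -> bool:
    """Exact comparison: a timer firing at 0.998 s counts as 'not yet'."""
    return elapsed >= DEFER_TIMEOUT


def tolerant_expired(elapsed: float) -> bool:
    """Treat a timer firing within TOLERANCE of the deadline as expired."""
    return elapsed >= DEFER_TIMEOUT - TOLERANCE


def time_to_defer(first_fire: float, expired) -> float:
    """Seconds from start until the client defers the DHCPv6 lease.

    The timer is armed for DEFER_TIMEOUT but actually fires at first_fire.
    Whenever the check says the timeout has not expired, the client re-arms
    the timer for another full DEFER_TIMEOUT.
    """
    now = first_fire
    while not expired(now):
        now += DEFER_TIMEOUT
    return now


print(round(time_to_defer(0.998, naive_expired), 3))     # 1.998: one extra wait
print(round(time_to_defer(0.998, tolerant_expired), 3))  # 0.998
print(round(time_to_defer(1.002, naive_expired), 3))     # 1.002: a late timer is fine
```

Under these assumptions the exact comparison only costs time when the timer fires early, which is the extra one-second wait described above; a late timer defers on the first event with either check.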

Using overlayfs for the CNI plugin directory saves a variable amount of time by removing a potentially slow copy to an unwritten EBS volume from the critical path. systemd-tmpfiles-setup previously took 900 milliseconds or more in most cases; it now takes 100 milliseconds or less, and most of the remaining time is spent populating the SELinux modules in /var/lib/selinux.
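No copy is needed because overlayfs presents the read-only binaries from the image (the lower layer) directly, and only files created or modified at runtime are written to the upper directory. The Python sketch below only assembles the equivalent `mount` invocation and does not run it. The lower directory, target, options, and SELinux context come from the opt-cni-bin.mount unit quoted in the review; the upper and work directory names under /var/lib/cni-plugins are hypothetical, since the PR only says the overlay directories were moved there.

```python
import shlex

# lowerdir, target, and context are taken from the opt-cni-bin.mount unit;
# the upperdir/workdir names are hypothetical placeholders.
LOWER = "/usr/libexec/cni/bin"
UPPER = "/var/lib/cni-plugins/upper"  # assumed name
WORK = "/var/lib/cni-plugins/work"    # assumed name; must share a filesystem with UPPER
TARGET = "/opt/cni/bin"
CONTEXT = "system_u:object_r:local_t:s0"


def overlay_options(lower: str, upper: str, work: str, context: str) -> str:
    """Build an overlayfs option string in the same order as the mount unit."""
    return ",".join([
        "noatime",
        "nosuid",
        "nodev",
        f"lowerdir={lower}",   # read-only plugin binaries from the OS image
        f"upperdir={upper}",   # receives new or modified files at runtime
        f"workdir={work}",     # scratch space overlayfs uses for copy-up
        f"context={context}",  # SELinux label applied to the whole mount
    ])


def mount_argv() -> list:
    """Equivalent mount(8) command line; returned, not executed."""
    opts = overlay_options(LOWER, UPPER, WORK, CONTEXT)
    return ["mount", "-t", "overlay", "overlay", "-o", opts, TARGET]


print(shlex.join(mount_argv()))
```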

Building support for the PS/2 controller, keyboard, and mouse as modules saves around 400 milliseconds of boot time under KVM, because otherwise device mapper waits for their setup to finish before mounting the root filesystem. They are still loaded later, after the root filesystem is mounted, but at that point we can do more work in parallel.

Disabling RAID auto-detect avoids another potential device wait and reduces printk messages. Writing to the console device at 115200 bits per second makes those writes 12x faster.

Console logging continues to be a drag on overall boot speed. We could turn it off altogether to gain at least 2 seconds, but only at a severe cost to debugging if anything goes wrong. The higher device speed clearly helps, but its impact is spread across whichever threads draw the short straw and end up writing to the console after a printk call, so it is difficult to quantify.
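The 12x figure is the ratio of line rates, which implies a previous console speed of 9600 bits per second. Assuming the common 8N1 framing (a start bit, eight data bits, and a stop bit, so ten bits on the wire per character), the Python sketch below estimates how long one console line takes to drain at each speed; the 80-character line length is an arbitrary example.

```python
# Back-of-the-envelope serial console throughput, assuming 8N1 framing.
BITS_PER_CHAR = 10  # 1 start bit + 8 data bits + 1 stop bit


def chars_per_second(baud: int) -> float:
    """Characters per second a UART can emit at the given line rate."""
    return baud / BITS_PER_CHAR


def drain_ms(num_chars: int, baud: int) -> float:
    """Milliseconds needed to write num_chars at the given line rate."""
    return num_chars / chars_per_second(baud) * 1000


old, new = 9600, 115200
print(new / old)                    # 12.0
print(round(drain_ms(80, old), 2))  # 83.33 ms per 80-character line
print(round(drain_ms(80, new), 2))  # 6.94 ms per 80-character line
```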

Removing the udevadm settle dependency doesn't yield a measurable improvement in boot speed, but does stop systemd from blaming wicked for slowing everything down.
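The commit that drops this dependency notes that the wicked daemons wait for the devices they expect, which is more reliable than `udevadm settle`; settle instead waits for the whole udev event queue to drain, whether or not the pending events matter. A minimal Python sketch of the targeted approach, assuming the standard Linux /sys/class/net layout, follows. The function name, poll interval, and timeout are hypothetical, and it polls for simplicity where a real daemon would more likely react to kernel events.

```python
import os
import time


def wait_for_netdev(name: str, timeout: float = 5.0, interval: float = 0.05,
                    sysfs_root: str = "/sys/class/net") -> bool:
    """Return True once `name` appears under sysfs_root, False on timeout.

    Unlike `udevadm settle`, this waits only for the one interface we care
    about, so events for unrelated devices cannot delay it.
    """
    deadline = time.monotonic() + timeout
    path = os.path.join(sysfs_root, name)
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_netdev("eth1", timeout=10.0)` returns as soon as an interface named eth1 appears, even if udev is still processing events for other devices.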

I've kept the two commits that added debug output for systemd-tmpfiles and the wicked clients, since these were instrumental in identifying the underlying issues and confirming the fixes. These logs are all sent to the journal rather than the console, so they don't compete with existing output or slow down the boot.

Testing done:
For the kernel change: verified that the keyboard and mouse modules were still loaded on x86_64 nodes.

For the changes to kubelet and the CNI plugins directory: verified that sonobuoy runs passed for these versions, and that no Docker-related error messages were logged to the journal.

For the "activate" targets: confirmed that they no longer appeared among the longest-running units in `systemd-analyze blame` output, and that bootstrap containers still worked as expected.

For the wicked changes: confirmed that the DHCPv6 client would defer after the first timeout, whether the timer fired slightly before or slightly after one second had elapsed. On instances with DHCPv6 enabled, the lease was successfully acquired.

For the udev settle change: used a hacked-up local build where wicked was set up to manage "eth1" rather than "eth0", and verified that wicked still configured the device when I renamed it into existence during the wait.

For the serial console changes: verified that console logs were present for AWS variants across a range of instance types - c1.xlarge, t2.large, m3.2xlarge, c3.large, c4.large, c5.large, c6g.large - and for VMware variants running on ESXi 7.0. Note that we're already using 115200 for GRUB as of #1701, so this setting has previously been validated on a smaller set of instance types.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

samuelkarp

One super minor suggestion, but otherwise LGTM!

samuelkarp · 2021-11-09T04:18:32Z

packages/release/opt-cni-bin.mount

+What=overlay
+Where=/opt/cni/bin
+Type=overlay
+Options=noatime,nosuid,nodev,lowerdir=/usr/libexec/cni/bin,upperdir=/opt/cni/upper,workdir=/opt/cni/work,context=system_u:object_r:local_t:s0


Would it be worth adding another directory here to contain the upperdir and workdir to hide them/make it more obvious that they're an implementation detail of the overlay mount? Something like /opt/cni/.overlay/upper and /opt/cni/.overlay/work?

bcressey

I opted to use /var/lib/cni-plugins for the overlay directories, partly to match the treatment of /var/lib/kernel-devel (which I also adjusted along these lines), and partly to guard against cases where pods might mount /opt/cni and get confused by the new directories.

bcressey · 2021-11-09T21:45:49Z

Rebase; fix the serial console speed commit to account for the removed aws-k8s-1.17 variant.

We use a one-second defer timeout for the DHCPv6 lease, essentially to mark it as optional and minimize the boot delay. One second is already longer than we would like, but going sub-second is somewhat invasive because the timeouts are tied to the protocol implementation and can change the client behavior. It's relatively simple to avoid the extra wait caused by an early timer event.

Signed-off-by: Ben Cressey <[email protected]>

bcressey · 2021-11-09T22:31:34Z

Adjust overlay directory handling per @samuelkarp

samuelkarp

LGTM

bcressey requested review from tjkirch and arnaldo2792 November 9, 2021 02:54

samuelkarp approved these changes Nov 9, 2021

tjkirch approved these changes Nov 9, 2021

bcressey force-pushed the faster-boot branch from 17d6153 to 89ce2c3 November 9, 2021 21:45

bcressey added 11 commits November 9, 2021 21:46

kernel: load i8042, keyboard, and mouse as modules

c405816

If they're built in, they can delay mounting the root filesystem.

Signed-off-by: Ben Cressey <[email protected]>

kubelet: build with dockerless tag if supported

7dfc910

This disables most of the Docker-related functionality, and avoids a five second delay at startup waiting for the Docker daemon.

Signed-off-by: Ben Cressey <[email protected]>

release: increase logging for systemd-tmpfiles

cc1f34e

We use tmpfiles extensively, and the additional output gives a more complete picture of what happens each boot.

Signed-off-by: Ben Cressey <[email protected]>

release: adjust overlay dirs for kernel sources

e1676d0

Move the upper, lower, and work directories for the writable kernel development tree into a subdirectory, to better indicate their status as an implementation detail for the overlayfs mount.

Signed-off-by: Ben Cressey <[email protected]>

release: use overlayfs for CNI plugin directory

0b0fec1

This speeds up boot by avoiding the need to copy the binaries to the local storage volume.

Signed-off-by: Ben Cressey <[email protected]>

release: do not block when setting systemd targets

e96f6e1

Otherwise these units show up as some of the longest running jobs in `systemd-analyze blame` output.

Signed-off-by: Ben Cressey <[email protected]>

wicked: increase logging for dhcp4 and dhcp6

12996bf

Signed-off-by: Ben Cressey <[email protected]>

wicked: drop systemd-udev-settle dependency

2f08c59

The wicked daemons will wait for expected devices to appear, which is more reliable than relying on `udevadm settle` and avoids unnecessary boot delays.

Signed-off-by: Ben Cressey <[email protected]>

build: use faster serial console

65daf91

Existing variant platforms all support the 115200 speed for the guest serial device.

Signed-off-by: Ben Cressey <[email protected]>

build: disable RAID auto detect

a1f0243

Any use of RAID is left up to containers to handle.

Signed-off-by: Ben Cressey <[email protected]>

bcressey force-pushed the faster-boot branch from 89ce2c3 to a1f0243 November 9, 2021 22:29

tjkirch approved these changes Nov 9, 2021

arnaldo2792 approved these changes Nov 10, 2021

samuelkarp approved these changes Nov 10, 2021

bcressey merged commit cb728c4 into bottlerocket-os:develop Nov 10, 2021

bcressey deleted the faster-boot branch November 10, 2021 21:13
