Label BOTTLEROCKET-DATA at runtime #2807
Push above addresses @bcressey's comments and changes the approach a bit: we don't append the fallback data partition if the image layout is set to "unified", since it's redundant. Updated the PR description with new testing.
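A minimal sketch of the layout rule described above, in Python. The `DataPartition` type, the `plan_data_partitions` helper, and the "os"/"data" disk names are hypothetical illustrations rather than the build system's actual code; the sketch only shows which data partitions each layout would carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPartition:
    disk: str       # hypothetical names: "os" = OS disk image, "data" = second volume/disk
    fallback: bool  # True for the backup data partition kept on the OS disk


def plan_data_partitions(layout: str) -> list[DataPartition]:
    """Return the data partitions an image would carry for a given layout."""
    if layout == "unified":
        # Data already lives on the OS disk, so a fallback there would be redundant.
        return [DataPartition(disk="os", fallback=False)]
    if layout == "split":
        # Main data partition on the data disk, plus a fallback on the OS disk.
        return [
            DataPartition(disk="data", fallback=False),
            DataPartition(disk="os", fallback=True),
        ]
    raise ValueError(f"unknown image layout: {layout!r}")
```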
Push removes
This way it doesn't look like the boot is hanging when the oneshots are waiting for their data partitions.
Title changed to "Label BOTTLEROCKET-DATA and create /local fs at runtime".
Pushes above address @bcressey's comments. They also add a commit with the changes described in #2807 (comment) that moves the filesystem creation for /local from image build time to host runtime. Re-did the tests as described in the PR description; the results are still as expected, with the only difference being the new
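Creating the /local filesystem at runtime means that step runs on every boot, so it has to leave an existing filesystem alone. A minimal sketch of one way to guard that, assuming the check probes for a superblock signature before formatting. The helper names are hypothetical, and the two signatures checked (XFS `XFSB` at byte 0, ext2/3/4 magic `0xEF53` at byte 1080) are illustrative only, since the filesystem used for /local isn't stated here; a real implementation would more likely rely on a full probe such as libblkid.

```python
import struct

# Illustrative signatures only; the filesystem used for /local isn't stated here.
XFS_MAGIC = b"XFSB"             # XFS superblock magic, at byte 0
EXT_MAGIC = 0xEF53              # ext2/3/4 s_magic, little-endian u16
EXT_MAGIC_OFFSET = 1024 + 0x38  # superblock starts at byte 1024; s_magic is at +0x38
PROBE_BYTES = EXT_MAGIC_OFFSET + 2


def has_known_filesystem(head: bytes) -> bool:
    """Check the first PROBE_BYTES of a device for a known superblock signature."""
    if head[:4] == XFS_MAGIC:
        return True
    if len(head) >= PROBE_BYTES:
        (magic,) = struct.unpack_from("<H", head, EXT_MAGIC_OFFSET)
        return magic == EXT_MAGIC
    return False


def needs_mkfs(device_path: str) -> bool:
    """Hypothetical runtime check: only format a partition with no filesystem."""
    with open(device_path, "rb") as dev:
        return not has_known_filesystem(dev.read(PROBE_BYTES))
```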
Push above renames
Push above adds a comment to explain the wait in `label-data-alternative`. Also adds a dependency constraint between
Added boot performance impact information to the PR description.
Title changed from "Label BOTTLEROCKET-DATA and create /local fs at runtime" to "Label BOTTLEROCKET-DATA at runtime".
Push above rebases onto develop. Edit: Oops, I did push all my changes along with the rebase.
Pushes above (including the rebase push) address @bcressey's comments.
Testing:

Split partition / normal setup:
As expected, DATA-A fails to grow because there isn't enough free space on the root volume. DATA-B successfully labels and grows:
repart-local grows
On reboot, the
lsblk:
Unified:
lsblk:
Split but with data volume missing, root volume increased to 12GiB:
lsblk:
blkid:
Push above fixes the commit message.
Push above fixes some inaccurate comments.
Push above is just a rebase to get GitHub Actions going again.
Push above rebases onto develop and resolves a conflict.
LGTM - just one last minor cleanup suggestion.
Changes partitioning to always keep a data partition on the OS image. We grow and label the data partition at runtime. We still create a separate data partition on the second volume/disk for "split" partition layouts.

We add two new services that run as part of 'local-fs.target' and before 'local.mount' and 'repart-local.service':
* 'label-data-a.service'
* 'label-data-b.service'

'label-data-a' and 'label-data-b' "compete": each tries to be the first to label the partition it's waiting for as 'BOTTLEROCKET-DATA'. 'label-data-a' waits for the data partition that resides on the OS disk image. Once that device is ready, it calls 'systemd-repart' to relabel it as 'BOTTLEROCKET-DATA' and grow it as much as possible. 'label-data-b' calls 'systemd-repart' to label the data partition on the data image as 'BOTTLEROCKET-DATA' and grow the partition to fill the remainder of the disk.

All of this lets the host boot if the data partition on the data image doesn't exist and the root filesystem disk has enough leftover space to accommodate a reasonably sized backup data partition.
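The "competition" between the two labeling services can be modeled as a sequential sketch. The `Partition` type, the `label_data` and `boot` helpers, and the `DATA-A`/`DATA-B` names are hypothetical; the real units call `systemd-repart`, and this sketch does not model how concurrent runs are serialized or the wait in the alternative unit. It does show why the units are no-ops on later boots: once any partition carries the `BOTTLEROCKET-DATA` label, every later attempt returns early.

```python
from dataclasses import dataclass
from typing import Optional

DATA_LABEL = "BOTTLEROCKET-DATA"


@dataclass
class Partition:
    label: str     # current GPT partition name (illustrative values)
    size_gib: int  # current size
    room_gib: int  # size it could grow to on its disk


def find_labeled(parts: list[Partition]) -> Optional[Partition]:
    """Return the partition already labeled as the data partition, if any."""
    return next((p for p in parts if p.label == DATA_LABEL), None)


def label_data(parts: list[Partition], candidate: Partition) -> bool:
    """What one hypothetical labeling oneshot does once its partition shows up."""
    if find_labeled(parts) is not None:
        # Lost the race, or this is a later boot: nothing to do.
        return False
    candidate.label = DATA_LABEL
    candidate.size_gib = candidate.room_gib  # grow as much as possible
    return True


def boot(parts: list[Partition], ready_order: list[Partition]) -> Optional[Partition]:
    """Run the labelers in the order their devices become ready (sketch only)."""
    for part in ready_order:
        label_data(parts, part)
    return find_labeled(parts)
```

For example, with both disks present and the data-disk partition ready first, `boot([a, b], [b, a])` labels and grows `b`, leaves `a` untouched, and any later `label_data` call returns `False`.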
'mark-successful-boot' now also updates BOTTLEROCKET_PRIVATE's GPT partition attributes to indicate whether boot has ever succeeded. Adds a new 'has-boot-ever-succeeded' subcommand for checking whether boot has ever succeeded.
This adds safeguards to prevent a different data partition from being labeled and resized if the host has ever booted successfully with its original data partition.
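Each GPT partition entry carries a 64-bit attribute field, and the UEFI specification reserves bits 48-63 for partition-type-specific use. A minimal sketch of setting and testing a "boot has ever succeeded" flag in that field, plus the safeguard it enables. The bit position and the function names are hypothetical, since the text here doesn't say which bit `mark-successful-boot` uses.

```python
# UEFI reserves GPT attribute bits 48-63 for partition-type-specific use.
# HYPOTHETICAL: the bit actually used for this flag is not stated here.
BOOT_EVER_SUCCEEDED_BIT = 48
U64_MASK = (1 << 64) - 1


def mark_boot_ever_succeeded(attributes: int) -> int:
    """Return the 64-bit attribute value with the flag set; other bits are kept."""
    return (attributes | (1 << BOOT_EVER_SUCCEEDED_BIT)) & U64_MASK


def has_boot_ever_succeeded(attributes: int) -> bool:
    """Report whether the flag is set in a 64-bit attribute value."""
    return bool((attributes >> BOOT_EVER_SUCCEEDED_BIT) & 1)


def may_label_other_data_partition(private_attributes: int) -> bool:
    """Safeguard: only a host that has never booted successfully may have a
    different data partition labeled and resized."""
    return not has_boot_ever_succeeded(private_attributes)
```

Keeping the flag sticky (it is only ever set, never cleared) is what makes it useful as a safeguard: a host that once booted with its original data partition will refuse to adopt a different one later.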
Push above removes the redundant repart conf and now just keeps the single … Tested things, and they still work as expected.
LGTM
Issue number:
Resolves #2822
Description of changes:
Testing done:
Instance with the aws-k8s-1.24 AMI as-is, without modification

The host comes up as expected and the system logs don't show any abnormalities:
The status of label-data-preferred, label-data-alternative, and repart-local shows that the data partition gets labelled and the local fs gets created and grows successfully.

blkid shows the expected new partition UUIDs for the data partitions.

lsblk looks normal.

Upon reboot, the label-data-* oneshot service units do not run, since a partition labelled BOTTLEROCKET-DATA already exists:

Instance with the aws-k8s-1.24 AMI, but with the default 20 GB data EBS volume removed and the single root EBS volume increased from 2 GB to 12 GB

The host boots and comes up even though it's missing the data EBS volume:
blkid shows the expected static partition UUIDs and no second NVMe disk.

lsblk looks normal.

AMI with "unified" image layout
The host comes up fine. The image layout is unchanged from before this change.
The status of label-data-preferred, label-data-alternative, and repart-local shows that the data partition gets labelled and the local fs gets created and grows successfully:

blkid looks fine:

lsblk looks normal.

Single EBS volume with the default 2 GB size and no additional space
As expected, the host fails to boot during label-data-alt, since there is not enough space to label and grow the alternative data partition.

Testing at scale
Launched 1000 instances with the "split" layout AMI with both EBS volumes attached. All hosts come up and join the cluster.
Launched 1500 instances with the "split" layout AMI with a single 12 GB EBS volume. All hosts come up and join the cluster.
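Whether the single-volume cases above can boot comes down to how much space is left on the OS disk after the OS partitions. A rough sketch of that arithmetic, assuming 512-byte sectors, a standard 128-entry GPT (whose backup copy occupies the last 33 sectors of the disk), and 1 MiB partition alignment. The helper names, the OS-partition end offset, and the minimum data size are hypothetical, not Bottlerocket's real values.

```python
SECTOR = 512             # assumed logical sector size
BACKUP_GPT_SECTORS = 33  # backup header (1) + 128 x 128-byte entries (32)
MIB = 1 << 20
GIB = 1 << 30


def fallback_capacity(disk_bytes: int, os_end_bytes: int, align: int = MIB) -> int:
    """Bytes left for a fallback data partition after the OS partitions."""
    usable_end = disk_bytes - BACKUP_GPT_SECTORS * SECTOR
    start = -(-os_end_bytes // align) * align  # round the start up to alignment
    end = usable_end // align * align          # round the end down to alignment
    return max(0, end - start)


def can_boot_without_data_disk(disk_bytes: int, os_end_bytes: int,
                               min_data_bytes: int) -> bool:
    """True if the OS disk can hold a reasonably sized fallback partition."""
    return fallback_capacity(disk_bytes, os_end_bytes) >= min_data_bytes
```

With the OS partitions assumed to end at 2 GiB, a 2 GiB volume leaves no room for a fallback, while a 12 GiB volume leaves 10 GiB minus 1 MiB (the last partial MiB is lost to the backup GPT and alignment).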
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.