1.13.0 possibly not resizing data disk #2915
Hi @nairb774, if you're able to reach the host and
#2879 and #2807 are the recent changes we made to how the data disk is set up. These changes allow boot to continue in cases where the data volume is intentionally not attached and the goal is to use the additional space on the root volume for local storage. One caveat of the change is that if the root volume has extra space (beyond the default 2 GiB) and the data volume is also attached, the system ends up with two viable data partitions, and which one is actually used for local storage (where container images are stored) is non-deterministic, undefined behavior. Therefore it's strongly advised not to increase the root volume size beyond the default unless it's guaranteed there isn't a second data volume at boot. We were planning to put out a notice along with the release, and we're just about finished with it. I know this is an early guess, but I think the aforementioned undefined behavior might be causing what you're seeing here.
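To make the caveat concrete, here is a minimal Terraform launch-template sketch of the recommended layout, assuming the usual Bottlerocket device names (/dev/xvda for the OS volume, /dev/xvdb for the data volume); the resource name, volume type, and the 80 GiB data size are illustrative assumptions, not values from this thread:

```hcl
# Sketch only: resource name, volume type, and data volume size are assumed.
resource "aws_launch_template" "bottlerocket" {
  # OS volume: leave it at the 2 GiB AMI default so it never becomes a
  # second candidate data partition.
  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 2
      volume_type = "gp3"
    }
  }

  # Data volume: size this one for container images and local storage.
  block_device_mappings {
    device_name = "/dev/xvdb"
    ebs {
      volume_size = 80
      volume_type = "gp3"
    }
  }
}
```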
I've rolled things back to 1.12, but as soon as I can, I'll try to bring up a few test nodes with 1.13 and get the requested information.
From the
(The GPT numbers are reported in sectors, where each sector is 512 bytes, so multiplying by 512 and dividing by 1024 * 1024 * 1024 gives the GiB figure; for example, a partition spanning 4194304 sectors works out to 4194304 * 512 / 1024^3 = 2 GiB.) These GPT errors are saying, in essence:
This is the failure case @etungsten mentioned, where the behavior is non-deterministic and the failed systems end up with 2 GiB of mutable local storage instead of 75 GiB. We're extremely sorry about this behavior change and the impact you experienced. I know it's not much comfort, but previously the "extra" 2 GiB on the first EBS volume would have gone unused, since the OS ignored that space, resulting in extra cost for EBS storage but no benefit. Adjusting the launch template or tool you're using to reduce the space allocated to the first EBS volume down to the default 2 GiB should avoid the problem and save a modest amount of money. Could you let us know what provisioning tool you're using? If this convention of allocating more space to the first EBS volume is very common or the default behavior in some tool, then the impact here could be quite widespread.
Do we know when this will be picked up by the Bottlerocket updater? Do I need to worry tonight?
Yeah, I can confirm that we had inadvertently set the root to 4 GiB in size. We are creating a launch template using Terraform, and the 4 GiB was a carryover from when we had migrated from the "default" EKS managed node image. For some reason I swear you needed to specify the
It's late enough in the evening that I can't poke around at this without the possibility of getting someone paged, or get PRs for the needed system changes reviewed, but the given explanation seems quite plausible at this point, between what looks to be extraneous configuration on my part and the fact that this event only appeared to impact some but not all nodes. On the up side, for our clusters at least, the impact was just a few sad pods and a few extra evictions. I mostly wanted to get this issue out quickly since the release was quite fresh. I'll try to report back tomorrow morning (PT) with more information if it's still needed at that time.
Fairly certain we have a 10 GB root disk; I don't quite remember the reason. @bcressey, will this be released to be picked up by the updater tonight?
Karpenter picked up the update and we were bitten by this. Our root volumes were sized at 4 GiB instead of 2 GiB. I think this was a carryover from the example using 4 here: https://github.com/aws/karpenter/blob/main/examples/provisioner/bottlerocket.yaml#L26. We've updated that to 2 GiB now.
See bottlerocket-os/bottlerocket#2915 for why this is needed
@misterek it would be picked up by the updater tonight. However, for an in-place update, this guard in label-data-a.service and label-data-b.service should suppress the new behavior:
Since the
Ugh. Ok. Thanks. I've deleted that for now. That was handy but seems like it's too risky to use now.
@dschaaff nice catch! We're rolling back the SSM parameter change for 1.13.0 to try to stop the bleeding for anyone else using the Karpenter default values.
With the SSM parameters rolled back to v1.12.0, new launches via Karpenter, EKS MNG, and
@etungsten is working on a fix to skip the resize unless the new partition would grow to at least 20 GiB, which should avoid this failure mode. The plan is to release v1.13.1 once that's ready and tested.
The reason folks using Karpenter are setting 4 GiB is to support both GPU and non-GPU Bottlerocket AMIs, since the GPU ones are a tad bigger and would not fit in 2 GiB.
Thanks @nairb774 for reporting this so quickly, and thanks @bcressey and @etungsten for addressing it just as quickly!
Karpenter prior changes: aws/karpenter-provider-aws#2400
Yes, that would be ideal. But I also understand the value in providing an example of how to adjust the data volume size in the block device mapping if one desires to do so. In that case, the user has to know how much space to give the root volume for the host to boot up correctly. Fortunately, in the fix we're adding, if the root volume size is below 22 GiB, then BR skips the data partition creation on the root volume and there will be no ambiguity as long as the data volume is present (with at least 20 GiB of space). This reverts the behavior back to how it was before for people using the default Karpenter block device mapping example with the 4 GiB root volume.
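For illustration, here is a hedged sketch of the kind of block device mapping that fix is intended to keep working; the 4 GiB root mirrors the Karpenter example mentioned above, while the 100 GiB data volume size and gp3 type are assumed values, not taken from this issue:

```hcl
# Sketch only; goes inside the launch template resource. Sizes other than the
# 4 GiB root are illustrative assumptions.
block_device_mappings {
  device_name = "/dev/xvda"
  ebs {
    # Below the 22 GiB threshold, so the fixed behavior skips creating a
    # data partition on the root volume.
    volume_size = 4
    volume_type = "gp3"
  }
}

block_device_mappings {
  device_name = "/dev/xvdb"
  ebs {
    # The attached data volume (at least 20 GiB) is what gets used for
    # container images and local storage.
    volume_size = 100
    volume_type = "gp3"
  }
}
```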
Ideally it would be possible to just specify
That ought to use the AMI default for
@bcressey wrote:
@bcressey Can you clarify "Ideally"? Is this what will happen if xvdb is the only device configured?
@FernandoMiguel wrote:
Do you happen to know if this default sizing will work? In our case, we want to adjust some of the xvda parameters, but not the size, for example:

block_device_mappings {
# This is the root volume.
device_name = "/dev/xvda"
ebs {
# Let AMI use its default volume_size (2Gi); do not increase:
# https://github.com/bottlerocket-os/bottlerocket/issues/2915
volume_type = "gp3"
delete_on_termination = true
encrypted = true
}
}
@joebowbeer I recall there was a bug in Karpenter where this didn't work.
Image I'm using:
Bottlerocket OS 1.13.0 (aws-k8s-1.22)
ami-033592485355290ae
What I expected to happen:
The disk that is used for containers and scratch storage is expanded to use the whole EBS volume.
What actually happened:
Containers are failing to be unpacked due to running out of disk space, and some containers are being evicted for ephemeral storage usage. Looking in journalctl, it does look like the resize happened, so it might be related to something else going on. This "bug" did impact a number of clusters with disparate sets of workloads. The only change was updating the base AMI through machine replacement.
Logs up to the point that the machine booted. They look fairly normal; could it be something other than a failed resize?
journal.txt
How to reproduce the problem:
Upgraded from 1.12.0 to 1.13.0.