Support for kubelet config options: imageGCHighThresholdPercent, imageGCLowThresholdPercent #2065
Thanks for the report! We'll consider surfacing those options. Could nodes be launched with a slightly larger data volume to avoid running so close to the thresholds?
Hey, thanks for the fast reply! Yes, this would certainly also be an option and we will consider it, thanks! But in the end it is just a workaround for the underlying issue. Also, if you are running hundreds of nodes, storage costs will certainly increase (especially if you run a lot of smaller nodes). I would be happy to see this as an option in the userdata config, and I'm also happy to make the PR myself if you can guide me to the right parts of the code.
Certainly! It looks like we haven't added more of these settings in a while, but #1659 shows the basic structure for plumbing a new setting through, with a migration as the last step.
On that last point: migrations are frankly just painful, and we're currently iterating on approaches to make them either unnecessary or more bearable. The problem they solve is that new releases of Bottlerocket will have your changes and know about the new settings, but if someone upgrades to that new release and later downgrades to an older release, the older release will not understand the new settings and will choke when it encounters them. This behavior is by design, to avoid accidents where a security- or performance-critical setting contains a typo and gets ignored rather than applied, but it's also very unintuitive. To work around this, whenever we add new settings, we create migration binaries that remove those settings on downgrade. There are helper macros and build system integrations to make this less of a chore, but it's still not intuitive. It's up to you whether you want to go down this rabbit hole in your PR; if you'd prefer to ignore it, we're still delighted to have the contribution and can address it later during release prep.
@DZDomi - let us know if you have any questions about the above! We're happy to help with a PR if you think you'd like to try putting one together.
What I'd like:
We were running into an issue on our development EKS cluster when we tried to update multiple deployments through Argo CD on the same EKS node. The node was hitting the DiskPressure condition for one to two minutes on each parallel deployment before it reverted back to reporting NoDiskPressure. We tried to track the issue down and identified the following timeline:
- the node reports DiskPressure and no new pods are scheduled onto it
- kubelet's image garbage collection frees disk space
- the node goes back to reporting NoDiskPressure
This garbage collection behavior can be tuned via kubelet configuration arguments. Unfortunately, this is currently not supported by Bottlerocket. The following options should be configurable via the userdata/bootstrap settings (see the sketch below):
- imageGCHighThresholdPercent
- imageGCLowThresholdPercent
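For illustration, the userdata could then look something like the sketch below. The setting names are hypothetical, they simply mirror the kubelet config fields, since Bottlerocket does not expose them today:

```toml
# Hypothetical Bottlerocket userdata sketch: these kubelet image GC settings
# are what this issue asks for and are not available yet.
[settings.kubernetes]
# Mirrors kubelet's imageGCHighThresholdPercent: disk usage above this
# percentage triggers image garbage collection.
image-gc-high-threshold-percent = 70
# Mirrors kubelet's imageGCLowThresholdPercent: garbage collection frees
# images until disk usage falls below this percentage.
image-gc-low-threshold-percent = 50
```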
AWS itself allows these options on its EKS optimized AMIs: https://aws.amazon.com/premiumsupport/knowledge-center/eks-worker-nodes-image-cache/
Any alternatives you've considered:
We had to decrease the hard eviction limits from their defaults to lower values in order not to run into this condition:
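Roughly along these lines; the value below is a placeholder rather than our exact configuration, and it assumes a Bottlerocket version that already exposes the settings.kubernetes.eviction-hard map:

```toml
# Illustrative only: lower the hard eviction threshold for node filesystem
# space so DiskPressure is reported later. The kubelet default for
# nodefs.available is 10%; 5% here is just a placeholder value.
[settings.kubernetes.eviction-hard]
"nodefs.available" = "5%"
```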
This is not really optimal, since it lets the whole node get closer to running out of disk space before eviction kicks in.