
Support for kubelet config options: imageGCHighThresholdPercent, imageGCLowThresholdPercent #2065

Closed
DZDomi opened this issue Apr 12, 2022 · 4 comments · Fixed by #2219

DZDomi commented Apr 12, 2022

What I'd like:

We were running into an issue on our development EKS cluster when we tried to update multiple deployments through Argo CD on the same EKS node. The node hit the DiskPressure condition for one to two minutes during each parallel deployment before reverting to NoDiskPressure. We tracked the issue down to the following timeline:

  • New images (read: >20 images at approximately the same time) are pushed through CI/CD to different ECR repos
  • Argo CD Image Updater detects these new images and changes the image tags in each deployment
  • Kubernetes tries to schedule some of these pods on a specific node
  • The node is currently sitting at around 82-84% disk usage
  • The kubelet tries to pull the new images (each approx. 150-200 MB)
  • The kubelet image garbage collector kicks in and starts deleting images, since usage crossed the default 85% threshold; deleting all of them takes a few minutes
  • Meanwhile the new images are pulled in parallel, tipping disk usage above 90%, so the hard eviction threshold is met
  • The node changes state to DiskPressure and stops scheduling new pods
  • The kubelet garbage collector finishes deleting images (after a few minutes)
  • The node changes back to NoDiskPressure

These garbage collection thresholds can be tuned via the kubelet configuration, but Bottlerocket does not currently expose them. The following kubelet options should be configurable through Bottlerocket settings:

imageGCHighThresholdPercent: xx
imageGCLowThresholdPercent: xx

AWS itself allows configuring these options on its EKS-optimized AMIs: https://aws.amazon.com/premiumsupport/knowledge-center/eks-worker-nodes-image-cache/
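
Something along these lines in the Bottlerocket user data would be ideal. The setting names below are just guesses mirroring the kubelet option names and the existing kebab-case convention, not settings that exist today:

[settings.kubernetes]
# Hypothetical setting names; the final names and value types
# depend on how the settings end up being modeled.
image-gc-high-threshold-percent = 80
image-gc-low-threshold-percent = 60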

Any alternatives you've considered:

We had to lower the hard eviction thresholds from their defaults in order to not run into this condition:

[settings.kubernetes.eviction-hard]
"nodefs.available" = "5%"
"imagefs.available" = "10%"

This is not really optimal, since it lets the node get much closer to running out of disk space before eviction kicks in.


zmrow commented Apr 13, 2022

Thanks for the report! We'll consider surfacing those options.

Could nodes be launched with a slightly larger data volume to avoid running so close to the thresholds?


DZDomi commented Apr 13, 2022

Hey, thanks for the fast reply! Yes, that would certainly be an option and we will consider it, thanks! But in the end it is just a workaround for the underlying issue. Also, if you are running hundreds of nodes, storage costs will certainly increase (especially if you run a lot of smaller nodes).

Would be happy to see this as an option in the user data config. Also happy to make the PR myself if you can guide me to the right parts of the code.

zmrow added the area/kubernetes and area/core labels Apr 13, 2022
bcressey commented

> Also happy to make the PR myself if you can guide me to the right parts of the code.

Certainly! It looks like we haven't added more of these settings in a while, but #1659 has the basic structure:

  1. changes to the model and modeled types to add the settings and any necessary validation
  2. changes to documentation to describe the new settings
  3. changes to all the relevant kubelet config templates to render the settings (see the sketch after this list)
  4. migrations, to ensure that the new settings are erased on downgrade
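
For step 3, Bottlerocket renders config files like the kubelet's config.yaml from handlebars templates, so the new settings would only be emitted when a user actually sets them. A rough, hypothetical fragment (the setting names and exact template are assumptions, not the final implementation):

{{#if settings.kubernetes.image-gc-high-threshold-percent}}
imageGCHighThresholdPercent: {{settings.kubernetes.image-gc-high-threshold-percent}}
{{/if}}
{{#if settings.kubernetes.image-gc-low-threshold-percent}}
imageGCLowThresholdPercent: {{settings.kubernetes.image-gc-low-threshold-percent}}
{{/if}}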

On that last point: migrations are frankly just painful and we're currently iterating on approaches to make them either unnecessary or more bearable.

The problem they solve is that new releases of Bottlerocket will have your changes and know about the new settings, but if someone upgrades to that new release and later downgrades to an older release, the older release will not understand the new settings and will choke when it encounters them. This is by design, to avoid similar accidents where a security- or performance-critical setting contains a typo and gets silently ignored rather than applied, but it is also very unintuitive.

To work around this, whenever we add new settings, we create migration binaries that remove those settings on downgrade. There are helper macros and build system integrations to make this less of a chore, but it's still not intuitive. Up to you whether you want to go down this rabbit hole in your PR; if you'd prefer to ignore it, we're still delighted to have the contribution and can address it later during release prep.
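
As a rough illustration of what such a migration looks like, new-setting migrations are typically tiny Rust binaries built on the migration-helpers crate. A hypothetical sketch for these two settings (the setting names and helper choice are assumptions, not the final code):

use migration_helpers::common_migrations::AddSettingsMigration;
use migration_helpers::{migrate, Result};
use std::process;

// Forward migration is a no-op; on downgrade, the listed settings are
// removed so older releases never see keys they don't understand.
fn run() -> Result<()> {
    migrate(AddSettingsMigration(&[
        "settings.kubernetes.image-gc-high-threshold-percent",
        "settings.kubernetes.image-gc-low-threshold-percent",
    ]))
}

fn main() {
    if let Err(e) = run() {
        eprintln!("{}", e);
        process::exit(1);
    }
}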


zmrow commented May 2, 2022

@DZDomi - let us know if you have any questions about the above! We're happy to help with a PR if you think you'd like to try putting one together.

kdaula assigned mchaker and unassigned zmrow Jun 1, 2022
kdaula added this to the 1.9.0 milestone Jun 1, 2022
kdaula added this to 1.9.0 Jun 1, 2022
kdaula modified the milestones: 1.9.0, 1.10.0 Jun 2, 2022
kdaula removed this from 1.9.0 Jun 2, 2022
kdaula added this to 1.9.0 Jun 2, 2022
kdaula modified the milestones: 1.10.0, 1.9.0 Jun 2, 2022
mchaker moved this to Todo in 1.9.0 Jun 13, 2022
mchaker moved this from Todo to In Progress in 1.9.0 Jun 15, 2022
Repository owner moved this from In Progress to Done in 1.9.0 Jul 14, 2022