feat: invalidate SSM cache upon AMI deprecation #7301
/karpenter snapshot
Snapshot successfully published to
93bd2cd to 5bc8b31
5bc8b31 to ace5bc9
Very nice 🎉
LGTM 🚀
Co-authored-by: Jonathan Innis <[email protected]>
LGTM 🚀
Fixes #N/A
Description
When an EKS-optimized AMI is deprecated and a user is using a `family@latest` alias, Karpenter will continue to launch with that AMI until the 24 hour SSM cache entry has expired. This cache ensures Karpenter doesn't cause a thundering herd by upgrading all users within a region simultaneously upon an SSM parameter rollout. However, Karpenter should respond faster in the event of an EKS-optimized AMI deprecation.

This PR introduces a mechanism to invalidate the SSM cache upon detection of a deprecated EKS-optimized AMI. This cache invalidation is still staggered over 30 minutes to reduce the risk of a thundering herd, but Karpenter now reacts up to 48x faster than before (30 minutes rather than 24 hours). Additionally, thanks to the 24 hour cache, only a subset of users using `family@latest` will have been upgraded to the deprecated AMI in the first place, further reducing the chances of a thundering herd upon rollback.

Some assumptions made:
- The `latest` SSM parameter for EKS-optimized AMIs should be rolled back before AMI deprecation. If this is not the case, Karpenter will continuously invalidate the SSM cache until the parameter points to a non-deprecated AMI.
- Only users with a `family@latest` alias (and non-blocking drift budgets / PDBs) are expected to have been upgraded to the now-deprecated AMI.

How was this change tested?
`make test` (additional tests in-progress)

Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.