feat: Discover Instance Type Memory Capacity #7004
Conversation
✅ Deploy Preview for karpenter-docs-prod ready!
Thanks for the PR! Looks really good!
Review threads on pkg/controllers/providers/instancetype/memoryoverhead/controller.go (resolved)
CC @jmdeal (I think you were the one on the call today?). There's a negligible amount of variance when measuring overhead in Ki, but I was unable to find any variance for the same instance type when measuring in Mi, which is what this PR does. Here's a simple script that can be used to verify: https://gist.github.com/jukie/df045af8fec68941f5d119044bf04aee
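A hedged illustration of that point (this is not the linked gist, and the KiB figures below are invented): capacities that differ by a few Ki collapse to a single value once truncated to Mi.

```python
# Hypothetical memory capacities observed for the same instance type, in KiB.
# The small spread stands in for the per-boot variance mentioned above.
observed_kib = [8_048_124, 8_048_132, 8_048_140]

# Truncating to MiB (1 MiB = 1024 KiB) collapses that spread:
# every sample maps to the same MiB value.
observed_mib = {kib // 1024 for kib in observed_kib}
print(observed_mib)  # one distinct value -> no variance at Mi granularity
```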
Signed-off-by: jukie <[email protected]>
/karpenter snapshot
Review threads on pkg/controllers/providers/instancetype/discoveredcapacitycache/controller.go and suite_test.go (resolved)
Snapshot successfully published to
@jmdeal could you take another look, please?
Pull Request Test Coverage Report for Build 11377237253 (Coveralls)
Fixed the conflicts.
Any other changes needed here @jmdeal?
One nit, and I'm going to let E2Es run. Otherwise LGTM.
/karpenter snapshot
/karpenter snapshot
Snapshot successfully published to
I think the e2es need a retry after committing your suggestion, but the prior run appeared to pass.
No need to rerun; the only changes since they last ran were the comment nit and some docs changes you merged in. This LGTM 🚀.
Hi @jmdeal, I saw 1.0.7 was released, but this wasn't included. Will this make it into the next one?
We weren't planning on including this in a patch release; those are typically reserved for critical bug and security fixes. The plan would be to include this in the next minor version release.
Thanks!
Fixes #5161
Description
This adds a new controller that manages a cache on the instancetype provider, storing the memory capacity overhead for each instance type by comparing the advertised capacity against the actual value observed after a Kubernetes Node registers with the cluster.
NewInstanceType() then consults this cache when calculating memory capacity. If a cached value doesn't exist, it falls back to the existing vmMemoryOverheadPercent logic. The cache is keyed by a combination of the instance type name and a hash of nodeClass.Status.AMIs, so it should always be accurate; when an AMI update is triggered, the existing vmMemoryOverheadPercent calculation is used instead, to ensure safe instance creation every time.
How was this change tested?
Suite tests and live environment
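The cache-and-fallback behavior described above can be sketched roughly as follows. All names here are illustrative, not Karpenter's actual types; the real implementation lives under pkg/controllers/providers/instancetype/discoveredcapacitycache.

```go
package main

import "fmt"

// cacheKey combines the instance type name with a hash of the NodeClass's
// AMI status, mirroring the keying scheme described in the PR. A new AMI
// invalidates the key, forcing the safe fallback path.
type cacheKey struct {
	instanceType string
	amiHash      string
}

// discoveredCapacityCache maps each key to the memory capacity (in MiB)
// actually observed on a registered Node.
type discoveredCapacityCache map[cacheKey]int64

// memoryCapacity returns the discovered capacity when a cache entry exists
// for this instance type + AMI combination; otherwise it falls back to
// deducting vmMemoryOverheadPercent from the advertised capacity, matching
// the pre-existing estimation logic the PR description refers to.
func memoryCapacity(c discoveredCapacityCache, instanceType, amiHash string, advertisedMiB int64, vmMemoryOverheadPercent float64) int64 {
	if discovered, ok := c[cacheKey{instanceType, amiHash}]; ok {
		return discovered
	}
	return advertisedMiB - int64(float64(advertisedMiB)*vmMemoryOverheadPercent)
}

func main() {
	cache := discoveredCapacityCache{}

	// Before any Node registers, the lookup misses and we fall back to the
	// overhead-percent estimate.
	fmt.Println(memoryCapacity(cache, "m5.large", "ami-hash-1", 8192, 0.075))

	// After a Node registers, the controller records the observed capacity;
	// subsequent lookups return the exact discovered value.
	cache[cacheKey{"m5.large", "ami-hash-1"}] = 7620
	fmt.Println(memoryCapacity(cache, "m5.large", "ami-hash-1", 8192, 0.075))

	// A different AMI hash misses the cache and falls back again.
	fmt.Println(memoryCapacity(cache, "m5.large", "ami-hash-2", 8192, 0.075))
}
```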
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.