feat: add support for mounted instance-store ephemeral storage #4735
✅ Deploy Preview for karpenter-docs-prod canceled.
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity. |
it would be great to have this for bottlerocket too... |
Force-pushed from 4872753 to 69cf3ae.
Bottlerocket currently doesn't have native support for this (see: bottlerocket-os/bottlerocket#3060). With this change, you could still enable |
Nice work 🎉 Really cool to see progress on this! |
Force-pushed from 69cf3ae to 5cc1de8.
We are using a custom AMI and our own bootstrapping script (plus a custom script to mount the local NVMe as ephemeral storage). We will just set "instanceStorePolicy: RAID0" and Karpenter will take the local storage into account when calculating the host's ephemeral-storage. Did I understand that right?
Yep that's correct, it'll use the total size of the instance store volume(s) for ephemeral-storage capacity; only AL2 would RAID them. I wonder if it makes sense to have another policy option for using the instance store size and explicitly not trying to RAID. |
If you're using a custom AMI and don't need the RAID0 option explicitly passed through to the userData, then this shouldn't matter for you, since the custom AMI doesn't have any userData that's templated out by default. Seems reasonable to me to just start with |
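The capacity rule discussed in this exchange can be sketched roughly as follows. This is only an illustration, not Karpenter's actual code: the function name, its parameters, and the fallback to a caller-supplied root volume size are all assumptions made for the example. The only behavior taken from the thread is that with `instanceStorePolicy: RAID0` the total size of the instance store volume(s) is used for ephemeral-storage capacity.

```python
from typing import Optional, Sequence

GB = 10**9  # decimal gigabyte; the NodeClaim in this thread reports capacity with a "G" suffix


def ephemeral_storage_bytes(
    instance_store_policy: Optional[str],
    instance_store_gb: Sequence[int],
    fallback_bytes: int,
) -> int:
    """Hypothetical helper: estimate a node's ephemeral-storage capacity in bytes.

    instance_store_policy: "RAID0" or None (mirrors the instanceStorePolicy field).
    instance_store_gb: sizes of the local instance store disks, in decimal GB.
    fallback_bytes: capacity to report when the policy is not set
                    (assumed here to be the root volume size).
    """
    if instance_store_policy == "RAID0" and instance_store_gb:
        # Per the thread: the total size of the instance store volume(s) is used.
        return sum(instance_store_gb) * GB
    # Assumption for this sketch: without the policy, the fallback size applies.
    return fallback_bytes


# Example: two 1900 GB local disks with the RAID0 policy set.
print(ephemeral_storage_bytes("RAID0", [1900, 1900], 20 * 2**30))
```

Note that in this sketch the policy only affects the reported capacity. As mentioned above, a custom AMI is still responsible for actually mounting the disks.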
Awesome work here 🎉
/karpenter snapshot
Pull Request Test Coverage Report for Build 7200427255. Warning: This coverage report may be inaccurate. We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
💛 - Coveralls |
Snapshot successfully published to |
Force-pushed from 140f1d2 to fdc8272.
Force-pushed from fdc8272 to deceedd.
@alec-rabold One thing I did notice as I was testing this out is that there seems to be a discrepancy between what the instance store volume reports (and therefore what the NodeClaim reports) and what the actual capacity of the node is for ephemeral storage when it comes up. Just doing some testing, this is what I got when I tried to launch a bunch of pods that requested

NodeClaim:

    allocatable:
      cpu: 940m
      ephemeral-storage: "52026258176"
      memory: 1392Mi
      pods: "8"
      vpc.amazonaws.com/pod-eni: "4"
    capacity:
      cpu: "1"
      ephemeral-storage: 59G
      memory: 1835Mi
      pods: "8"
      vpc.amazonaws.com/pod-eni: "4"

Node:

    allocatable:
      cpu: 940m
      ephemeral-storage: "51969867689"
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      hugepages-32Mi: "0"
      hugepages-64Ki: "0"
      memory: 1429916Ki
      pods: "8"
    capacity:
      cpu: "1"
      ephemeral-storage: 57556000Ki
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      hugepages-32Mi: "0"
      hugepages-64Ki: "0"
      memory: 1883548Ki

Any idea what might be causing the discrepancy? Looks like it's off by almost 4Gi.
It looks like there is a difference but I think it's closer than
The For |
Oh yep. You're totally right. I just blindly assumed that the ephemeral storage was coming back in |
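The capacities quoted earlier in this thread use different unit suffixes (`59G` on the NodeClaim, `57556000Ki` on the Node), so one way to compare them is to convert both to bytes. The sketch below is a minimal illustration: `to_bytes` is a hypothetical helper written for this example (not the Kubernetes quantity parser), and it only handles plain integers with the few suffixes listed.

```python
# Hypothetical, simplified parser for Kubernetes-style quantity strings.
# Binary suffixes (Ki, Mi, Gi, Ti) are powers of 1024; decimal ones (k, M, G, T) are powers of 1000.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as "59G" or "57556000Ki" to bytes."""
    # Check two-character suffixes first so "Gi" is never mistaken for "G".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return int(quantity)  # no suffix: already in bytes


# Values copied from the NodeClaim and Node output in this thread.
nodeclaim_capacity = to_bytes("59G")
node_capacity = to_bytes("57556000Ki")

capacity_gap = nodeclaim_capacity - node_capacity
allocatable_gap = to_bytes("52026258176") - to_bytes("51969867689")

# What the gap would look like if "59G" were misread as "59Gi".
misread_gap = to_bytes("59Gi") - node_capacity

print(f"capacity gap:    {capacity_gap} bytes")
print(f"allocatable gap: {allocatable_gap} bytes")
print(f"gap if 59G were read as 59Gi: {misread_gap} bytes")
```

With both values in bytes, the capacity gap is roughly 63 MB and the allocatable gap roughly 56 MB. Reading `59G` as `59Gi` instead would give a gap of about 4.1 GiB, which is in line with the "almost 4Gi" figure mentioned above.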
One comment on documentation, but otherwise this LGTM to ship 🚢 |
/karpenter snapshot
Snapshot successfully published to "oci://undefined.ecr.undefined.amazonaws.com/karpenter/snapshot/karpenter:v0-b28786e3009193c4e581146eb834ae9bd53d3a95". |
… argument; remove old alpha files
Force-pushed from b28786e to bb62cc9.
LGTM 🚀
/karpenter snapshot
Snapshot successfully published to "oci://071440425669.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:v0-4cab2f6ed8b4a9ab2768a5a7fe0662065b6cffae". |
YO. This is a big deal! Thank you all, and especially @alec-rabold, for getting this pushed through!!! 🚀 |
We were testing this feature and found somewhat random behavior. For the same instance type, the allocatable ephemeral-storage values come out different.
the correct one
We also have blockDeviceMappings config in nodeClass configuration if it helps:
What could be the possible reason for this? Although this isn't, it could lead to bad decision-making by Karpenter if there are enough wrong ones.
Relevant issue: #2723
Something like this would allow instance-store disks to be used for ephemeral-storage and have Karpenter aware of the capacity.
Please let me know your thoughts!