Bottlerocket under-reports Ephemeral Storage Capacity #2743
Comments
@jonathan-innis, thanks for reaching out! We're taking a deeper look into this.
@jpculp Any progress or updates on this issue?
Unfortunately not yet. We have to take a deeper look at the interaction between the host, containerd, and cAdvisor. Out of curiosity, do you see the same behavior with a newer version on K8s 1.24?
I haven't taken a look at the newer version on K8s 1.24. Let me take a look at a newer version of K8s and get back to you on that.
Hi @jonathan-innis, although I haven't fully root-caused the issue, I wanted to provide an update to offer some information. I took a deeper look into this, and the issue seems to stem from how kubelet updates the node's reported capacity. After the node becomes ready, if I query the metrics endpoint, both the cAdvisor stats and the node summary stats are reporting correctly:

cAdvisor:

Node summary:
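For reference, here's a sketch of how those two endpoints can be queried through the API-server proxy (`<node-name>` is a placeholder, and `jq` is assumed to be available):

```sh
# Node filesystem stats from kubelet's Summary API (capacityBytes / availableBytes)
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" | jq '.node.fs'

# Raw cAdvisor metrics exposed by kubelet; filter for filesystem capacity
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | grep container_fs_limit_bytes
```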
But for some reason, the node object in the cluster does not reflect that in the K8s API: it only reports ~988 GB. What's interesting is that once you either reboot the worker node or restart the kubelet service, the stats sync up correctly.
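A quick way to see the value the Node object itself is reporting (`<node-name>` is a placeholder):

```sh
# Ephemeral-storage capacity as recorded on the Node object in the K8s API
kubectl get node <node-name> -o jsonpath='{.status.capacity.ephemeral-storage}'
```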
So it seems like kubelet holds on to a stale capacity value until it is restarted. If you want to work around this issue, you can reboot the nodes or restart kubelet to get the correct capacity reported.
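A sketch of that workaround on Bottlerocket, assuming you can reach the admin container (e.g., via SSM):

```sh
# From the admin container, drop into a root shell on the host
sudo sheltie

# Restart kubelet so it re-reads the filesystem capacity
systemctl restart kubelet.service

# Or reboot the node through the Bottlerocket API instead
apiclient reboot
```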
Wondering if you are still seeing this behavior. If so, do the stats eventually correct themselves, or once it's in this state does it keep reporting the wrong size indefinitely? There's a 10-second cache timeout for stats, so I wonder if we are hitting a case where the data in the cache needs to be invalidated before it actually checks again and gets the full storage space.
I didn't realize I needed to specify the device as `/dev/xvdb`.
Still seeing this behavior on EKS 1.25. Entering admin container > `sudo sheltie` > ...
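For anyone else checking from the host shell, a sketch of commands that surface the discrepancy (assuming the data volume is `/dev/xvdb`, as in the original report):

```sh
# Actual block device size as the host sees it
lsblk /dev/xvdb

# Filesystem capacity backing kubelet's ephemeral storage
df -h /var/lib/kubelet
```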
FWIW, recently upgraded to 1.26, and the behavior is there as well.
Hi @James-Quigley @jonathan-innis, I suspect this issue might be addressed by the changes to include monitoring of the container runtime cgroup by kubelet (#3804). Are you still seeing this issue on versions of Bottlerocket >= 1.19.5?
Image I'm using:
AMI Name: bottlerocket-aws-k8s-1.22-aarch64-v1.11.1-104f8e0f
What I expected to happen:
I expected the reported capacity on my worker node to be close to the actual size of the EBS volume attached at the `xvdb` mount for my node filesystem.

What actually happened:
cAdvisor or something in the Bottlerocket image appears to be under-reporting the storage capacity of this worker node.
The `ephemeral-storage` capacity here is approximately 1457383148Ki ≈ 1.35 Ti, which is not close to the ~4.3 Ti that `lsblk` is reporting.

How to reproduce the problem:
1. Launch a node with a large EBS volume attached at the `/dev/xvdb` mount.
2. Check the reported capacity with `kubectl get node` or `kubectl describe node` (see the sketch below).
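A minimal sketch of that comparison, assuming a node named `<node-name>` with its data volume on `/dev/xvdb`:

```sh
# What Kubernetes reports for the node, including ephemeral-storage
kubectl describe node <node-name> | grep -A 5 Capacity

# What the block device actually provides (run on the host)
lsblk /dev/xvdb
```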