NFD Master memory leak #1614
Comments
Thanks for the report @tatodorov. @marquiz is looking into it. We will provide updates as soon as possible.
Thanks @tatodorov for reporting this (with a detailed description). Based on a quick analysis and some testing, #1615b should fix this. The leak is probably not commonly encountered because most people don't see frequent nfd-master config file updates, which hides the problem.
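As an illustration only of why a leak tied to config reloads can stay hidden, here is a minimal Python sketch. It is not NFD's code and does not describe the actual fix; `Controller`, `LeakyMaster`, `FixedMaster` and `RUNNING` are hypothetical names, and the `RUNNING` set simply stands in for whatever keeps a started component reachable (background workers, watches, and so on).

```python
# Hypothetical model: every started controller stays reachable until stopped.
RUNNING = set()


class Controller:
    """Stand-in for a long-lived component that is built from the config."""

    def __init__(self, config):
        self.config = config
        RUNNING.add(self)  # started: now kept alive by its "workers"

    def stop(self):
        RUNNING.discard(self)  # stopped: can be garbage collected


class LeakyMaster:
    def __init__(self):
        self.controller = None

    def reconfigure(self, config):
        # Bug pattern: the previous controller is replaced but never stopped.
        self.controller = Controller(config)


class FixedMaster:
    def __init__(self):
        self.controller = None

    def reconfigure(self, config):
        # Stop the previous controller before creating its replacement.
        if self.controller is not None:
            self.controller.stop()
        self.controller = Controller(config)


def live_after(master, reloads):
    """Return how many controllers are still alive after `reloads` reloads."""
    RUNNING.clear()
    for i in range(reloads):
        master.reconfigure({"revision": i})
    return len(RUNNING)
```

With a single config load both variants keep exactly one controller alive, so the difference only shows up when reloads are frequent: `live_after(LeakyMaster(), 60)` grows with the number of reloads, while `live_after(FixedMaster(), 60)` stays at one.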
Hi @tatodorov, thanks again for reporting this issue. A fix has been merged into the release branch. Would you help us test it in your environment before we cut a patch release?
The image should be
@ArangoGutierrez and @marquiz, thank you very much for your prompt reply!
Looking forward to your overnight report, @tatodorov.
Yeah. @tatodorov, we're prepared to cut new patch release(s) quickly once we get a good-to-go signal. Based on my own testing, the issue looks fixed.
@ArangoGutierrez, @marquiz I no longer observe the memory leak. Thank you very much for your assistance!
@tatodorov NFD v0.14.5 (and v0.15.3) containing the fix have been released. I'm closing this issue now. Please re-open (or create a new issue) if you encounter any further issues.
What happened:
Node Feature Discovery Master continuously uses more memory and never releases it.
I am running NVIDIA GPU Operator 23.9.2 and Node Feature Discovery 0.14.4 on Kubernetes 1.24.6.
Over a period of 1 hour, I see the NFD Master reach 2 GB of memory.
I had to set a memory limit, since a couple of times it exhausted the entire memory of the host.
I also configured NFD GC to run garbage collection every minute.
However, this did not release any memory.
I have since removed the GPU Operator, NFD Workers and NFD GC.
The memory usage of NFD Master still keeps increasing.
Every minute, I can see the following in NFD Master's log:
This is the content of the ConfigMap mounted to NFD Master:
data:
  nfd-master.conf: |-
    extraLabelNs:
      - nvidia.com
What you expected to happen:
NFD Master to have a steady memory usage.
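One way to make "steady memory usage" concrete is to fit a line through periodic memory samples and look at the slope. The sketch below is an assumption about how one might check this, not part of NFD or of this report: the `(seconds, bytes)` sample format, the function names and the 50 MiB per hour threshold are all hypothetical, and the samples would have to come from whatever monitoring is available.

```python
from statistics import mean

MIB = 1024 ** 2


def growth_rate(samples):
    """Least-squares slope of memory over time, in bytes per second.

    `samples` is a list of (seconds, bytes) pairs (hypothetical format).
    """
    times = [t for t, _ in samples]
    mems = [m for _, m in samples]
    t_bar, m_bar = mean(times), mean(mems)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        raise ValueError("need samples taken at two or more distinct times")
    return sum((t - t_bar) * (m - m_bar) for t, m in samples) / denom


def looks_steady(samples, max_bytes_per_hour=50 * MIB):
    """True if the fitted growth stays at or below the (arbitrary) threshold."""
    return growth_rate(samples) * 3600 <= max_bytes_per_hour


# Synthetic data: one sample per minute for an hour.
# `leaking` climbs by 32 MiB per minute, roughly 2 GB in an hour.
leaking = [(60 * i, 100 * MIB + i * 32 * MIB) for i in range(61)]
# `steady` only wobbles by 1 MiB around a fixed baseline.
steady = [(60 * i, 100 * MIB + (i % 2) * MIB) for i in range(61)]
```

A slope is less sensitive to short spikes than comparing the first and last sample, but the threshold still needs tuning for the workload.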
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Kubernetes version (kubectl version): 1.24.6
OS (cat /etc/os-release): Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Kernel (uname -a): 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux