Node Feature Discovery Master random crashes #1640
Comments
Thanks @tatodorov for reporting the issue. I'm looking into it.
I believe #1641 should fix the issue. @tatodorov, would you be able to run and test a staging image if/when we merge the fix and backport it to the v0.14 branch?
@marquiz , thanks for the prompt reply!
@tatodorov now we have a v0.14 staging image with the fix backported
It would be really appreciated if you could test this and report back. We can then cut new patch releases.
@marquiz , thank you very much for the update!
Thanks @tatodorov 👍 I'd suggest we wait a few days. I think most of the NFD team will be on Easter break until next Tuesday or so. If we don't see any crashes by then, we'll cut a patch release.
@tatodorov anything to report?
@marquiz , I haven't observed any crashes after switching to a staging image with the fix. Thank you very much for the assistance!
Fix merged in master and released in v0.14.6 and v0.15.4.
@marquiz: Closing this issue.
What happened:
Occasionally, NFD Master crashes with the following error:
fatal error: concurrent map read and map write
This is followed by a goroutine dump.
Go routine dump
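For readers unfamiliar with this error: the Go runtime aborts the process when it detects one goroutine reading a map while another writes to it, and the abort cannot be caught with recover(). The sketch below shows one common way to guard a shared map, using sync.RWMutex. The labelCache type and its methods are hypothetical names made up for illustration; this is not NFD's code and not necessarily how the fix in #1641 works.

```go
package main

import (
	"fmt"
	"sync"
)

// labelCache is a hypothetical stand-in for any map shared between
// goroutines. It is NOT the data structure used by NFD Master.
type labelCache struct {
	mu     sync.RWMutex
	labels map[string]string
}

func newLabelCache() *labelCache {
	return &labelCache{labels: make(map[string]string)}
}

// Set takes the write lock, so no reader can see the map mid-update.
func (c *labelCache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.labels[key] = value
}

// Get takes the read lock; several readers may hold it at once.
func (c *labelCache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.labels[key]
	return v, ok
}

func main() {
	cache := newLabelCache()
	var wg sync.WaitGroup

	// One writer and one reader touch the same key concurrently.
	// With a bare, unguarded map this pattern can abort the process
	// with "fatal error: concurrent map read and map write".
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < 100000; i++ {
			cache.Set("node", fmt.Sprint(i))
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < 100000; i++ {
			cache.Get("node")
		}
	}()
	wg.Wait()

	v, _ := cache.Get("node")
	fmt.Println("final value:", v)
}
```

The runtime check is best-effort, which fits with crashes that show up only every few days. Running a build with the race detector enabled (go run -race or go test -race) usually reports such unsynchronized accesses much more reliably.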
What you expected to happen:
NFD Master should not crash.
How to reproduce it (as minimally and precisely as possible):
Unfortunately, I don't know how to reproduce it.
It has been running for 11 days and has already crashed 6 times.
Anything else we need to know?:
From the log of NFD Master, I can see this is encountered after:
Environment:
Kubernetes version (kubectl version): 1.24.6
OS (cat /etc/os-release): Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Kernel (uname -a): 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux