Nomad server doesn't accept connections after some time #8038
Comments
Having a similar issue: it happens randomly after a few days, and one CPU core goes up to 100%.
@schmichael any chance this behaviour will be fixed in 0.11.3?
After sending SIGABRT (09:03:54 in the log) to the unresponsive process, I got this log:
Can't reproduce on 0.11.3. See #8163 (comment)
Nomad version
Nomad v0.11.2 (807cfeb)
Operating system and Environment details
CentOS 7.8
Issue
We have a Nomad cluster with 3 server nodes. Every few days, one of the Nomad servers stops accepting connections:
The other servers then mark that node as left:
I tried to trace the broken Nomad process with strace, but the only system calls it made were epoll_pwait, nanosleep, sched_yield and futex.
The previous release (0.10.5) seems to work fine. Nomad agents have worked well so far.
Reproduction steps
Set up a Nomad cluster and wait several days :)
Nomad Server config
Nomad Server logs
The last lines on the failed server
Corresponding logs on another server
I understand this is probably not enough diagnostic information. I can provide more if you let me know what else would be useful.