Kernel GP during swss/syncd/teamd shutdown #6103
Comments
This issue is not easily repeatable. I have only seen it once in the available history.
(gdb) bt
(gdb) f 4
A few things are interesting:
Looking up an empty map should yield nothing instead of segfaulting. I suspect this is a multi-threading issue where one thread made a query while the global variable m_db_info was being changed. From what I have read, STL containers are thread-safe only for concurrent reads, not when any thread is writing.
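To illustrate the suspicion above, here is a minimal sketch of guarding a shared map with a mutex so that a lookup can never observe a half-finished write. The class name `DbInfoTable`, its methods, and the `int` value type are hypothetical and are not taken from libswsscommon; they only stand in for a global such as m_db_info.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a global lookup table such as m_db_info.
// Every access takes the same mutex, so a reader never sees the map
// while a writer is rebalancing or reallocating it.
class DbInfoTable {
public:
    // Insert or overwrite an entry under the lock.
    void set(const std::string& name, int id) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_info[name] = id;
    }

    // Look up an entry under the lock. Returns false (and leaves idOut
    // untouched) when the name is absent, including when the map is empty.
    bool find(const std::string& name, int& idOut) const {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_info.find(name);
        if (it == m_info.end()) {
            return false;
        }
        idOut = it->second;
        return true;
    }

private:
    mutable std::mutex m_mutex;        // protects m_info
    std::map<std::string, int> m_info; // hypothetical name -> id mapping
};
```

A lock like this only helps while the table itself is alive. It does not protect against a thread that keeps running after static objects have been destroyed at process exit.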
Fixes: sonic-net/sonic-buildimage#6103
This PR fixes the kernel GP errors that are seen in any short-lived process within swss.
Issue: As part of *syncd initialization, the Logger::linkToDbNative function is called and a thread is started. When the main *syncd process terminates, the destructor Logger::~Logger simply detaches the SettingThread. The exiting main process then destroys the static variables. The fault is hit when the detached thread, still executing its infinite loop, tries to access these freed variables.
Fix: Before the main process exits, set a flag in the Logger destructor to signal the thread that the main process is finishing up. Instead of detaching the SettingThread (which leaves it free to access the destroyed static variables), join it.
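The flag-and-join pattern described in the fix can be sketched as follows. This is not the actual Logger code: `SettingsWatcher`, `m_stop`, and the `exited` flag are hypothetical names, and the 1 ms sleep stands in for whatever the real thread waits on. It assumes the loop wakes up often enough to notice the flag; a thread blocked indefinitely would also need a wake-up mechanism.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a logger that owns a background settings thread.
class SettingsWatcher {
public:
    // 'exited' is set by the worker just before it returns, so callers
    // can observe that the thread really finished.
    explicit SettingsWatcher(std::atomic<bool>& exited) : m_exited(exited) {
        m_thread = std::thread(&SettingsWatcher::run, this);
    }

    // Signal the worker, then wait for it. After join() returns, the
    // worker can no longer touch any object that is about to be destroyed.
    ~SettingsWatcher() {
        m_stop.store(true);
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }

private:
    void run() {
        while (!m_stop.load()) {
            // Placeholder for polling settings; sleeps briefly so the
            // stop flag is re-checked regularly.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        m_exited.store(true);
    }

    std::atomic<bool>& m_exited;
    std::atomic<bool> m_stop{false};
    std::thread m_thread; // declared last: started after the flags exist
};

// Creates a watcher, lets it run briefly, destroys it, and reports
// whether the worker thread had finished by the time the destructor returned.
bool watcherStopsCleanly() {
    std::atomic<bool> exited{false};
    {
        SettingsWatcher watcher(exited);
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    } // destructor sets the flag and joins here
    return exited.load();
}
```

With `detach()` in place of the flag and `join()`, the destructor would return immediately and the worker could still be inside `run()` while the process tears down its static objects, which is the failure mode described above.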
Description
Steps to reproduce the issue:
Describe the results you received:
The test found the following error in syslog. The system recovered from the test, and the GP happened during service shutdown, but this issue should still be addressed.
INFO kernel: [34985.337998] traps: gearsyncd[16050] general protection ip:7fe030bb1aea sp:7fe03062f9b0 error:0 in libswsscommon.so.0.0.0[7fe030b95000+51000]\n
Describe the results you expected:
test pass
Additional information you deem important (e.g. issue happens only occasionally):
```
SONiC Software Version: SONiC.master.507-7f21c0be
Distribution: Debian 10.6
Kernel: 4.19.0-9-2-amd64
Build commit: 7f21c0b
Build date: Sat Nov 28 05:20:26 UTC 2020
Built by: johnar@jenkins-worker-8
```