syncd crash in syncd::VendorSai::logSet() during docker startup #21180

Open
anamehra opened this issue Dec 14, 2024 · 7 comments
Labels
Awaiting Info ⌛, Triaged

Comments

@anamehra
Contributor

Description

We observed this syncd crash once on a single-ASIC system recently, during a config reload:

(gdb) bt
#0 0x00007fce07be70ca in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x000055e218d7f837 in syncd::VendorSai::logSet(_sai_api_t, _sai_log_level_t) ()
#2 0x000055e218d53487 in syncd::Syncd::saiLoglevelNotify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#3 0x000055e218d6d5d2 in std::_Function_handler<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::_Bind<void (syncd::Syncd::*(syncd::Syncd*, std::_Placeholder<1>, std::_Placeholder<2>))(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> >::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) ()
#4 0x00007fce08ccdbc9 in swss::Logger::linkToDbWithOutput(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libswsscommon.so.0
#5 0x00007fce08ccdf4a in swss::Logger::linkToDb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libswsscommon.so.0
#6 0x000055e218d55422 in syncd::Syncd::setSaiApiLogLevel() ()
#7 0x000055e218d66382 in syncd::Syncd::Syncd(std::shared_ptr<sairedis::SaiInterface>, std::shared_ptr<syncd::CommandLineOptions>, bool) ()
#8 0x000055e218d4f82d in syncd_main(int, char**) ()
#9 0x000055e218d4d98f in main ()
(gdb) thread apply all bt

It looks like the crash happened very early in the bring-up stage. No apparent errors were seen in the syslogs.


@tjchadaga
Contributor

@anamehra - could you please clarify the image version and platform on which this is seen?

@tjchadaga
Contributor

@anamehra - please also upload the techsupport

@tjchadaga added the Triaged and Awaiting Info ⌛ labels on Dec 18, 2024
@sdszhang

admin@xxx:~$ show version

SONiC Software Version: SONiC.internal-202405-cisco-111.111616281-31df541974
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: 31df541974
Build date: Fri Jan  3 05:56:33 UTC 2025
Built by: azureuser@490b1164c000000

Platform: x86_64-88_lc0_36fh-r0
HwSKU: Cisco-88-LC0-36FH-O36
ASIC: cisco-8000
ASIC Count: 3
Serial Number: xxx
Model Number: 88-LC0-36FH

@XuChen-MSFT
Contributor

The last sonic-buildimage commit in the affected image is:

79591e1 (2024-10-28 17:52) - Update cisco-8000.ini to ref=202311.1.0.6 (#20639)

@abdosi
Contributor

abdosi commented Jan 22, 2025

@kcudnik: can you help here?

@kcudnik
Contributor

kcudnik commented Jan 22, 2025

As you can see, the crash is in a std::map insert (the tree insert-and-rebalance step). I'm 99% sure this is a race condition: that set is made after swss-common's linkToDb binds the notification, and I just checked that logSet is not protected by a mutex. I will make a PR to fix this.
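
For anyone following along, here is a minimal sketch of the kind of fix described above: serializing writes to a std::map with a mutex. It is not the sairedis code. `LogLevelTable`, `set`, `size`, and `concurrentSet` are hypothetical names made up for illustration, and plain `int` stands in for the SAI API and log-level enums.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-API log-level table (not the real VendorSai).
class LogLevelTable {
public:
    // Insert or update one entry while holding the mutex, so two threads
    // never modify the underlying red-black tree at the same time.
    void set(int api, int level) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_levels[api] = level;
    }

    // Reads take the same lock so they never observe a half-updated tree.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_levels.size();
    }

private:
    mutable std::mutex m_mutex;
    std::map<int, int> m_levels;
};

// Illustration only: several threads write the same keys concurrently.
// Returns the number of distinct keys stored once all threads have joined.
std::size_t concurrentSet(unsigned threadCount, int apiCount) {
    LogLevelTable table;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&table, apiCount, t] {
            for (int api = 0; api < apiCount; ++api) {
                table.set(api, static_cast<int>(t));
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return table.size();
}
```

Without the lock in `set`, concurrent inserts into the same std::map are a data race and undefined behavior, which can surface as a crash inside `std::_Rb_tree_insert_and_rebalance` like the one in the backtrace.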

@kcudnik
Contributor

kcudnik commented Jan 22, 2025

Do you have a consistent repro of this? Can you show the other threads' backtraces from this dump?
