[crmorch] orchagent crash when destructing crmorch #1991

Closed
tylerlinp opened this issue Aug 28, 2018 · 2 comments
@tylerlinp
Contributor

tylerlinp commented Aug 28, 2018

Description

I hit a problem where orchagent produces a core dump while the system is rebooting.

I think the mistake is that m_timer (a member of class CrmOrch) is held by a smart pointer.
The SelectableTimer object that m_timer points to is also referenced by ExecutableTimer through a raw pointer (Executor::m_selectable).
When CrmOrch is destructed, m_timer is destructed first (which releases the SelectableTimer object's memory), and then the base class member m_consumerMap is destructed, so ~Executor() tries to delete the SelectableTimer object a second time.
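
The destruction order described above can be reproduced with a small sketch. The names `Timer`, `Exec`, `Base`, and `Derived` are hypothetical stand-ins for SelectableTimer, Executor, the base class of CrmOrch, and CrmOrch; none of this is the actual swss code. To keep the sketch well-defined, `Exec` only observes the raw pointer and never deletes it; the point is only to show that the derived class's smart pointer releases the timer before the base class's executor map is torn down.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records destructor calls so the order can be inspected afterwards.
static std::vector<std::string> g_log;

// Hypothetical stand-in for SelectableTimer.
struct Timer {
    ~Timer() { g_log.push_back("Timer"); }
};

// Hypothetical stand-in for Executor. In this sketch it does NOT delete
// m_selectable; a destructor that did would free the timer a second time.
struct Exec {
    explicit Exec(Timer *t) : m_selectable(t) {}
    ~Exec() { g_log.push_back("Exec"); }
    Timer *m_selectable;  // raw, non-owning pointer in this sketch
};

// Hypothetical stand-in for the base class that holds the executors.
struct Base {
    ~Base() { g_log.push_back("Base"); }
    std::vector<std::shared_ptr<Exec>> m_execs;
};

// Hypothetical stand-in for CrmOrch: owns the timer through a smart pointer.
struct Derived : Base {
    Derived() : m_timer(new Timer) {
        m_execs.push_back(std::make_shared<Exec>(m_timer.get()));
    }
    ~Derived() { g_log.push_back("Derived"); }
    std::shared_ptr<Timer> m_timer;
};

// Builds and destroys one Derived object and returns the destructor order.
std::vector<std::string> destruction_order() {
    g_log.clear();
    { Derived d; }
    return g_log;
}
```

Calling `destruction_order()` shows the timer being destroyed before the executor that still points at it. If the executor's destructor also deleted that pointer, the second delete would hit freed memory, which matches the crash in `~Executor`. One way to avoid this, as an assumption rather than the project's actual fix, is to give the timer exactly one owner.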

I am not sure how the erroneous config causes orchagent to exit its while loop. Most likely an exception occurred.

Steps to reproduce the issue:

  1. Edit the configuration file and set multiple IP addresses on one loopback interface.
    "LOOPBACK_INTERFACE": {
        "Loopback2|101.101.101.101/32": {},
        "Loopback2|101.101.1.1/32": {},
        "Loopback2|101.101.1.2/32": {},
        "Loopback3|101.101.1.3/32": {},
        ...
    },
  2. Run the system with the erroneous config.
  3. Reboot the system; a core dump file is produced.
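
To check whether a config contains the pattern from step 1, the keys can be grouped by interface name. The sketch below assumes the keys have the form `<interface>|<prefix>` as in the snippet above; `count_addresses` is a hypothetical helper, not part of any SONiC tool.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Counts how many addresses are configured per interface, given keys of the
// form "<interface>|<prefix>". Keys without a '|' separator are skipped.
std::map<std::string, int> count_addresses(const std::vector<std::string> &keys) {
    std::map<std::string, int> counts;
    for (const auto &key : keys) {
        const auto pos = key.find('|');
        if (pos == std::string::npos) {
            continue;  // no address part, nothing to count
        }
        ++counts[key.substr(0, pos)];
    }
    return counts;
}
```

Any interface with a count greater than one matches the "multiple IP addresses on one loopback interface" condition that triggered the problem here.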

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/orchagent -d /var/log/swss -b 8192 -m 6c:ec:5a:08:18:67'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000071 in ?? ()
(gdb) bt
#0  0x0000000000000071 in ?? ()
#1  0x000000000048e811 in ~Executor (this=0x1a8f330, __in_chrg=<optimized out>) at orch.h:77
#2  ~ExecutableTimer (this=0x1a8f330, __in_chrg=<optimized out>) at timer.h:8
#3  swss::ExecutableTimer::~ExecutableTimer (this=0x1a8f330, __in_chrg=<optimized out>) at timer.h:8
#4  0x000000000041d8ed in _M_dispose (this=0x1a8c380) at /usr/include/c++/4.9/bits/shared_ptr_base.h:373
#5  _M_release (this=0x1a8c380) at /usr/include/c++/4.9/bits/shared_ptr_base.h:149
#6  ~__shared_count (this=0x1a8dd80, __in_chrg=<optimized out>) at /usr/include/c++/4.9/bits/shared_ptr_base.h:666
#7  ~__shared_ptr (this=0x1a8dd78, __in_chrg=<optimized out>) at /usr/include/c++/4.9/bits/shared_ptr_base.h:914
#8  ~shared_ptr (this=0x1a8dd78, __in_chrg=<optimized out>) at /usr/include/c++/4.9/bits/shared_ptr.h:93
#9  ~pair (this=0x1a8dd70, __in_chrg=<optimized out>) at /usr/include/c++/4.9/bits/stl_pair.h:96
#10 destroy<std::pair<std::basic_string<char> const, std::shared_ptr<Executor> > > (this=<optimized out>, __p=0x1a8dd70) at /usr/include/c++/4.9/ext/new_allocator.h:124
#11 _S_destroy<std::pair<std::basic_string<char> const, std::shared_ptr<Executor> > > (__p=0x1a8dd70, __a=...) at /usr/include/c++/4.9/bits/alloc_traits.h:282
#12 destroy<std::pair<std::basic_string<char> const, std::shared_ptr<Executor> > > (__a=..., __p=0x1a8dd70) at /usr/include/c++/4.9/bits/alloc_traits.h:411
#13 _M_destroy_node (this=0x1a8b518, __p=0x1a8dd50) at /usr/include/c++/4.9/bits/stl_tree.h:436
#14 std::_Rb_tree<std::string, std::pair<std::string const, std::shared_ptr<Executor> >, std::_Select1st<std::pair<std::string const, std::shared_ptr<Executor> > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<Executor> > > >::_M_erase (this=this@entry=0x1a8b518, __x=0x1a8dd50)
   at /usr/include/c++/4.9/bits/stl_tree.h:1247
#15 0x000000000041d8a1 in std::_Rb_tree<std::string, std::pair<std::string const, std::shared_ptr<Executor> >, std::_Select1st<std::pair<std::string const, std::shared_ptr<Executor> > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<Executor> > > >::_M_erase (this=0x1a8b518, __x=0x1a8abd0)
   at /usr/include/c++/4.9/bits/stl_tree.h:1245
#16 0x00000000004b8dc7 in ~CrmOrch (this=0x1a8b510, __in_chrg=<optimized out>) at crmorch.h:37
#17 CrmOrch::~CrmOrch (this=0x1a8b510, __in_chrg=<optimized out>) at crmorch.h:37
#18 0x0000000000412e46 in OrchDaemon::~OrchDaemon (this=0x1a82900, __in_chrg=<optimized out>) at orchdaemon.cpp:43
#19 0x0000000000408c5e in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:288
(gdb) 
@prsunny
Contributor

prsunny commented Oct 6, 2018

We are able to reproduce the issue, and a fix plan is in progress.

@vemulabalaji

I thought I would update the existing bug instead of raising a new issue.
I observed an orchagent segmentation fault on config reload, but at a different place.

I am attaching my config.json file and the crash dump.
config_db_orchagent_crash.txt

config-reload-orchagent-crash.txt

Please let me know if more information is needed.

@qiluo-msft qiluo-msft self-assigned this Dec 17, 2018
@qiluo-msft qiluo-msft pinned this issue Dec 17, 2018
@qiluo-msft qiluo-msft unpinned this issue Dec 17, 2018
dgsudharsan added a commit to dgsudharsan/sonic-buildimage that referenced this issue Nov 4, 2021
Including the below commits to update swss submodule
8448a60 [vs tests]Migrating sonic-swss tests to use hwsku instead of fakeplatform (sonic-net#1978)
faa26db Fix random failure in PR/CI build. (sonic-net#2006)
e03edb6 Allow interface type value none (sonic-net#1991)
71b9650 [orchagent] Fix group name of port-buffer-drop in flexcounterorch.cpp (sonic-net#1967)
facdef5 [VS test] Skip flaky virtual chassis test (sonic-net#2004)
8261c1f [pytest]: Increase timeout when checking services (sonic-net#2000)
67278be [teammgrd]: Handle LAGs cleanup gracefully on Warm/Fast reboot. (sonic-net#1934)
e92c1df Enable FEC statistics collection for Ethernet ports (sonic-net#1994)
9f30ca1 VxLAN Tunnel Counters and Rates implementation (sonic-net#1859)

Signed-off-by: Sudharsan Dhamal Gopalarathnam <[email protected]>
prsunny pushed a commit that referenced this issue Nov 5, 2021
Including the below commits to update swss submodule
8448a60 [vs tests]Migrating sonic-swss tests to use hwsku instead of fakeplatform (#1978)
faa26db Fix random failure in PR/CI build. (#2006)
e03edb6 Allow interface type value none (#1991)
71b9650 [orchagent] Fix group name of port-buffer-drop in flexcounterorch.cpp (#1967)
facdef5 [VS test] Skip flaky virtual chassis test (#2004)
8261c1f [pytest]: Increase timeout when checking services (#2000)
67278be [teammgrd]: Handle LAGs cleanup gracefully on Warm/Fast reboot. (#1934)
e92c1df Enable FEC statistics collection for Ethernet ports (#1994)
9f30ca1 VxLAN Tunnel Counters and Rates implementation (#1859)

Signed-off-by: Sudharsan Dhamal Gopalarathnam <[email protected]>
judyjoseph added a commit that referenced this issue Nov 6, 2021
swss
73caba3 Allow interface type value none (#1991)

utilities
32e530f Allow interface type value none (#1902)
53f066c Fix log_ssd_health hang issue (#1904)
stepanblyschak added a commit to stepanblyschak/sonic-buildimage that referenced this issue Nov 11, 2021
```
5f8ebfa (HEAD, origin/master, origin/HEAD, master) [AclOrch] move ACL counters to flex counter infrastructure (sonic-net#1943)
8119ec0 [bfdorch] Orchagent support hardware BFD (sonic-net#1883)
15074ac [sonic-swss]:enable unconfiguring PFC on last TC on a port (sonic-net#1962)
05c7c05 [Mux orch] set default as standby, change mux orch priority (sonic-net#2010)
fe5b2a9 [pytest]: Ignore errors deleting host ifs (sonic-net#2005)
70da9af [ci]: use native arm64 and armhf pool (sonic-net#2013)
e14a071 [qos] Add EXP to TC map support (sonic-net#1954)
c91a7f2 [switchorch] Implement VXLAN src port range feature  (sonic-net#1959)
b20f0f4 Gcov for swss daemon (sonic-net#1737)
01c243a [CRM][MPLS] Fix the mpls nexthop CRM attribute (sonic-net#2008)
8448a60 [vs tests]Migrating sonic-swss tests to use hwsku instead of fakeplatform (sonic-net#1978)
faa26db Fix random failure in PR/CI build. (sonic-net#2006)
e03edb6 Allow interface type value none (sonic-net#1991)
71b9650 [orchagent] Fix group name of port-buffer-drop in flexcounterorch.cpp (sonic-net#1967)
facdef5 [VS test] Skip flaky virtual chassis test (sonic-net#2004)
8261c1f [pytest]: Increase timeout when checking services (sonic-net#2000)
67278be [teammgrd]: Handle LAGs cleanup gracefully on Warm/Fast reboot. (sonic-net#1934)
e92c1df Enable FEC statistics collection for Ethernet ports (sonic-net#1994)
9f30ca1 VxLAN Tunnel Counters and Rates implementation (sonic-net#1859)
ac3103a Add missing neighbor resolution for MPLS route programming (sonic-net#1968)
bfba0ad [vlanmgr]Fix for STATE_DB port check logic (sonic-net#1980)
9ef2ba4 [vlanmgr]: Update VLAN removal code to work with 5.10 kernel and newer iproute2 versions (sonic-net#1970)
41fb26c [Mux orch] Handle setting unknown mux state (sonic-net#1984)
ac09bde [azp]: Increase timeout for VS tests (sonic-net#1988)
da8a43e [pytest]: Check if appl DB exists before deleting (sonic-net#1983)
553d75a [tunnel decap] Change tunnel orch order (sonic-net#1977)
7444e96 [macsecmgr]: Add rekey period in macsec mgr (sonic-net#1958)
d95823d [Buffermgr]Graceful handling of buffer model change (sonic-net#1956)
b0aa6a0 EVPN VxLAN enhancement to support P2MP tunnel based programming for Layer2 extension (sonic-net#1858)
85bdf54 Fix the option missing in kernel config issue (sonic-net#1973)
6b15584 Orchagent validates mirror session queue parameter against maximum value from SAI (sonic-net#1957)
fc9ffb9 [copp] Add ISIS, LDP and micro-BFD trap types to CoPP manager (sonic-net#1890)
452cbc1 [macsecorch]: Add IPG adjusting for MACsec gearbox model (sonic-net#1925)
```

Signed-off-by: Stepan Blyschak <[email protected]>
theasianpianist pushed a commit to theasianpianist/sonic-buildimage that referenced this issue Feb 5, 2022
*Allow user to set none value for interface type
liat-grozovik pushed a commit that referenced this issue May 12, 2022
288c2d8 Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)" (#2161)
bce4694 [autoneg] add support for remote speed advertisement (#2124)
a73f156 [show][vrf]Fixing show vrf to include vlan subinterface (#2158)
7a06457 [auto_ts] Enable register/de-register auto_ts config for APP Extension (#2139)
083ebcc Add transceiver-info items advertised for cmis-supported moddules (#2135)
0811214 Validate destination port is not LAG (#2053)
6ab1c51 [minigraph]  Consume golden_config_db.json while loading minigraph (#2140)
c37a957 [Kdump] Remove the duplicate logic if Kdump was disabled (#2128)
1143869 Ordering fix for sfpshow eeprom (#2113)
fdb79b8 Allow fw update for other boot type against on the previous "none" boot fw update (#2040)
a54a091 [GCU] Supressing YANG errors from libyang while sorting (#1991)
fbfa8bc [GCU] Enabling AddRack and adding RemoveRack tests (#2143)
d012be9 [Command-Reference] Add CLI docs for route flow counter (#2069)
8c07d59 [Mellanox] [reboot] [asan] stop asan-enabled containers on reboot (#2107)
697aae3 Fix speed parsing when speed is NOT fetched from APPL_DB (#2138)
22a388b [show] fix get routing stack routine (#2137)
cb3a047 Support option --ports of config qos reload for reloading ports' QoS and buffer configuration to default (#2125)
154a801 Enhance "config interface type/advertised-type" to be blocked on RJ45 ports  (#2112)
3732ac5 Add CLI for route flow counter feature (#2031)
29771e7 [techsupport] improve robustness (#2117)
f9dc681 [intfutil] Display RJ45 port and portchannel speed in 'M' instead of 'G' when it's <= 1000M (#2110)
781ae9f [config] Do not enable pfcwd for BmcMgmtToRRouter (#2136)
23e9398 [scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)
576c9ef [scripts/fast-reboot] stop timers in advance (#2131)
4dad79c bugfix: incorrect command for portchannel creation (#2134)
c17b1f4 [show][muxcable] Decrease the timeout for show mux status/hwmode (#2130)
49d61f8 [scripts/fast-reboot] cleanup (#2132)
52ca324 [config/config_mgmt.py]: Fix dpb issue with upper case mac in (#2066)
9e2fbf4 Update db_migrator to support `pfcwd_sw_enable` (#2087)
4010bd0 FGNHG CLI changes (#1588)
6bd54d0 Fix 'show mac' output when FDB entry for default vlan is None instead of 1 (#2126)
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this issue Jun 20, 2022
…anch

Related work items: #52, #71, #73, #75, #77, sonic-net#1306, sonic-net#1588, sonic-net#1991, sonic-net#2031, sonic-net#2040, sonic-net#2053, sonic-net#2066, sonic-net#2069, sonic-net#2087, sonic-net#2107, sonic-net#2110, sonic-net#2112, sonic-net#2113, sonic-net#2117, sonic-net#2124, sonic-net#2125, sonic-net#2126, sonic-net#2128, sonic-net#2130, sonic-net#2131, sonic-net#2132, sonic-net#2133, sonic-net#2134, sonic-net#2135, sonic-net#2136, sonic-net#2137, sonic-net#2138, sonic-net#2139, sonic-net#2140, sonic-net#2143, sonic-net#2158, sonic-net#2161, sonic-net#2233, sonic-net#2243, sonic-net#2250, sonic-net#2254, sonic-net#2260, sonic-net#2261, sonic-net#2267, sonic-net#2278, sonic-net#2282, sonic-net#2285, sonic-net#2288, sonic-net#2289, sonic-net#2292, sonic-net#2294, sonic-net#8887, sonic-net#9279, sonic-net#9390, sonic-net#9511, sonic-net#9700, sonic-net#10025, sonic-net#10322, sonic-net#10479, sonic-net#10484, sonic-net#10493, sonic-net#10500, sonic-net#10580, sonic-net#10595, sonic-net#10628, sonic-net#10634, sonic-net#10635, sonic-net#10644, sonic-net#10670, sonic-net#10691, sonic-net#10716, sonic-net#10731, sonic-net#10750, sonic-net#10751, sonic-net#10752, sonic-net#10761, sonic-net#10769, sonic-net#10775, sonic-net#10776, sonic-net#10779, sonic-net#10786, sonic-net#10792, sonic-net#10793, sonic-net#10800, sonic-net#10806, sonic-net#10826, sonic-net#10839, sonic-net#10840, sonic-net#10842, sonic-net#10844, sonic-net#10847, sonic-net#10849, sonic-net#10852, sonic-net#10865, sonic-net#10872, sonic-net#10877, sonic-net#10886, sonic-net#10889, sonic-net#10903, sonic-net#10904, sonic-net#10905, sonic-net#10913, sonic-net#10914, sonic-net#10916, sonic-net#10919, sonic-net#10925, sonic-net#10926, sonic-net#10929, sonic-net#10933, sonic-net#10934, sonic-net#10937, sonic-net#10941, sonic-net#10947, sonic-net#10952, sonic-net#10953, sonic-net#10957, sonic-net#10959, sonic-net#10971, sonic-net#10972, sonic-net#10980