Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldp] lldpmgrd crashed in test_iface_namingmode #5697

Closed
bingwang-ms opened this issue Oct 22, 2020 · 5 comments · Fixed by sonic-net/sonic-swss-common#408
Closed

[lldp] lldpmgrd crashed in test_iface_namingmode #5697

bingwang-ms opened this issue Oct 22, 2020 · 5 comments · Fixed by sonic-net/sonic-swss-common#408
Assignees
Labels
Master Branch Quality P0 Priority of the issue

Comments

@bingwang-ms
Copy link
Contributor

Description
The lldp service crashed in test_iface_namingmode.
The alias of interfaces will be updated in test setup and teardown with following commands, which have a quite lagre possibility to crash lldp.

redis-cli -n 4 HSET "PORT|Ethernet8" alias TestAlias0
redis-cli -n 4 HSET "PORT|Ethernet0" alias TestAlias1
redis-cli -n 4 HSET "PORT|Ethernet4" alias TestAlias2
redis-cli -n 4 HSET "PORT|Ethernet108" alias TestAlias3
redis-cli -n 4 HSET "PORT|Ethernet100" alias TestAlias4
redis-cli -n 4 HSET "PORT|Ethernet104" alias TestAlias5
redis-cli -n 4 HSET "PORT|Ethernet96" alias TestAlias6
redis-cli -n 4 HSET "PORT|Ethernet124" alias TestAlias7
redis-cli -n 4 HSET "PORT|Ethernet92" alias TestAlias8
redis-cli -n 4 HSET "PORT|Ethernet120" alias TestAlias9
redis-cli -n 4 HSET "PORT|Ethernet52" alias TestAlias10
redis-cli -n 4 HSET "PORT|Ethernet56" alias TestAlias11

The crash logs are following.

Oct 22 06:32:03.160159 str-dx010-acs-4 INFO lldp#supervisord: lldpmgrd terminate called after throwing an instance of 'std::system_error'
Oct 22 06:32:03.160159 str-dx010-acs-4 INFO lldp#supervisord: lldpmgrd   what():  Unable to connect to redis: Cannot assign requested address
Oct 22 06:32:03.265170 str-dx010-acs-4 DEBUG core_uploader.py: core_uploader.py: Received create event - /var/core/python.1603348323.38.core.gz

The log said that redis is not accessible, but I did a test and redis seemd fine.

admin@str-dx010-acs-4:~$ redis-cli -n 4 keys '*'
  1) "BUFFER_PG|Ethernet16|0"
  2) "QUEUE|Ethernet32|4"
  3) "BUFFER_QUEUE|Ethernet40|0-2"
  4) "DHCP_SERVER|192.0.0.4"
  5) "BUFFER_PG|Ethernet80|0"
  6) "BUFFER_PG|Ethernet88|0"
......

Steps to reproduce the issue:

  1. Just run test case test_iface_namingmode

Describe the results you received:
The lldp service crashed, and test failed.

Describe the results you expected:
No service crashed.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**

SONiC Software Version: SONiC.master.447-bef76b10
Distribution: Debian 10.6
Kernel: 4.19.0-9-2-amd64
Build commit: bef76b10
Build date: Sat Oct 17 09:50:26 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-cel_seastone-r0
HwSKU: Celestica-DX010-C32
ASIC: broadcom
Serial Number: DX010F2B118711MS100007
Uptime: 06:37:27 up 15:25,  2 users,  load average: 3.92, 3.84, 3.50

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-teamd                  latest                5c592b65e21d        396MB
docker-teamd                  master.447-bef76b10   5c592b65e21d        396MB
docker-sonic-mgmt-framework   latest                f85eacaccd94        486MB
docker-sonic-mgmt-framework   master.447-bef76b10   f85eacaccd94        486MB
docker-router-advertiser      latest                a153afc17879        359MB
docker-router-advertiser      master.447-bef76b10   a153afc17879        359MB
docker-platform-monitor       latest                408344f9a0bc        443MB
docker-platform-monitor       master.447-bef76b10   408344f9a0bc        443MB
docker-lldp                   latest                8ecf97c3f689        391MB
docker-lldp                   master.447-bef76b10   8ecf97c3f689        391MB
docker-dhcp-relay             latest                7f0488fd7b0a        366MB
docker-dhcp-relay             master.447-bef76b10   7f0488fd7b0a        366MB
docker-database               latest                45e3750c14db        359MB
docker-database               master.447-bef76b10   45e3750c14db        359MB
docker-orchagent              latest                c061654f8a96        410MB
docker-orchagent              master.447-bef76b10   c061654f8a96        410MB
docker-nat                    latest                24899834e99d        399MB
docker-nat                    master.447-bef76b10   24899834e99d        399MB
docker-sonic-telemetry        latest                29a2e0e1c787        429MB
docker-sonic-telemetry        master.447-bef76b10   29a2e0e1c787        429MB
docker-fpm-frr                latest                e20e2b0974c8        413MB
docker-fpm-frr                master.447-bef76b10   e20e2b0974c8        413MB
docker-sflow                  latest                1a71bf3e2eed        400MB
docker-sflow                  master.447-bef76b10   1a71bf3e2eed        400MB
docker-snmp                   latest                a00953c0ef4e        399MB
docker-snmp                   master.447-bef76b10   a00953c0ef4e        399MB
docker-syncd-brcm             latest                d5e9f792d448        542MB
docker-syncd-brcm             master.447-bef76b10   d5e9f792d448        542MB


**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```

syslog.zip

@bingwang-ms bingwang-ms changed the title [lldp] lldp crash in test_iface_namingmode [lldp] lldp crashed in test_iface_namingmode Oct 22, 2020
@daall daall added the P1 Priority of the issue, lower than P0 label Oct 27, 2020
@daall
Copy link
Contributor

daall commented Oct 28, 2020

Based on the error message this might be related to #3849

@lguohan
Copy link
Collaborator

lguohan commented Oct 29, 2020

i tried this on the master kvm image multiple times, I cannot repro it. how frequently does this issue appear?

@bingwang-ms
Copy link
Contributor Author

bingwang-ms commented Oct 29, 2020

It happens pretty frequently. I just ran test_iface_namingmode.py::TestShowInterfaces::test_show_interfaces_portchannel on dx010-4, and it happened almost each time. But no coredump is generated.

Oct 29 09:13:39.103513 str-dx010-acs-4 INFO lldp#supervisord: lldpmgrd terminate called after throwing an instance of 'std::system_error'
Oct 29 09:13:39.105535 str-dx010-acs-4 INFO lldp#supervisord: lldpmgrd   what():
Oct 29 09:13:39.105535 str-dx010-acs-4 INFO lldp#supervisord: lldpmgrd Unable to connect to redis: Cannot assign requested address
Oct 29 09:13:39.106222 str-dx010-acs-4 INFO lldp#supervisord: lldpmgrd

@lguohan
Copy link
Collaborator

lguohan commented Oct 30, 2020

@lguohan
Copy link
Collaborator

lguohan commented Nov 2, 2020

lldpmgrd-debug.tar.gz

it is easy to repro it now.

just using ./testalias.sh and ./testalias1.sh alternatively, usually in 2~3 rounds you will be able to repro the lldpmgrd crash. I also added some debug log in lldpmgrd. please find the scripts in the attached file.

it looks like there is something wrong in the swss-common python library. @qiluo-msft , can you take a look?

below is my debug output.

Nov  2 07:50:34.159318 vlab-01 INFO lldp#lldpmgrd[3947]: select wait...
Nov  2 07:50:34.159483 vlab-01 INFO lldp#lldpmgrd[3947]: popping config db...
Nov  2 07:50:34.159483 vlab-01 INFO lldp#lldpmgrd[3947]: popped config db ...
Nov  2 07:50:34.159483 vlab-01 INFO lldp#lldpmgrd[3947]: popping appdb db...
Nov  2 07:50:34.159931 vlab-01 INFO lldp#lldpmgrd[3947]: popped appdb db Ethernet76 SET (('index', '19'), ('lanes', '77,78,79,80'), ('description', 'Servers18:eth0'), ('admin_status', 'up'), ('mtu', '9100'), ('alias'
, 'TA19_'), ('pfc_asym', 'off'), ('speed', '40000'), ('oper_status', 'up'))...
Nov  2 07:50:34.165226 vlab-01 INFO dockerd[547]: time="2020-11-02T07:50:34.165152789Z" level=error msg="Handler for POST /v1.39/exec/bcfb1d676fd66531e12b8640cc4329723c8a9c38fbdb673652e270ac25c10ae6/resize returned e
rror: cannot resize a stopped container: unknown"
Nov  2 07:50:34.166073 vlab-01 INFO lldp#lldpmgrd[3947]: select wait...
Nov  2 07:50:34.166116 vlab-01 INFO lldp#lldpmgrd[3947]: popping config db...
Nov  2 07:50:34.166116 vlab-01 INFO lldp#lldpmgrd[3947]: popped config db ...
Nov  2 07:50:34.166116 vlab-01 INFO lldp#lldpmgrd[3947]: popping appdb db...
Nov  2 07:50:34.166246 vlab-01 INFO lldp#lldpmgrd[3947]: popped appdb db Ethernet76 SET (('index', '19'), ('lanes', '77,78,79,80'), ('description', 'Servers18:eth0'), ('admin_status', 'up'), ('mtu', '9100'), ('alias'
, 'TA19_'), ('pfc_asym', 'off'), ('speed', '40000'), ('oper_status', 'up'))...
Nov  2 07:50:34.175670 vlab-01 INFO lldp#lldpmgrd[3947]: select wait...
Nov  2 07:50:34.175670 vlab-01 INFO lldp#lldpmgrd[3947]: popping config db...
Nov  2 07:50:34.175670 vlab-01 INFO lldp#lldpmgrd[3947]: popped config db ...
Nov  2 07:50:34.175670 vlab-01 INFO lldp#lldpmgrd[3947]: popping appdb db...
Nov  2 07:50:34.175670 vlab-01 INFO lldp#lldpmgrd[3947]: popped appdb db Ethernet76 SET (('index', '19'), ('lanes', '77,78,79,80'), ('description', 'Servers18:eth0'), ('admin_status', 'up'), ('mtu', '9100'), ('alias'
, 'TA19_'), ('pfc_asym', 'off'), ('speed', '40000'), ('oper_status', 'up'))...
Nov  2 07:50:34.184067 vlab-01 INFO lldp#lldpmgrd[3947]: select wait...
Nov  2 07:50:34.184067 vlab-01 INFO lldp#lldpmgrd[3947]: popping config db...
Nov  2 07:50:34.184101 vlab-01 INFO lldp#lldpmgrd[3947]: popped config db ...
Nov  2 07:50:34.184101 vlab-01 INFO lldp#lldpmgrd[3947]: popping appdb db...
Nov  2 07:50:34.184133 vlab-01 INFO lldp#lldpmgrd[3947]: popped appdb db Ethernet76 SET (('index', '19'), ('lanes', '77,78,79,80'), ('description', 'Servers18:eth0'), ('admin_status', 'up'), ('mtu', '9100'), ('alias'
, 'TA19_'), ('pfc_asym', 'off'), ('speed', '40000'), ('oper_status', 'up'))...
Nov  2 07:50:34.190791 vlab-01 INFO lldp#lldpmgrd[3947]: select wait...
Nov  2 07:50:34.190815 vlab-01 INFO lldp#lldpmgrd[3947]: popping config db...
Nov  2 07:50:34.190815 vlab-01 INFO lldp#lldpmgrd[3947]: popped config db ...
Nov  2 07:50:34.190839 vlab-01 INFO lldp#lldpmgrd[3947]: popping appdb db...
Nov  2 07:50:34.190996 vlab-01 INFO lldp#lldpmgrd[3947]: popped appdb db Ethernet76 SET (('index', '19'), ('lanes', '77,78,79,80'), ('description', 'Servers18:eth0'), ('admin_status', 'up'), ('mtu', '9100'), ('alias'
, 'TA19_'), ('pfc_asym', 'off'), ('speed', '40000'), ('oper_status', 'up'))...
Nov  2 07:50:34.198902 vlab-01 INFO lldp#lldpmgrd[3947]: select wait...
Nov  2 07:50:34.199236 vlab-01 INFO lldp#lldpmgrd[3947]: popping config db...
Nov  2 07:50:34.199236 vlab-01 INFO lldp#lldpmgrd[3947]: popped config db ...
Nov  2 07:50:34.199236 vlab-01 INFO lldp#lldpmgrd[3947]: popping appdb db...
Nov  2 07:50:34.199480 vlab-01 INFO lldp#lldpmgrd[3947]: popped appdb db Ethernet76 SET (('index', '19'), ('lanes', '77,78,79,80'), ('description', 'Servers18:eth0'), ('admin_status', 'up'), ('mtu', '9100'), ('alias'
, 'TA19_'), ('pfc_asym', 'off'), ('speed', '40000'), ('oper_status', 'up'))...
Nov  2 07:50:34.210470 vlab-01 INFO lldp#lldpmgrd[3947]: select wait...
Nov  2 07:50:34.210470 vlab-01 INFO lldp#lldpmgrd[3947]: popping config db...
Nov  2 07:50:34.210505 vlab-01 INFO lldp#lldpmgrd[3947]: popped config db ...
Nov  2 07:50:34.210505 vlab-01 INFO lldp#lldpmgrd[3947]: popping appdb db...
Nov  2 07:50:34.210527 vlab-01 INFO lldp#lldpmgrd[3947]: popped appdb db Ethernet76 SET (('index', '19'), ('lanes', '77,78,79,80'), ('description', 'Servers18:eth0'), ('admin_status', 'up'), ('mtu', '9100'), ('alias'
, 'TA19_'), ('pfc_asym', 'off'), ('speed', '40000'), ('oper_status', 'up'))...
Nov  2 07:50:34.210880 vlab-01 INFO lldp#supervisord: lldpmgrd terminate called after throwing an instance of '
Nov  2 07:50:34.210950 vlab-01 INFO lldp#supervisord: lldpmgrd std::system_error
Nov  2 07:50:34.211080 vlab-01 INFO lldp#supervisord: lldpmgrd '
Nov  2 07:50:34.211189 vlab-01 INFO lldp#supervisord: lldpmgrd   what():
Nov  2 07:50:34.211251 vlab-01 INFO lldp#supervisord: lldpmgrd Unable to connect to redis: Cannot assign requested address
Nov  2 07:50:34.211406 vlab-01 INFO lldp#supervisord: lldpmgrd

@lguohan lguohan removed their assignment Nov 2, 2020
@lguohan lguohan changed the title [lldp] lldp crashed in test_iface_namingmode [lldp] lldpmgrd crashed in test_iface_namingmode Nov 2, 2020
@lguohan lguohan added P0 Priority of the issue and removed P1 Priority of the issue, lower than P0 labels Nov 2, 2020
qiluo-msft added a commit to sonic-net/sonic-swss-common that referenced this issue Nov 4, 2020
kktheballer pushed a commit to kktheballer/sonic-swss-common that referenced this issue Dec 21, 2020
kktheballer pushed a commit to kktheballer/sonic-swss-common that referenced this issue Jan 21, 2021
kktheballer pushed a commit to kktheballer/sonic-swss-common that referenced this issue Feb 25, 2021
kktheballer pushed a commit to kktheballer/sonic-swss-common that referenced this issue Jul 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Master Branch Quality P0 Priority of the issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants