
CLI / redis DB Hangs Upon ARP Cache's Hitting the Default Maximum Limit, 1024 #2189

Closed
pollyhsu2git opened this issue Oct 24, 2018 · 5 comments


@pollyhsu2git
Contributor

pollyhsu2git commented Oct 24, 2018

Description
The CLI and redis DB hang when the ARP cache reaches its default maximum limit of 1024 entries on the latest Jenkins-built image, n750 (Tue Oct 23 13:30:19 UTC 2018).

Steps to reproduce the issue:

  1. show platform summary
  2. show ver
  3. Retrieve the ARP table default maximum setting (1024)
  4. Bring up the interfaces
  5. Insert static ARP entries up to the default maximum limit
  6. Verify that the number of static ARP entries has reached the limit
  7. Try any show CLI command
  8. Try to read from any redis DB
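
Steps 5 and 6 can be scripted instead of typed by hand. The following is a minimal Python sketch that only builds `arp -i <iface> -s <ip> <mac>` command strings (the same form used in the transcript) without executing them. The helper name `static_arp_commands`, the base subnet `192.168.0.0/16`, and the locally administered MAC prefix `02:00:00` are hypothetical choices, not the reporter's actual script.

```python
import ipaddress
import itertools
from typing import List


def static_arp_commands(iface: str, count: int, base: str = "192.168.0.0/16") -> List[str]:
    """Build `arp -s` command strings for `count` static entries on `iface`.

    Hypothetical addressing scheme: host addresses are taken sequentially
    from `base`, and the entry index is encoded in the low 24 bits of a
    locally administered MAC address with the prefix 02:00:00.
    """
    network = ipaddress.ip_network(base)
    # hosts() skips the network and broadcast addresses.
    hosts = list(itertools.islice(network.hosts(), count))
    if len(hosts) < count:
        raise ValueError(f"{base} has fewer than {count} usable host addresses")
    commands = []
    for index, ip in enumerate(hosts):
        mac = "02:00:00:{:02x}:{:02x}:{:02x}".format(
            (index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF
        )
        commands.append(f"arp -i {iface} -s {ip} {mac}")
    return commands
```

For example, `static_arp_commands("Ethernet48", 1024)` yields 1024 distinct commands starting with `arp -i Ethernet48 -s 192.168.0.1 02:00:00:00:00:00`. In the report, the attempt to add an entry beyond the 1024th failed with `SIOCSARP: No buffer space available`.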

Describe the results you received:
root@sonic:/home/admin# show platform summary
Platform: x86_64-accton_as5712_54x-r0
HwSKU: Accton-AS5712-54X
ASIC: broadcom
root@sonic:/home/admin# show ver
SONiC Software Version: SONiC.HEAD.750-dirty-20181023.092041
Distribution: Debian 9.5
Kernel: 4.9.0-7-amd64
Build commit: 709cd5a
Build date: Tue Oct 23 13:30:19 UTC 2018
Built by: johnar@jenkins-worker-4
root@sonic:/home/admin# sysctl net.ipv4.neigh.default.gc_thresh1
net.ipv4.neigh.default.gc_thresh1 = 128
root@sonic:/home/admin# sysctl net.ipv4.neigh.default.gc_thresh2
net.ipv4.neigh.default.gc_thresh2 = 512
root@sonic:/home/admin# sysctl net.ipv4.neigh.default.gc_thresh3
net.ipv4.neigh.default.gc_thresh3 = 1024
root@sonic:/home/admin# show interfaces status
Ethernet48 97 N/A 9100 tenGigE48 up up
root@sonic:/home/admin# arp -v -i Ethernet48 -s 192.168.1.1 00:2E:00:FF:0:0
[skipped]
ADD_COUNT=1024
arp -v -i Ethernet48 -s 192.168.5.5 00:2E:00:FF:4:0
arp: SIOCSARP()
SIOCSARP: No buffer space available
[skipped]
root@sonic:/home/admin# arp -a | grep "Ethernet48" -c
1024
root@sonic:/home/admin# show mac
^C
Traceback (most recent call last):
  File "/usr/bin/show", line 9, in <module>
    load_entry_point('sonic-utilities==1.2', 'console_scripts', 'show')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 561, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2631, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2291, in load
    return self.resolve()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2297, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python2.7/dist-packages/show/main.py", line 193, in <module>
    iface_alias_converter = InterfaceAliasConverter()
  File "/usr/lib/python2.7/dist-packages/show/main.py", line 54, in __init__
    self.port_dict = json.loads(p.stdout.read())
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/local/bin/sonic-cfggen", line 263, in <module>
    main()
  File "/usr/local/bin/sonic-cfggen", line 217, in main
    configdb.connect()
  File "/usr/local/lib/python2.7/dist-packages/swsssdk/configdb.py", line 57, in connect
    SonicV2Connector.connect(self, self.CONFIG_DB, retry_on)
  File "/usr/local/lib/python2.7/dist-packages/swsssdk/interface.py", line 191, in connect
    self._onetime_connect(db_name)
  File "/usr/local/lib/python2.7/dist-packages/swsssdk/interface.py", line 204, in _onetime_connect
    client.config_set('notify-keyspace-events', self.KEYSPACE_EVENTS)
  File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 719, in config_set
    return self.execute_command('CONFIG SET', name, value)
  File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 667, in execute_command
    connection.send_command(*args)
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 484, in connect
    sock = self._connect()
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 529, in _connect
    sock.connect(socket_address)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
KeyboardInterrupt
root@sonic:/home/admin# generate_dump
^C^C^C^C^C^C^C
root@sonic:/home/admin# echo "no response for 10 minutes"
no response for 10 minutes
root@sonic:/home/admin# redis-cli -n 0
^C^C
root@sonic:/home/admin# redis-cli -n 1
^C
root@sonic:/home/admin# redis-cli -n 3
^C
root@sonic:/home/admin# redis-cli -n 4
^C
root@sonic:/home/admin# redis-cli -n 5
^C
root@sonic:/home/admin# redis-cli -n 6
^C
root@sonic:/home/admin# echo " no response for 30 seconds of waiting"
no response for 30 seconds of waiting
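
Each `redis-cli` attempt above had to be interrupted by hand. A minimal sketch, assuming Python 3 is available on the switch, that bounds every probe with a timeout so that a hang is reported instead of blocking the shell. The helper names `responds_within` and `probe_redis_dbs` are hypothetical, `PING` is an assumed choice of probe command, and the sketch simply treats any non-zero exit status from `redis-cli` as a failure.

```python
import subprocess
from typing import Dict, Iterable, Sequence


def responds_within(cmd: Sequence[str], timeout_s: float) -> bool:
    """Return True only if `cmd` exits with status 0 within `timeout_s` seconds."""
    try:
        result = subprocess.run(
            list(cmd),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising.
        return False
    except FileNotFoundError:
        # The command is not installed on this host.
        return False
    return result.returncode == 0


def probe_redis_dbs(dbs: Iterable[int], timeout_s: float = 5.0) -> Dict[int, bool]:
    """PING each redis DB index via redis-cli, bounding every attempt by timeout_s."""
    return {
        db: responds_within(["redis-cli", "-n", str(db), "PING"], timeout_s)
        for db in dbs
    }
```

For example, `probe_redis_dbs([0, 1, 3, 4, 5, 6])` covers the same DB indices tried in the transcript and returns a dict mapping each index to `True` or `False`, taking at most about 30 seconds when every database hangs.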

Describe the results you expected:
Even when the ARP cache maximum limit is reached, the CLI and redis DB should NOT hang.

Additional information you deem important (e.g. issue happens only occasionally):
root@sonic:/home/admin# show ver
SONiC Software Version: SONiC.HEAD.750-dirty-20181023.092041
Distribution: Debian 9.5
Kernel: 4.9.0-7-amd64
Build commit: 709cd5a
Build date: Tue Oct 23 13:30:19 UTC 2018
Built by: johnar@jenkins-worker-4

**Attach debug file `sudo generate_dump`:**

No generate_dump debug file is available because the system was hung, but we have uploaded our debug console log as a ZIP file:
100_SONic-V2-AS5712-j_n750_20181023_709cd5a-ARP_1024-Debug-002.zip

@pollyhsu2git pollyhsu2git changed the title CLI / redis DB Hangs Upon ARP Cache's Hitting the Default Maximum Limit, 1024 CLI / redis DB Hangs Upon ARP Cache's Hitting the Default Maximum Limit, 1024 on Jenkins#750 Oct 24, 2018
@pollyhsu2git pollyhsu2git changed the title CLI / redis DB Hangs Upon ARP Cache's Hitting the Default Maximum Limit, 1024 on Jenkins#750 CLI / redis DB Hangs Upon ARP Cache's Hitting the Default Maximum Limit, 1024 Oct 24, 2018
@pollyhsu2git
Contributor Author

pollyhsu2git commented Oct 24, 2018

When the issue occurred, the CPU was kept busy by the syncd, python3.6, and redis-serv processes, and the kernel was reporting "neighbor table overflow!" messages:

100_sonic-v2-as5712-j_n750_20181023_709cd5a-arp_1024-debug-001

Nov 3 18:20:21.521369 sonic WARNING kernel: [ 3828.702948] net_ratelimit: 2 callbacks suppressed
Nov 3 18:20:21.521406 sonic INFO kernel: [ 3828.702949] neighbour: arp_cache: neighbor table overflow!
Nov 3 18:20:21.521411 sonic INFO kernel: [ 3828.702998] neighbour: arp_cache: neighbor table overflow!
Nov 3 18:20:21.521415 sonic INFO kernel: [ 3828.703037] neighbour: arp_cache: neighbor table overflow!
Nov 3 18:20:21.645976 sonic INFO kernel: [ 3828.830923] neighbour: arp_cache: neighbor table overflow!
Nov 3 18:20:23.569389 sonic INFO kernel: [ 3830.750913] neighbour: arp_cache: neighbor table overflow!
Nov 3 18:20:23.569421 sonic INFO kernel: [ 3830.750960] neighbour: arp_cache: neighbor table overflow!
Nov 3 18:20:23.665336 sonic INFO kernel: [ 3830.846920] neighbour: arp_cache: neighbor table overflow!
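
Because `net_ratelimit` suppresses repeated kernel messages (the excerpt shows "2 callbacks suppressed"), counting syslog lines only gives a lower bound on how often the overflow happens. A minimal Python sketch that tallies the visible `neighbor table overflow!` messages per whole second of kernel uptime; the regular expression is written against the message format in this excerpt, and the helper name `overflow_counts` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   "... kernel: [ 3828.702949] neighbour: arp_cache: neighbor table overflow!"
# Group 1 captures the whole-second part of the kernel uptime timestamp.
OVERFLOW_RE = re.compile(
    r"\[\s*(\d+)\.\d+\]\s+neighbour: arp_cache: neighbor table overflow!"
)


def overflow_counts(lines: Iterable[str]) -> Counter:
    """Count visible overflow messages per second of kernel uptime."""
    counts: Counter = Counter()
    for line in lines:
        match = OVERFLOW_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Run over the excerpt, this yields four messages at uptime second 3828 and three at second 3830; the `net_ratelimit` line is ignored.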

@zhenggen-xu
Collaborator

This is a kernel-setting issue: whenever the number of neighbor entries is above gc_thresh3, the kernel keeps invoking the garbage-collection process. If the system needs more than 1024 neighbor entries, set the kernel parameters to an appropriate value, e.g. 4096: sysctl -w net.ipv4.neigh.default.gc_thresh3=4096. You may also want to tune gc_thresh1 and gc_thresh2 accordingly to optimize the settings for your case.

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
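
A minimal sketch of the suggested tuning, assuming gc_thresh1 and gc_thresh2 are scaled together with gc_thresh3 so that the default 128:512:1024 (1:4:8) ratio seen on this switch is preserved. That ratio is an assumption made here for illustration, not a recommendation from the kernel documentation, and the helper names are hypothetical. The sketch only builds `sysctl -w` command strings; values set with `sysctl -w` take effect at runtime and do not persist across reboots.

```python
from typing import Dict, List

# Default neighbor-table thresholds reported on the switch in this issue.
DEFAULTS: Dict[str, int] = {"gc_thresh1": 128, "gc_thresh2": 512, "gc_thresh3": 1024}


def scaled_thresholds(gc_thresh3: int) -> Dict[str, int]:
    """Scale gc_thresh1/gc_thresh2 with gc_thresh3, keeping the default 1:4:8 ratio."""
    if gc_thresh3 < DEFAULTS["gc_thresh3"]:
        raise ValueError("this sketch only handles raising the limit")
    # Integer arithmetic keeps gc_thresh1 <= gc_thresh2 <= gc_thresh3.
    return {
        "gc_thresh1": gc_thresh3 * DEFAULTS["gc_thresh1"] // DEFAULTS["gc_thresh3"],
        "gc_thresh2": gc_thresh3 * DEFAULTS["gc_thresh2"] // DEFAULTS["gc_thresh3"],
        "gc_thresh3": gc_thresh3,
    }


def sysctl_commands(thresholds: Dict[str, int]) -> List[str]:
    """Render one `sysctl -w` command per IPv4 neighbor threshold, in name order."""
    return [
        f"sysctl -w net.ipv4.neigh.default.{name}={value}"
        for name, value in sorted(thresholds.items())
    ]
```

For example, `sysctl_commands(scaled_thresholds(4096))` produces commands setting gc_thresh1=512, gc_thresh2=2048, and gc_thresh3=4096; the last one matches the command in the comment.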

@pollyhsu2git
Contributor Author

@zhenggen-xu
Thanks for the info. We are aware of this optimization.
We just want to confirm: is this one of the settings on SONiC that the administrator MUST watch for and tune manually?

@zhenggen-xu
Collaborator

The ARP table is managed directly by the kernel; SONiC is a downstream consumer and imposes no obvious constraint (other than memory, etc.). To support your ARP table size requirement, you need to tune the kernel settings.
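
Since the kernel owns the ARP table, its contents can be read directly from `/proc/net/arp`, the same file the net-tools `arp` command reads. A minimal Python sketch that counts entries per interface, roughly what `arp -a | grep "Ethernet48" -c` did in the report. The helper names are hypothetical, and the parser assumes the usual six-column layout (IP address, HW type, Flags, HW address, Mask, Device) with a single header line.

```python
from collections import Counter


def arp_entries_per_device(proc_net_arp_text: str) -> Counter:
    """Count neighbor entries per interface from the text of /proc/net/arp."""
    counts: Counter = Counter()
    lines = proc_net_arp_text.splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) >= 6:
            counts[fields[5]] += 1  # sixth column is "Device"
    return counts


def read_proc_net_arp(path: str = "/proc/net/arp") -> Counter:
    """Read the kernel's IPv4 neighbor table and count entries per interface."""
    with open(path) as f:
        return arp_entries_per_device(f.read())
```

On the switch in this report, `read_proc_net_arp()["Ethernet48"]` would be expected to match the 1024 reported by `arp -a`, unless the table changed between the two reads.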

@pollyhsu2git
Contributor Author

@zhenggen-xu
Clarified!
Thanks~

@stcheng stcheng closed this as completed Nov 7, 2018
yxieca pushed a commit that referenced this issue Jun 9, 2022
0fc6f47 (HEAD -> 202205, origin/202205) [config][muxcable] Add support for displaying soc_ipv4 and cable_type in config/show muxcable commands (#2189)

Signed-off-by: vaibhav-dahiya
wen587 pushed a commit that referenced this issue Jun 22, 2022
05c79ef Fix header for the output table following 'show ipv6 interface' command (#2219)
fc5633f increase coverage to 80% (#2214)
c0dffba [config][muxcable] fix minor config DB logic issue (#2210)
a50eca0 [generic-config-updater] Add NTP validator  (#2212)
a3d1345 [gendump] Add Support to dump BCM-DNX commands (#1813)
bb185d5 [yang] remove mistakenly added parameter for 'get_module_name' (#2193)
2cccf26 [counters] skip showing counters that are not enabled (#2199)
ff05bc8 [config][muxcable] Add support for displaying soc_ipv4 and cable_type in config/show muxcable commands (#2189)
3197f39 Add check to not allow deleting PO if its member of vlan. (#2141)
2513da1 [dump] Optimized dump state cli and modified tests to not use common data  (#2175)
9e310e5 Fix sonic-installer and 'show version' command crash when database docker not running issue. (#2183)
4ad70b9 [sonic-installer] use host docker startup arguments when running dockerd in chroot  (#2179)
3d3c89b fix for non-coherent cmis modules (#2163)
2054680 [subinterface] Fix route add command to accept subinterface as dev (#2180)
5383e92 [subinterface]Avoid removing the subinterface when last configured ip is removed (#2181)
f5af780 [GCU] Handling type1 lists (#2171)
4516179 [yang] extend ConfigMgmt constructor to pass YANG options (#2118)
2f53bd4 [dump] implement ACL modules (#2153)
494dd62 show commands for SYSTEM READY (#1851)
4fc09b1 [GCU] Handling non-compliant leaf-list with string values (#2174)
675c7b6 Add sonic-delayed.target to Application Extension .timer file generator (#2176)
c587933 [portconfig] Allow to configure interface mtu for physical ports only
9881f3e Broadcast Unknown-multicast and Unknown-unicast Storm-control  (#928)
88286cb sonic-utils: initial support for link-training (#2071)
robertvolkmann pushed a commit to robertvolkmann/sonic-buildimage that referenced this issue Jul 26, 2022
0fc6f47 (HEAD -> 202205, origin/202205) [config][muxcable] Add support for displaying soc_ipv4 and cable_type in config/show muxcable commands (sonic-net#2189)

Signed-off-by: vaibhav-dahiya