Link-down issue on Arista SONiC T2 in staging when upgrading from 202205 to 202405 #120
Regarding the L1 CRC errors seen on the downstream T1 devices, their timing with the bring-up of fabric module 0 was a complete coincidence. Any uncorrectable error on the fabric links would cause packets to be dropped, and those packets would never reach the front-panel ports, so they could not generate CRC errors there. The CRC errors are caused by something else. The syslogs posted in the description are benign: they are generated while the fabric chips are initializing and coming online.
It is possible that the packet corruption (CRC errors) is caused by FIFO underflow. If the issue occurs again, the thresholds can be tuned to confirm whether underflow is the trigger. Broadcom has documented how to tune this threshold, and we can help determine a better setting (https://brcmsemiconductor-csm.wolkenservicedesk.com/wolken-support/article?articleId=19817). Closing this issue. Please continue to monitor, and if it occurs again, let's get on a call.
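As a hedged illustration of "continue to monitor", the sketch below compares two snapshots of per-port CRC error counters and reports the ports whose count grew. How the snapshots are collected (for example, from the downstream T1 devices' interface counters) is left open; the port names and the `crc_increases` helper are hypothetical.

```python
from typing import Dict


def crc_increases(before: Dict[str, int], after: Dict[str, int],
                  threshold: int = 0) -> Dict[str, int]:
    """Return {port: delta} for ports whose CRC error count grew by more than threshold.

    Ports missing from `before` are treated as starting at 0. A negative delta
    (e.g. counters cleared between snapshots) is ignored rather than reported.
    """
    grown: Dict[str, int] = {}
    for port, new_count in after.items():
        delta = new_count - before.get(port, 0)
        if delta > threshold:
            grown[port] = delta
    return grown
```

Running it periodically against a baseline snapshot gives a simple signal for when to reopen the investigation and revisit the FIFO underflow thresholds.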
Thanks, yes. I think there is a PR to fix the switchid. @arlakshm
So far, we have noticed that the link-down alert fires when we upgrade the Arista T2 from the SONiC 202205 image to the 202405 image.
The alert arises from two issues:
<30>2025-01-30T21:14:59.666022+00:00 STG01-0101-0400-01T2-sup00 INFO systemd[1]: Started [email protected] - switch state service.
<13>2025-01-30T21:15:01.666281+00:00 STG01-0101-0400-01T2-sup00 NOTICE root: Started swss0 service...
<14>2025-01-30T21:15:22.349038+00:00 STG01-0101-0400-01T2-lc05 INFO syncd#supervisord: syncd 0:dnxc_interrupt_print_info: name=RTP_LinkMaskChange, id=666, index=0, block=0, unit=0, recurring_action=0 | Check RMGR / RTPWP settings in both device and device partner. If configuration is OK - look for physical link error indication and retrain link if needed. | RTP Link Mask Change#015
<10>2025-01-30T21:15:22.349099+00:00 STG01-0101-0400-01T2-lc05 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:950 0x5a1100 Received unhandled switch event - Device Interrupt(13) on unit 0: 0x29a 0x0 0x0
<14>2025-01-30T21:15:22.749727+00:00 STG01-0101-0400-01T2-lc05 INFO syncd#supervisord: syncd 0:dnxc_interrupt_print_info: name=RTP_LinkMaskChange, id=666, index=0, block=0, unit=0, recurring_action=0 | Check RMGR / RTPWP settings in both device and device partner. If configuration is OK - look for physical link error indication and retrain link if needed. | RTP Link Mask Change#015
<14>2025-01-30T21:15:22.840429+00:00 STG01-0101-0400-01T2-lc05 INFO syncd#supervisord: syncd 0:dnxc_interrupt_print_info: name=RTP_LinkMaskChange, id=666, index=0, block=0, unit=0, recurring_action=0 | Check RMGR / RTPWP settings in both device and device partner. If configuration is OK - look for physical link error indication and retrain link if needed. | RTP Link Mask Change#015
<14>2025-01-30T21:15:23.122929+00:00 STG01-0101-0400-01T2-lc04 INFO syncd#supervisord: syncd 0:dnxc_interrupt_print_info: name=RTP_LinkMaskChange, id=666, index=0, block=0, unit=0, recurring_action=0 | Check RMGR / RTPWP settings in both device and device partner. If configuration is OK - look for physical link error indication and retrain link if needed. | RTP Link Mask Change#015
<10>2025-01-30T21:15:23.123224+00:00 STG01-0101-0400-01T2-lc04 CRIT syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_switch_event_cb:950 0x5a1100 Received unhandled switch event - Device Interrupt(13) on unit 0: 0x29a 0x0 0x0
<14>2025-01-30T21:15:23.173596+00:00 STG01-0101-0400-01T2-lc04 INFO syncd#supervisord: syncd 0:dnxc_interrupt_print_info: name=RTP_LinkMaskChange, id=666, index=0, block=0, unit=0, recurring_action=0 | Check RMGR / RTPWP settings in both device and device partner. If configuration is OK - look for physical link error indication and retrain link if needed. | RTP Link Mask Change#015
<14>2025-01-30T21:15:23.305868+00:00 STG01-0101-0400-01T2-lc04 INFO syncd#supervisord: syncd 0:dnxc_interrupt_print_info: name=RTP_LinkMaskChange, id=666, index=0, block=0, unit=0, recurring_action=0 | Check RMGR / RTPWP settings in both device and device partner. If configuration is OK - look for physical link error indication and retrain link if needed. | RTP Link Mask Change#015
<14>2025-01-30T21:15:23.418543+00:00 STG01-0101-0400-01T2-lc04 INFO syncd#supervisord: syncd 0:dnxc_interrupt_print_info: name=RTP_LinkMaskChange, id=666, index=0, block=0, unit=0, recurring_action=0 | Check RMGR / RTPWP settings in both device and device partner. If configuration is OK - look for physical link error indication and retrain link if needed. | RTP Link Mask Change#015
<14>2025-01-30T21:15:23.565039+00:00 STG01-0101-0400-01T2-lc04 INFO syncd#supervisord: syncd 0:dnxc_interrupt_print_info: name=RTP_LinkMaskChange, id=666, index=0, block=0, unit=0, recurring_action=0 | Check RMGR / RTPWP settings in both device and device partner. If configuration is OK - look for physical link error indication and retrain link if needed. | RTP Link Mask Change#015
<10>2025-01-30T21:15:23.626048+00:00 STG01-0101-0400-01T2-lc03 CRIT syncd0#syncd: [06:00.0] SAI_API_SWITCH:_brcm_sai_switch_event_cb:950 0x5a1100 Received unhandled switch event - Device Interrupt(13) on unit 0: 0x8b8 0x0 0x0
<10>2025-01-30T21:15:24.126213+00:00 STG01-0101-0400-01T2-lc03 CRIT syncd1#syncd: [07:00.0] SAI_API_SWITCH:_brcm_sai_switch_event_cb:950 0x5a1100 Received unhandled switch event - Device Interrupt(13) on unit 0: 0x8b8 0x0 0x0
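To see how these interrupts are distributed across line cards during an upgrade, a minimal sketch like the following could tally them from a saved syslog excerpt. It assumes the `<PRI>timestamp host level process: message` layout of the lines above and keys only on the `name=RTP_LinkMaskChange` and `Received unhandled switch event` substrings seen here; the function name `tally_fabric_events` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout, taken from the log lines above:
# <PRI>TIMESTAMP HOST LEVEL PROCESS: MESSAGE
LINE_RE = re.compile(
    r"^<\d+>(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<proc>\S+):\s+(?P<msg>.*)$"
)


def tally_fabric_events(lines: Iterable[str]) -> Counter:
    """Count fabric link-mask changes and unhandled switch events per host.

    Returns a Counter keyed by (host, event_kind). Lines that do not match
    the assumed syslog layout are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        host, msg = m.group("host"), m.group("msg")
        if "name=RTP_LinkMaskChange" in msg:
            counts[(host, "RTP_LinkMaskChange")] += 1
        elif "Received unhandled switch event" in msg:
            counts[(host, "unhandled_switch_event")] += 1
    return counts
```

Passing an open log file (for example, `tally_fabric_events(open(path))`) yields per-line-card counts that can be compared across upgrades; the substring matching is deliberately simple and would need adjusting if the message wording differs between images.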