BGP session not established with following error - nexthop_set failed, resetting connection - intf 0x556658d086f0 #14866

Closed
Hedgehog-Guru opened this issue Apr 27, 2023 · 2 comments
Labels: NVIDIA, Triaged

Comments

@Hedgehog-Guru

Description

Steps to reproduce the issue:

My setup is a DUT connected to IXIA.
1. Create 1k VLAN interfaces on the DUT.
2. Configure all required BGP settings on the DUT and IXIA.
3. Start the BGP protocols on the IXIA side and check the peers: a few fail to come up, usually between 1 and 5.
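
For anyone trying to reproduce this at scale, the DUT side of steps 1 and 2 could be generated rather than typed by hand. The following is only a rough sketch: the table names (`VLAN`, `VLAN_INTERFACE`, `BGP_NEIGHBOR`) follow the SONiC CONFIG_DB layout as I understand it, while the addressing plan, the peer ASN, the `ixia-` peer names and the function name `build_vlan_config` are assumptions made for illustration and are not taken from the reporter's setup. VLAN member ports are left out.

```python
import ipaddress
import json


def build_vlan_config(count=1000, first_vlan=2, base="10.0.0.0/8", peer_asn="65100"):
    """Build a CONFIG_DB-style fragment: `count` VLAN interfaces, one BGP peer each.

    The addressing plan (one /24 per VLAN, DUT at .1, peer at .2) and the
    peer ASN are hypothetical values chosen for this sketch.
    """
    subnets = ipaddress.ip_network(base).subnets(new_prefix=24)
    config = {"VLAN": {}, "VLAN_INTERFACE": {}, "BGP_NEIGHBOR": {}}
    for offset, subnet in zip(range(count), subnets):
        vlan_id = first_vlan + offset
        name = f"Vlan{vlan_id}"
        dut_ip = subnet.network_address + 1   # address on the DUT VLAN interface
        peer_ip = subnet.network_address + 2  # address simulated by the tester
        config["VLAN"][name] = {"vlanid": str(vlan_id)}
        config["VLAN_INTERFACE"][name] = {}
        config["VLAN_INTERFACE"][f"{name}|{dut_ip}/24"] = {}
        config["BGP_NEIGHBOR"][str(peer_ip)] = {
            "asn": peer_asn,
            "name": f"ixia-{vlan_id}",
            "local_addr": str(dut_ip),
        }
    return config


if __name__ == "__main__":
    # Write the fragment to stdout so it can be reviewed before loading it.
    print(json.dumps(build_vlan_config(), indent=2))
```

The output should be checked against the CONFIG_DB schema of the image under test before it is loaded.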

Describe the results you received:

Not all BGP sessions are established. On the IXIA side we see the following error:
State Machine error received
On the DUT side we see the following error:
nexthop_set failed, resetting connection - intf 0x556658d086f0
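
With only 1 to 5 failing peers out of about 1k, it may help to count the failures mechanically. Below is a minimal sketch, not a tested tool: the regular expression matches the DUT log message quoted above, and `peers_not_established` assumes a summary dictionary shaped like FRR's JSON BGP summary (address families at the top level, each with a `peers` map whose entries carry a `state` field). That shape and both function names are assumptions for illustration.

```python
import re

# Matches the DUT-side message quoted in this issue and captures the interface pointer.
NEXTHOP_FAIL = re.compile(
    r"nexthop_set failed, resetting connection - intf (0x[0-9a-fA-F]+)"
)


def count_nexthop_failures(log_lines):
    """Return {interface pointer: number of matching log lines}."""
    hits = {}
    for line in log_lines:
        match = NEXTHOP_FAIL.search(line)
        if match:
            key = match.group(1)
            hits[key] = hits.get(key, 0) + 1
    return hits


def peers_not_established(summary):
    """List peers whose state is not 'Established'.

    `summary` is assumed to look like
    {"ipv4Unicast": {"peers": {"10.0.0.2": {"state": "Established"}}}}.
    """
    failed = set()
    for family in summary.values():
        if not isinstance(family, dict):
            continue  # skip anything that is not an address-family section
        for address, peer in family.get("peers", {}).items():
            if peer.get("state") != "Established":
                failed.add(address)
    return sorted(failed)
```

Feeding it the bgpd log lines from the techsupport dump and a parsed summary would show whether the failing peers line up with the `nexthop_set failed` messages.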

Describe the results you expected:

All sessions should be established.

Output of show version:

I last tested this on SONiC.202211_RC9.3 with an SPC3 switch.

Output of show techsupport:

[sonic_dump_qa-eth-vt02-2-4700a1_20230411_102108.tar.zip.001.zip](https://github.com/sonic-net/sonic-buildimage/files/11340827/sonic_dump_qa-eth-vt02-2-4700a1_20230411_102108.tar.zip.001.zip)
[sonic_dump_qa-eth-vt02-2-4700a1_20230411_102108.tar.zip.002.zip](https://github.com/sonic-net/sonic-buildimage/files/11340839/sonic_dump_qa-eth-vt02-2-4700a1_20230411_102108.tar.zip.002.zip)
[sonic_dump_qa-eth-vt02-2-4700a1_20230411_102108.tar.zip.003.zip](https://github.com/sonic-net/sonic-buildimage/files/11340840/sonic_dump_qa-eth-vt02-2-4700a1_20230411_102108.tar.zip.003.zip)

@gechiang
Collaborator

@Hedgehog-Guru
This is also seen on the official 202211 build.
What happens when the number of VLAN interfaces is lower? Do you still see this issue?
We have had issues with more than 1K VLAN interfaces, where OA is already not in a good state...

This is a known FRR issue. This issue is purely to inform the community that such an issue exists.
Can you share the FRR issue here? Link the commit.

gechiang added the Triaged and NVIDIA labels May 10, 2023
@vivekrnv
Contributor

Hi @gechiang ,

We haven't seen any issues with 500 VLAN interfaces. This PR is meant to fix the issue: FRRouting/frr#13396

lguohan pushed a commit that referenced this issue Aug 7, 2023
Why I did it
Upgrade FRR to 8.5.1 to include the latest fixes.

New patches that were added:

| Patch | FRR Pull request | Issue fixed |
| --- | --- | --- |
| 0012-zebra-Rename-vrf_lookup_by_tableid-to-zebra_vrf_look.patch | FRRouting/frr#13396 | #14866 |
| 0013-zebra-Move-protodown_r_bit-to-a-better-spot.patch | FRRouting/frr#13396 | #14866 |
| 0014-zebra-Remove-unused-dplane_intf_delete.patch | FRRouting/frr#13396 | #14866 |
| 0015-zebra-Remove-unused-add-variable.patch | FRRouting/frr#13396 | #14866 |
| 0016-zebra-Remove-duplicate-function-for-netlink-interfac.patch | FRRouting/frr#13396 | #14866 |
| 0017-zebra-Add-code-to-get-set-interface-to-pass-up-from-.patch | FRRouting/frr#13396 | #14866 |
| 0018-zebra-Use-zebra-dplane-for-RTM-link-and-addr.patch | FRRouting/frr#13396 | #14866 |
| 0019-zebra-Abstract-dplane_ctx_route_init-to-init-route-w.patch | FRRouting/frr#13757 | FRRouting/frr#13754 |
| 00020-zebra-Fix-crash-when-dplane_fpm_nl-fails-to-process-.patch | FRRouting/frr#13757 | FRRouting/frr#13754 |

Removed patches:

| Patch | Upstream FRR commit that is present in 8.5.1 |
| --- | --- |
| 0001-Add-support-of-bgp-tcp-DSCP-value.patch | FRRouting/frr@425bd64 |
| 0010-zebra-Note-when-the-netlink-DUMP-command-is-interrup.patch | FRRouting/frr@2f71996 |
| 0011-bgpd-enhanced-capability-is-always-turned-on-for-int.patch | FRRouting/frr@8e89adc |
| 0012-Ensure-ospf_apiclient_lsa_originate-cannot-accidently-write-into-stack.patch | FRRouting/frr@d2aeac3, FRRouting/frr@49efc80, FRRouting/frr@ff6db10 |
| 0013-zebra-fix-dplane-fpm-nl-to-allow-for-fast-configuration.patch | FRRouting/frr@551fa8c |
| 0014-bgpd-Allow-network-XXX-to-work-with-bgp-suppress-fib.patch | FRRouting/frr@4801fc4 |
| 0015-zebra-Return-statements-do-not-use-paranthesis.patch | FRRouting/frr@871a16c |
| 0016-zebra-Add-zrouter.asic_notification_nexthop_control.patch | FRRouting/frr@06525c4 |
| 0017-zebra-Re-arrange-fpm_read-to-reduce-code-duplication.patch | FRRouting/frr@7d83e13 |
| 0018-zebra-Add-dplane_ctx_get-set_flags.patch | FRRouting/frr@10388e9 |
| 0019-zebra-Rearrange-dplane_ctx_route_init.patch | FRRouting/frr@f935122 |
| 0020-zebra-Add-ctx-to-netlink-message-parsing.patch | FRRouting/frr@45f0a10 |
| 0021-zebra-Read-from-the-dplane_fpm_nl-a-route-update.patch | FRRouting/frr@a0e1173 |
| 0022-zebra-Fix-code-because-missing-backport.patch | FRRouting/frr@07fd1f7 |
| 0024-zebra-continue-fpm-read-when-we-decide-a-netlink-message-is-not-needed.patch | FRRouting/frr@c0275ab |
| 0025-zebra-Send-nht-resolved-entry-up-to-concerned-protoc.patch | FRRouting/frr@8ce0e51 |
| 0027-bgpd-Ensure-FRR-has-enough-data-to-read-in-peek_for_as4_capability-and-bgp_open_option_parse.patch | FRRouting/frr@3e46b43 |
| 0028-bgpd-Ensure-that-bgp-open-message-stream-has-enough-data-to-read.patch | FRRouting/frr@766eec1 |

Realigned patches:

| Old patch | New patch |
| --- | --- |
| 0002-Reduce-severity-of-Vty-connected-from-message.patch | 0001-Reduce-severity-of-Vty-connected-from-message.patch |
| 0004-Allow-BGP-attr-NEXT_HOP-to-be-0.0.0.0-due-to-allevia.patch | 0002-Allow-BGP-attr-NEXT_HOP-to-be-0.0.0.0-due-to-allevia.patch |
| 0005-nexthops-compare-vrf-only-if-ip-type.patch | 0003-nexthops-compare-vrf-only-if-ip-type.patch |
| 0006-frr-remove-frr-log-outchannel-to-var-log-frr.log.patch | 0004-frr-remove-frr-log-outchannel-to-var-log-frr.log.patch |
| 0007-Add-support-of-bgp-l3vni-evpn.patch | 0005-Add-support-of-bgp-l3vni-evpn.patch |
| 0008-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch | 0006-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch |
| 0009-ignore-route-from-default-table.patch | 0007-ignore-route-from-default-table.patch |
| 0023-Use-vrf_id-for-vrf-not-tabled_id.patch | 0008-Use-vrf_id-for-vrf-not-tabled_id.patch |
| 0026-bgpd-Ensure-suppress-fib-pending-works-with-network-.patch | 0009-bgpd-Ensure-suppress-fib-pending-works-with-network-.patch |
| 0029-bgpd-Change-log-level-for-graceful-restart-events.patch | 0010-bgpd-Change-log-level-for-graceful-restart-events.patch |
| 0030-zebra-Static-routes-async-notification-do-not-need-t.patch | 0011-zebra-Static-routes-async-notification-do-not-need-t.patch |

How I did it
Upgrade FRR submodule. Align the patches. Integrate new patches to fix issues.
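
The "align the patches" step amounts to renumbering the surviving patches so their prefixes are contiguous again once the upstreamed ones are dropped. As a rough illustration only (this is not the script used for the commit, and the function name `realign` is hypothetical), the old-to-new mapping could be computed like this:

```python
import re


def realign(kept_patches):
    """Map each kept patch filename to a sequentially renumbered name.

    `kept_patches` must already be in the desired order; the leading
    numeric prefix is replaced with a four-digit sequence number.
    """
    mapping = {}
    for index, old_name in enumerate(kept_patches, start=1):
        suffix = re.sub(r"^\d+-", "", old_name)  # strip the old numeric prefix
        mapping[old_name] = f"{index:04d}-{suffix}"
    return mapping
```

Applied to the kept patches in order, this reproduces the pattern of the realigned list, for example `0002-Reduce-severity-of-Vty-connected-from-message.patch` becoming `0001-Reduce-severity-of-Vty-connected-from-message.patch`.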

How to verify it
Run sonic-mgmt regression to verify
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Aug 19, 2023
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this issue Sep 20, 2023