NEIGH_TABLE not populated with VXLAN routes #3384

Open
bradh352 opened this issue Nov 20, 2024 · 0 comments · May be fixed by #3478

bradh352 commented Nov 20, 2024

Observed on master and 202405 (with PR #3383 applied to make VXLANs actually work).

The basic architecture is VXLAN EVPN with an IRB/SVI interface for the VNI on each switch participating in the VXLAN fabric.

|------------| 10.0.0.50             |---------------------------------|
|    host    |-----------------------|             sonic1              |
|------------|             Ethernet8 | Loopback0: 172.16.0.1           |
                           Untagged  | VLAN 2/VNI 10002 irb: 10.0.0.71 |
                           VLAN2     |---------------------------------|
                           VNI 10002                 | Ethernet54
                                                     | BGP Unnumbered
                                                     |
                                                     |
                                          Ethernet54 |
                                      BGP Unnumbered |
                                     |---------------------------------|
                                     |              sonic2             |
                                     | Loopback0: 172.16.0.2           |
                                     | VLAN 2/VNI 10002 irb: 10.0.0.72 |
                                     |---------------------------------|

On sw2 (sonic2 in the diagram) I've noticed log entries like:

2024 Nov 20 21:42:55.482102 sw2 WARNING swss#arp_update[900]: 108 MAC mismatch for 10.0.0.50 on Vlan2 - kernel: 18:5a:58:2a:e8:20, APPL_DB:

When I investigate, NEIGH_TABLE in APPL_DB has no neighbors listed for Vlan2:

# sonic-db-dump -n APPL_DB -y -k "NEIGH_TABLE:Vlan2:*"
{}

But the kernel has the neighbor listed as added by BGP/Zebra:

root@sw2:~# ip neigh show dev Vlan2
10.0.0.71 lladdr 74:86:e2:43:33:05 extern_learn NOARP proto zebra 
10.0.0.50 lladdr 18:5a:58:2a:e8:20 extern_learn NOARP proto zebra 
fe80::7686:e2ff:fe43:3305 lladdr 74:86:e2:43:33:05 extern_learn NOARP proto zebra 
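
The comparison between the kernel neighbor table and APPL_DB can be done mechanically. The following is a minimal sketch, assuming the `ip neigh show dev Vlan2` text and the `sonic-db-dump` JSON have been captured as strings; the function names `parse_ip_neigh` and `missing_from_appl_db` are hypothetical helpers, not part of SONiC:

```python
import json


def parse_ip_neigh(text):
    """Parse `ip neigh show dev <if>` output into {ip: {"mac": ..., "flags": set}}."""
    neighbors = {}
    for line in text.splitlines():
        tokens = line.split()
        if "lladdr" not in tokens:
            continue  # blank line, or an entry without a resolved MAC
        idx = tokens.index("lladdr")
        if idx + 1 >= len(tokens):
            continue  # malformed line: "lladdr" with no address after it
        neighbors[tokens[0]] = {
            "mac": tokens[idx + 1],
            "flags": set(tokens[idx + 2:]),  # e.g. extern_learn, NOARP
        }
    return neighbors


def missing_from_appl_db(kernel_text, dump_json, ifname):
    """Return kernel neighbors on ifname that have no NEIGH_TABLE key in the dump."""
    appl_db = json.loads(dump_json)
    missing = {}
    for ip, info in parse_ip_neigh(kernel_text).items():
        # Key layout assumed from the dump shown in this issue.
        if f"NEIGH_TABLE:{ifname}:{ip}" not in appl_db:
            missing[ip] = info
    return missing


if __name__ == "__main__":
    kernel = (
        "10.0.0.71 lladdr 74:86:e2:43:33:05 extern_learn NOARP proto zebra\n"
        "10.0.0.50 lladdr 18:5a:58:2a:e8:20 extern_learn NOARP proto zebra\n"
    )
    for ip, info in sorted(missing_from_appl_db(kernel, "{}", "Vlan2").items()):
        print(ip, info["mac"], sorted(info["flags"]))
```

With the empty dump from sw2, every kernel entry on Vlan2 is reported as missing, and each carries the `extern_learn` and `NOARP` flags.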

And the type-2 routes look good:

# vtysh -c "show bgp l2vpn evpn"
BGP table version is 7, local router ID is 172.16.0.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.0.1:2
 *> [2]:[0]:[48]:[18:5a:58:2a:e8:20]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
 *> [2]:[0]:[48]:[18:5a:58:2a:e8:20]:[32]:[10.0.0.50]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
 *> [2]:[0]:[48]:[74:86:e2:43:33:05]:[32]:[10.0.0.71]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
 *> [2]:[0]:[48]:[74:86:e2:43:33:05]:[128]:[fe80::7686:e2ff:fe43:3305]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
 *> [3]:[0]:[32]:[172.16.0.1]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
Route Distinguisher: 172.16.0.2:2
 *> [2]:[0]:[48]:[74:86:e2:43:28:05]:[32]:[10.0.0.72]
                    172.16.0.2                         32768 i
                    ET:8 RT:32898:10002
 *> [2]:[0]:[48]:[74:86:e2:43:28:05]:[128]:[fe80::7686:e2ff:fe43:2805]
                    172.16.0.2                         32768 i
                    ET:8 RT:32898:10002
 *> [3]:[0]:[32]:[172.16.0.2]
                    172.16.0.2                         32768 i
                    ET:8 RT:32898:10002

Displayed 8 out of 8 total prefixes
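
To pull the MAC/IP bindings out of this output, the type-2 prefix layout printed in the legend (`[2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]`) can be parsed directly. This is a rough sketch, assuming one prefix per line as shown above; `parse_type2` is a hypothetical helper and not an FRR or SONiC API:

```python
import ipaddress
import re

# Matches "[2]:[EthTag]:[MAClen]:[MAC]" with an optional ":[IPlen]:[IP]" tail.
TYPE2_RE = re.compile(
    r"\[2\]:\[(?P<eth_tag>\d+)\]:\[(?P<mac_len>\d+)\]:\[(?P<mac>[0-9a-fA-F:]{17})\]"
    r"(?::\[(?P<ip_len>\d+)\]:\[(?P<ip>[0-9a-fA-F:.]+)\])?$"
)


def parse_type2(line):
    """Return a dict for an EVPN type-2 prefix line, or None for anything else."""
    match = TYPE2_RE.search(line.strip())
    if match is None:
        return None
    ip_text = match.group("ip")
    return {
        "eth_tag": int(match.group("eth_tag")),
        "mac": match.group("mac").lower(),
        # MAC-only type-2 routes carry no IP field.
        "ip": ipaddress.ip_address(ip_text) if ip_text else None,
    }


if __name__ == "__main__":
    sample = [
        " *> [2]:[0]:[48]:[18:5a:58:2a:e8:20]",
        " *> [2]:[0]:[48]:[18:5a:58:2a:e8:20]:[32]:[10.0.0.50]",
        " *> [3]:[0]:[32]:[172.16.0.1]",
    ]
    for line in sample:
        print(parse_type2(line))
```

Only the routes that carry an IP field are MAC-IP bindings, so those are the ones expected to show up as neighbors on the remote VTEP.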

Going over to the originating VTEP (sw1) where the host is directly connected, the NEIGH_TABLE is populated as expected:

# sonic-db-dump -n APPL_DB -y -k "NEIGH_TABLE:Vlan2:*"
{
  "NEIGH_TABLE:Vlan2:10.0.0.50": {
    "expireat": 1732144228.8517292,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "family": "IPv4",
      "neigh": "18:5a:58:2a:e8:20"
    }
  }
}

And we see these log entries on sw1:

2024 Nov 20 22:13:42.369623 sw1 NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 10.0.0.50, 18:5a:58:2a:e8:20 on Vlan2
2024 Nov 20 22:13:42.370310 sw1 NOTICE syncd#syncd: [none] SAI_API_NEXT_HOP:brcm_sai_create_next_hop:334 nhid 3 vr_id 0 ip af:v4 addr:10.0.0.50 rif-id 1 tunnel-id 0 vni 0
2024 Nov 20 22:13:42.370474 sw1 NOTICE syncd#syncd: [none] SAI_API_NEXT_HOP:_brcm_sai_xgs_create_ip_nexthop:554 nhid 3 eg-if 400004 rif 0 vid 0 port/tid(0x0) is_trunk(0)
2024 Nov 20 22:13:42.371069 sw1 NOTICE swss#orchagent: :- addNextHop: Created next hop 10.0.0.50 on Vlan2

I'm assuming some event should populate NEIGH_TABLE on sw2, which in turn should trigger programming of the neighbor into the ASIC. Since this is not happening, the behavior violates the HLD:
https://github.com/sonic-net/SONiC/blob/master/doc/vxlan/EVPN/EVPN_VXLAN_HLD.md#438-mac-ip-route-handling

This is likely at least part of the reason the SVI cannot reach neighbors across the VXLAN.

bradh352 changed the title from "NEIGH_TABLE not populated with VXLAN routes leading to WARNING" to "NEIGH_TABLE not populated with VXLAN routes" on Dec 2, 2024
bradh352 added a commit to bradh352/sonic-swss that referenced this issue Jan 21, 2025
VXLAN EVPN learned routes are not entered into NEIGH_TABLE as per
Issue sonic-net#3384.

The EVPN VXLAN HLD specifically states this should be populated so it triggers
an update to the SAI database:

https://github.com/sonic-net/SONiC/blob/master/doc/vxlan/EVPN/EVPN_VXLAN_HLD.md#438-mac-ip-route-handling

The reason it was not occurring is that NOARP entries were being rejected. This
patch adds an exception for externally learned neighbors.

Signed-off-by: Brad House (@bradh352)
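
The decision described in the commit message can be illustrated as follows. This is a hypothetical Python restatement for discussion only, not the actual sonic-swss code (which is C++), and `should_sync_neighbor` is an invented name. The two constants use the values defined in the Linux `linux/neighbour.h` header; only the NOARP decision is modeled here.

```python
NUD_NOARP = 0x40        # neighbor state: no ARP/ND resolution is performed
NTF_EXT_LEARNED = 0x10  # neighbor flag: learned by an external control plane


def should_sync_neighbor(state, flags):
    """Decide whether a kernel neighbor should be written to NEIGH_TABLE.

    Assumed behavior, per the commit message: NOARP entries are still
    skipped unless they are flagged as externally learned (for example,
    entries installed by zebra from EVPN type-2 routes).
    """
    is_noarp = bool(state & NUD_NOARP)
    is_external = bool(flags & NTF_EXT_LEARNED)
    if is_noarp and not is_external:
        return False
    return True


if __name__ == "__main__":
    # An "extern_learn NOARP" entry, as seen on sw2 in `ip neigh show`.
    print(should_sync_neighbor(NUD_NOARP, NTF_EXT_LEARNED))
    # A plain NOARP entry with no external-learn flag.
    print(should_sync_neighbor(NUD_NOARP, 0))
```

Under this rule the `extern_learn NOARP` entries shown by `ip neigh show dev Vlan2` on sw2 would be accepted, while other NOARP entries would still be skipped.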
bradh352 added a commit to bradh352/sonic-swss that referenced this issue Jan 22, 2025
github-actions bot pushed a commit to bradh352/sonic-swss that referenced this issue Jan 22, 2025