Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FDB] [OA] [Warm-reboot] Cannot remove vlan since FDB Flush per port per vlan doesn't clear FDB entries #10605

Closed
vivekrnv opened this issue Apr 19, 2022 · 3 comments · Fixed by sonic-net/sonic-swss#2254
Labels
NVIDIA Triaged this issue has been triaged

Comments

@vivekrnv
Copy link
Contributor

vivekrnv commented Apr 19, 2022

Description

Steps to reproduce the issue:

1. config vlan add 40
2. config vlan member add 40 Ethernet0
On the other side. 
3. sudo ip link add link enp4s0f0 name enp4s0f0.40 type vlan id 40
4. sudo ip link set enp4s0f0.40 up
5. Make sure a fdb entry is learnt. 
root@sonic:/home/admin# fdbshow
  No.    Vlan  MacAddress         Port       Type
-----  ------  -----------------  ---------  -------
    1      40  7C:FE:90:12:22:EC  Ethernet0  Dynamic
6. config vlan member del 40 Ethernet0
7. config vlan del 40

Describe the results you received:

1) Vlan delete couldn't be handled by OA

Apr 18 21:50:31.128148 sonic NOTICE swss#orchagent: :- removeVlan: VLAN Vlan40 still has assiciated FDB entries

2) Subsequently, this impacts Warm-reboot restartcheck.

Apr 18 10:57:41.244710 sonic NOTICE swss#orchagent: :- removeVlan: VLAN Vlan40 still has assiciated FDB entries
Apr 18 10:57:41.244797 sonic NOTICE swss#orchagent: :- warmRestartCheck:     VLAN_TABLE:Vlan40|DEL
Apr 18 10:57:41.244825 sonic NOTICE swss#orchagent: :- warmRestartCheck: Restart check result: NOT_READY
Apr 18 10:57:41.244865 sonic NOTICE swss#orchagent_restart_check: :- main: RESTARTCHECK failed, orchagent is not ready for warm restart with status NOT_READY

3) FDB Entry from State DB is not removed. This is also documented here: #7538

root@r-lionfish-16:/home/admin# dump state fdb all --db ASIC_DB --db STATE_DB
{
    "Vlan40:7c:fe:90:12:22:ec": {
        "ASIC_DB": {
            "keys": [],
            "tables_not_found": [
                "ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY"
            ]
        },
        "STATE_DB": {
            "keys": [
                {
                    "FDB_TABLE|Vlan40:7c:fe:90:12:22:ec": {
                        "port": "Ethernet0",
                        "type": "dynamic"
                    }
                }
            ],
            "tables_not_found": []
        }
    }
}

4) OA doesn't have the logic to consolidate remove
Internal cache of OA which saves FDB entries use FdbEntry struct which uses the exact mac address as the part of key. To my knowledge, there is no section of the code which handles the consolidate remove and clear the FDB entries accordingly.
https://github.com/Azure/sonic-swss/blob/master/orchagent/fdborch.h#L17

5) FdbOrch doesn't have logic to clear internal cache and decrement corresponding fdb counters when the flush recieved from Syncd is for per port and per vlan.
https://github.com/Azure/sonic-swss/blob/master/orchagent/fdborch.cpp#L571

6) Individual fdb_counters i.e. m_fdb_count for port and vlan are only decremented when there is a AGED notification and not a FLUSHED notification.
Mellanox SDK doesn't generate a AGED notification when the VLAN is removed, it only generates a FLUSHED notification.
As i understand, Broadcom's SDK does generate a AGED notif in this case and the SONiC code is written in accordance to this behavior. Can someone point me to the spec where this is documented that AGED is appropriate rather that FLUSHED in this scenario?
https://github.com/Azure/sonic-swss/blob/master/orchagent/fdborch.cpp#L434

7) SAI/SDK notifications/functionality seems to be the expected lines:
The notification received from the SAI is of consolidated type:
https://github.com/opencomputeproject/SAI/blob/master/inc/saifdb.h#L304

2022-04-18.21:49:22.707054|r|SAI_OBJECT_TYPE_VLAN_MEMBER:oid:0x27000000000c81
2022-04-18.21:49:22.709914|f|SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000|SAI_FDB_FLUSH_ATTR_BRIDGE_PORT_ID=oid:0x3a000000000c80|SAI_FDB_FLUSH_ATTR_BV_ID=oid:0x26000000000c7d
2022-04-18.21:49:22.712346|F|SAI_STATUS_SUCCESS
2022-04-18.21:49:22.713039|s|SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a000000000c80|SAI_BRIDGE_PORT_ATTR_ADMIN_STATE=false
2022-04-18.21:49:22.715425|n|fdb_event|[{"fdb_entry":"{\"bvid\":\"oid:0x26000000000c7d\",\"mac\":\"00:00:00:00:00:00\",\"switch_id\":\"oid:0x21000000000000\"}","fdb_event":"SAI_FDB_EVENT_FLUSHED","list":[{"id":"SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID","value":"oid:0x3a000000000c80"},{"id":"SAI_FDB_ENTRY_ATTR_TYPE","value":"SAI_FDB_ENTRY_TYPE_DYNAMIC"},{"id":"SAI_FDB_ENTRY_ATTR_PACKET_ACTION","value":"SAI_PACKET_ACTION_FORWARD"}]}]|
2022-04-18.21:49:22.717977|s|SAI_OBJECT_TYPE_HOSTIF:oid:0xd000000000a22|SAI_HOSTIF_ATTR_VLAN_TAG=SAI_HOSTIF_VLAN_TAG_STRIP
2022-04-18.21:49:22.718885|f|SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000|SAI_FDB_FLUSH_ATTR_BRIDGE_PORT_ID=oid:0x3a000000000c80
2022-04-18.21:49:22.721875|F|SAI_STATUS_SUCCESS
2022-04-18.21:49:22.722751|r|SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a000000000c80
2022-04-18.21:49:22.723082|n|fdb_event|[{"fdb_entry":"{\"bvid\":\"oid:0x39000000000010\",\"mac\":\"00:00:00:00:00:00\",\"switch_id\":\"oid:0x21000000000000\"}","fdb_event":"SAI_FDB_EVENT_FLUSHED","list":[{"id":"SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID","value":"oid:0x3a000000000c80"},{"id":"SAI_FDB_ENTRY_ATTR_TYPE","value":"SAI_FDB_ENTRY_TYPE_DYNAMIC"},{"id":"SAI_FDB_ENTRY_ATTR_PACKET_ACTION","value":"SAI_PACKET_ACTION_FORWARD"}]}]|

Some extra info collected when Eth0 was a part of Vlan40

{
    "Vlan40|Ethernet0": {
        "ASIC_DB": {
            "keys": [
                {
                    "ASIC_STATE:SAI_OBJECT_TYPE_VLAN_MEMBER:oid:0x27000000000c81": {
                        "SAI_VLAN_MEMBER_ATTR_BRIDGE_PORT_ID": "oid:0x3a000000000c80",
                        "SAI_VLAN_MEMBER_ATTR_VLAN_ID": "oid:0x26000000000c7d",
                        "SAI_VLAN_MEMBER_ATTR_VLAN_TAGGING_MODE": "SAI_VLAN_TAGGING_MODE_TAGGED"
                    }
                },
                {
                    "ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a000000000c80": {
                        "SAI_BRIDGE_PORT_ATTR_ADMIN_STATE": "true",
                        "SAI_BRIDGE_PORT_ATTR_FDB_LEARNING_MODE": "SAI_BRIDGE_PORT_FDB_LEARNING_MODE_HW",
                        "SAI_BRIDGE_PORT_ATTR_PORT_ID": "oid:0x10000000008d4",
                        "SAI_BRIDGE_PORT_ATTR_TYPE": "SAI_BRIDGE_PORT_TYPE_PORT"
                    }
                }
            ],
            "tables_not_found": [],
            "vidtorid": {
                "oid:0x27000000000c81": "oid:0x100280027",
                "oid:0x3a000000000c80": "oid:0x10000003a"
            }
        }
}

8) syslog:

Apr 18 21:49:22.710029 sonic INFO swss#orchagent: :- flushFDBEntries: Flushing FDB bridge_port_oid: 0x3a000000000c80, and bvid_oid:0x26000000000c7d.
Apr 18 21:49:22.710029 sonic NOTICE swss#orchagent: :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2
Apr 18 21:49:22.710029 sonic NOTICE swss#orchagent: :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2
Apr 18 21:49:22.710628 sonic INFO syncd#SDK: :- processSingleEvent: key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000 op: flush
Apr 18 21:49:22.710628 sonic NOTICE syncd#SDK: [SAI_FDB.NOTICE] mlnx_sai_fdb.c[1212]- mlnx_flush_fdb_entries: Flush fdb, #0 BRIDGE_PORT_ID=BRIDGE_PORT,(0:0),1,0000,0 #1 BV_ID=VLAN,(0:0),28,0000,0
Apr 18 21:49:22.712389 sonic NOTICE syncd#SDK: :- processFdbFlush: fdb flush succeeded, updating redis database
Apr 18 21:49:22.713760 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
Apr 18 21:49:22.713760 sonic INFO syncd#SDK: :- enqueueNotification: fdb_event [{"fdb_entry":"{\"bvid\":\"oid:0x2800000026\",\"mac\":\"00:00:00:00:00:00\",\"switch_id\":\"oid:0x100000021\"}","fdb_event":"SAI_FDB_EVENT_FLUSHED","list":[{"id":"SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID","value":"oid:0x10000003a"},{"id":"SAI_FDB_ENTRY_ATTR_TYPE","value":"SAI_FDB_ENTRY_TYPE_DYNAMIC"},{"id":"SAI_FDB_ENTRY_ATTR_PACKET_ACTION","value":"SAI_PACKET_ACTION_FORWARD"}]}]
Apr 18 21:49:22.713760 sonic NOTICE syncd#SDK: :- processFlushEvent: received a flush port fdb event, portVid = oid:0x3a000000000c80, bvId = oid:0x26000000000c7d
Apr 18 21:49:22.713760 sonic NOTICE syncd#SDK: :- processFlushEvent: pattern ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:*oid:0x26000000000c7d*, portStr oid:0x3a000000000c80
Apr 18 21:49:22.713760 sonic INFO swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: removing SAI_OBJECT_TYPE_FDB_ENTRY:{"bvid":"oid:0x26000000000c7d","mac":"7C:FE:90:12:22:EC","switch_id":"oid:0x21000000000000"}
Apr 18 21:49:22.713760 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000579 sec
Apr 18 21:49:22.713760 sonic INFO swss#orchagent: :- notifyObserversFDBFlush: Adding MAC learnt on [ port:Ethernet0 , bvid:0x26000000000c7d]                           to ARP flush
Apr 18 21:49:22.713760 sonic INFO swss#orchagent: :- processFDBFlushUpdate: processFDBFlushUpdate port: Ethernet0
Apr 18 21:49:22.713760 sonic INFO swss#orchagent: :- processFDBFlushUpdate: Flushing ARP for port: Vlan40, VLAN: Ethernet0
Apr 18 21:49:22.713924 sonic NOTICE syncd#SDK: :- handle_fdb_event: got fdb flush event: [{"fdb_entry":"{\"bvid\":\"oid:0x2800000026\",\"mac\":\"00:00:00:00:00:00\",\"switch_id\":\"oid:0x100000021\"}","fdb_event":"SAI_FDB_EVENT_FLUSHED","list":[{"id":"SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID","value":"oid:0x10000003a"},{"id":"SAI_FDB_ENTRY_ATTR_TYPE","value":"SAI_FDB_ENTRY_TYPE_DYNAMIC"},{"id":"SAI_FDB_ENTRY_ATTR_PACKET_ACTION","value":"SAI_PACKET_ACTION_FORWARD"}]}]
Apr 18 21:49:22.713962 sonic INFO syncd#SDK: :- process_on_fdb_event: fdb event count: 1
Apr 18 21:49:22.713962 sonic NOTICE syncd#SDK: :- processFlushEvent: received a flush port fdb event, portVid = oid:0x3a000000000c80, bvId = oid:0x26000000000c7d
Apr 18 21:49:22.713962 sonic NOTICE syncd#SDK: :- processFlushEvent: pattern ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:*oid:0x26000000000c7d*, portStr oid:0x3a000000000c80
Apr 18 21:49:22.716496 sonic INFO syncd#SDK: :- sendNotification: fdb_event [{"fdb_entry":"{\"bvid\":\"oid:0x26000000000c7d\",\"mac\":\"00:00:00:00:00:00\",\"switch_id\":\"oid:0x21000000000000\"}","fdb_event":"SAI_FDB_EVENT_FLUSHED","list":[{"id":"SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID","value":"oid:0x3a000000000c80"},{"id":"SAI_FDB_ENTRY_ATTR_TYPE","value":"SAI_FDB_ENTRY_TYPE_DYNAMIC"},{"id":"SAI_FDB_ENTRY_ATTR_PACKET_ACTION","value":"SAI_PACKET_ACTION_FORWARD"}]}]
Apr 18 21:49:22.717567 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
Apr 18 21:49:22.717950 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000430 sec

Apr 18 21:49:22.718949 sonic INFO swss#orchagent: :- flushFDBEntries: Flushing FDB bridge_port_oid: 0x3a000000000c80, and bvid_oid:0x0.
Apr 18 21:49:22.718949 sonic NOTICE swss#orchagent: :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 1
Apr 18 21:49:22.718949 sonic NOTICE swss#orchagent: :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 1
Apr 18 21:49:22.719409 sonic INFO syncd#SDK: :- processSingleEvent: key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000 op: flush
Apr 18 21:49:22.719409 sonic NOTICE syncd#SDK: [SAI_FDB.NOTICE] mlnx_sai_fdb.c[1212]- mlnx_flush_fdb_entries: Flush fdb, #0 BRIDGE_PORT_ID=BRIDGE_PORT,(0:0),1,0000,0
Apr 18 21:49:22.720897 sonic INFO syncd#SDK: :- enqueueNotification: fdb_event [{"fdb_entry":"{\"bvid\":\"oid:0x10010039\",\"mac\":\"00:00:00:00:00:00\",\"switch_id\":\"oid:0x100000021\"}","fdb_event":"SAI_FDB_EVENT_FLUSHED","list":[{"id":"SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID","value":"oid:0x10000003a"},{"id":"SAI_FDB_ENTRY_ATTR_TYPE","value":"SAI_FDB_ENTRY_TYPE_DYNAMIC"},{"id":"SAI_FDB_ENTRY_ATTR_PACKET_ACTION","value":"SAI_PACKET_ACTION_FORWARD"}]}]
Apr 18 21:49:22.721002 sonic NOTICE syncd#SDK: :- processFdbFlush: fdb flush succeeded, updating redis database
Apr 18 21:49:22.721002 sonic NOTICE syncd#SDK: :- processFlushEvent: received a flush port fdb event, portVid = oid:0x3a000000000c80, bvId = oid:0x0
Apr 18 21:49:22.721002 sonic NOTICE syncd#SDK: :- processFlushEvent: pattern ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:*, portStr oid:0x3a000000000c80
Apr 18 21:49:22.722930 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
Apr 18 21:49:22.722930 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000635 sec
Apr 18 21:49:22.722930 sonic INFO swss#orchagent: :- removeBridgePort: Flush FDB entries for port Ethernet0
Apr 18 21:49:22.722930 sonic NOTICE syncd#SDK: :- handle_fdb_event: got fdb flush event: [{"fdb_entry":"{\"bvid\":\"oid:0x10010039\",\"mac\":\"00:00:00:00:00:00\",\"switch_id\":\"oid:0x100000021\"}","fdb_event":"SAI_FDB_EVENT_FLUSHED","list":[{"id":"SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID","value":"oid:0x10000003a"},{"id":"SAI_FDB_ENTRY_ATTR_TYPE","value":"SAI_FDB_ENTRY_TYPE_DYNAMIC"},{"id":"SAI_FDB_ENTRY_ATTR_PACKET_ACTION","value":"SAI_PACKET_ACTION_FORWARD"}]}]
Apr 18 21:49:22.722930 sonic INFO syncd#SDK: :- process_on_fdb_event: fdb event count: 1
Apr 18 21:49:22.722930 sonic NOTICE syncd#SDK: :- processFlushEvent: received a flush port fdb event, portVid = oid:0x3a000000000c80, bvId = oid:0x39000000000010
Apr 18 21:49:22.722930 sonic NOTICE syncd#SDK: :- processFlushEvent: pattern ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:*oid:0x39000000000010*, portStr oid:0x3a000000000c80
Apr 18 21:49:22.722930 sonic INFO syncd#SDK: :- sendNotification: fdb_event [{"fdb_entry":"{\"bvid\":\"oid:0x39000000000010\",\"mac\":\"00:00:00:00:00:00\",\"switch_id\":\"oid:0x21000000000000\"}","fdb_event":"SAI_FDB_EVENT_FLUSHED","list":[{"id":"SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID","value":"oid:0x3a000000000c80"},{"id":"SAI_FDB_ENTRY_ATTR_TYPE","value":"SAI_FDB_ENTRY_TYPE_DYNAMIC"},{"id":"SAI_FDB_ENTRY_ATTR_PACKET_ACTION","value":"SAI_PACKET_ACTION_FORWARD"}]}]
Apr 18 21:49:22.728991 sonic WARNING swss#orchagent: :- meta_fdb_event_snoop_oid: fdb_entry oid (snoop): SAI_OBJECT_TYPE_BRIDGE_PORT: oid:0x3a000000000c80
Apr 18 21:49:22.728991 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
Apr 18 21:45:29.760159 sonic INFO swss#orchagent: :- update: FDB event:3, MAC: 00:00:00:00:00:00 , BVID: 0x26000000000c7d ,                    bridge port ID: 0x3a000000000c7e.
Apr 18 21:45:29.760159 sonic INFO swss#orchagent: :- update: Flush event: Failed to get port by bridge port ID 0x3a000000000c7e.
Apr 18 21:45:29.760159 sonic INFO swss#orchagent: :- update: FDB event:3, MAC: 00:00:00:00:00:00 , BVID: 0x39000000000010 ,                    bridge port ID: 0x3a000000000c7e.
Apr 18 21:45:29.760159 sonic INFO swss#orchagent: :- update: Flush event: Failed to get port by bridge port ID 0x3a000000000c7e.
Apr 18 21:49:22.729021 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000480 sec

Describe the results you expected:

  1. FDB clear flow should not impact removing Vlan and thus the warm-boot
@dgsudharsan
Copy link
Collaborator

@prsunny @dr412113 We may need to revert the check introduced in this PR sonic-net/sonic-swss#1716 to return failure to remove VLAN with FDB entries until the problems mentioned here are sorted out

@prsunny
Copy link
Contributor

prsunny commented Apr 20, 2022

@dgsudharsan , as discussed, please consider this to be handled in FLUSH event.

@zhangyanzhao zhangyanzhao added Triaged this issue has been triaged NVIDIA labels Apr 27, 2022
@zhangyanzhao
Copy link
Collaborator

Nvidia team is working on the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NVIDIA Triaged this issue has been triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants