
[Reclaim buffer] Reclaim unused buffer for dynamic buffer model #1910

Merged
merged 60 commits into from
Nov 29, 2021

Conversation

@stephenxs (Collaborator) commented Sep 15, 2021

What I did

Reclaim reserved buffer of unused ports for both dynamic and traditional models.
This is done by

  • Removing lossless priority groups on unused ports.
  • Applying zero buffer profiles on the buffer objects of unused ports.
  • In the dynamic buffer model, the zero profiles are loaded from a JSON file and applied to APPL_DB if there are admin-down ports.
    The default buffer configuration will be configured on all ports. The buffer manager will apply zero profiles on admin-down ports.
  • In the static buffer model, the zero profiles are loaded by the buffer template.

Signed-off-by: Stephen Sun [email protected]

Why I did it

How I verified it

Regression test and vs test.

Details if related
Static buffer model

Remove the lossless buffer priority group if the port is admin-down and the buffer profile aligns with the speed and cable length of the port.
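The condition above can be sketched as a small check. This is an illustrative sketch, not the actual buffermgrd code; it assumes the common SONiC naming convention `pg_lossless_<speed>_<cable_length>_profile` for dynamically derived lossless profiles.

```python
# Sketch (not the actual buffermgrd logic): decide whether a lossless PG on an
# admin-down port can be removed. The profile naming convention
# "pg_lossless_<speed>_<cable_length>_profile" is an assumption here.
def should_remove_lossless_pg(admin_status: str, pg_profile: str,
                              speed: str, cable_length: str) -> bool:
    if admin_status != "down":
        return False
    expected = f"pg_lossless_{speed}_{cable_length}_profile"
    # Remove only when the configured profile aligns with the port's speed
    # and cable length; a user-customized profile is left alone.
    return pg_profile == expected

# Example: a 100G port with a 5m cable that is admin-down
print(should_remove_lossless_pg("down", "pg_lossless_100000_5m_profile",
                                "100000", "5m"))  # True
```

A user-supplied profile (one that does not match the expected name) is deliberately not removed, so custom configuration survives an admin shutdown.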

Dynamic buffer model

Handle zero buffer pools and profiles

  1. buffermgrd: add a CLI option to load the JSON file for zero profiles. (done in PR [Reclaiming buffer] Common code update #1996)
  2. Load them from JSON file into the internal buffer manager's data structure (done in PR [Reclaiming buffer] Common code update #1996)
  3. Apply them to APPL_DB once there is at least one admin-down port
    • Record zero profiles' names in the pool objects they reference.
      By doing so, the zero profile lists can be constructed according to the normal profile list. There should be one profile for each pool on the ingress/egress side.
    • And then apply the zero profiles to the buffer objects of the port.
    • Unload them from APPL_DB once all ports are admin-up since the zero pools and profiles are no longer referenced.
      Remove the buffer pool counter ID when the zero pool is removed.
  4. Since a pool can now be removed from the system, its watermark counter is removed before the pool itself.
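Step 3's construction of zero profile lists from the normal profile list can be sketched as follows. The data shapes and names here are illustrative assumptions, not the buffer manager's real structures.

```python
# Sketch of step 3: each buffer pool records the name of the zero profile that
# references it, so a port's zero profile list can be derived from its normal
# profile list pool-by-pool (one profile per pool on the ingress/egress side).
def build_zero_profile_list(normal_profiles, profile_to_pool, pool_to_zero_profile):
    zero_list = []
    for profile in normal_profiles:
        pool = profile_to_pool[profile]               # pool the normal profile references
        zero_list.append(pool_to_zero_profile[pool])  # zero profile recorded in that pool
    return zero_list

# Hypothetical mappings for illustration only
profile_to_pool = {
    "ingress_lossless_profile": "ingress_lossless_pool",
    "ingress_lossy_profile": "ingress_lossy_pool",
}
pool_to_zero_profile = {
    "ingress_lossless_pool": "ingress_lossless_zero_profile",
    "ingress_lossy_pool": "ingress_lossy_zero_profile",
}
print(build_zero_profile_list(
    ["ingress_lossless_profile", "ingress_lossy_profile"],
    profile_to_pool, pool_to_zero_profile))
# ['ingress_lossless_zero_profile', 'ingress_lossy_zero_profile']
```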

Handle port admin status change

  1. Currently, there is logic to remove buffer priority groups of admin-down ports. This logic is reused and extended to all buffer objects, including BUFFER_QUEUE, BUFFER_PORT_INGRESS_PROFILE_LIST, and BUFFER_PORT_EGRESS_PROFILE_LIST.
    • When the port is admin down,
      • The normal profiles are removed from the buffer objects of the port
      • The zero profiles, if provided, are applied to the port
    • When the port is admin up,
      • The zero profiles, if applied, are removed from the port
      • The normal profiles are applied to the port.
  2. The ports orchagent exposes the number of queues and priority groups to STATE_DB.
    Buffer manager can take advantage of these values to apply zero profiles on all the priority groups and queues of the admin-down ports.
    In case it is not necessary to apply zero profiles on all priority groups or queues on a certain platform, ids_to_reclaim can be customized in the JSON file.
  3. Handle all buffer tables, including BUFFER_PG, BUFFER_QUEUE, BUFFER_PORT_INGRESS_PROFILE_LIST and BUFFER_PORT_EGRESS_PROFILE_LIST
    • Originally, only the BUFFER_PG table was cached in the dynamic buffer manager.
    • Now, all tables are cached in order to apply zero profiles when a port is admin down and apply normal profiles when it's up.
    • The index of such tables can include a single port or a list of ports, like BUFFER_PG|Ethernet0|3-4 or BUFFER_PG|Ethernet0,Ethernet4,Ethernet8|3-4. Originally, this logic existed only for the BUFFER_PG table; now it is reused and extended to handle all the tables.
  4. [Mellanox] Plugin to calculate buffer pool size:
    • Originally, buffers for queues, buffer profile lists, etc. were not reclaimed for admin-down ports, so they were reserved for all ports.
    • Now, they are reserved for admin-up ports only.
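The multi-port key handling mentioned above can be illustrated with a small parser. This is a sketch of the key format only (`TABLE|port[,port...]|ids`), not the actual C++ implementation; profile-list keys carrying no ids are an assumption modeled with `None`.

```python
# Sketch: expand a buffer table key that may carry a single port or a
# comma-separated port list, e.g. "BUFFER_PG|Ethernet0|3-4" or
# "BUFFER_PG|Ethernet0,Ethernet4,Ethernet8|3-4". Expanding the key lets the
# same handling apply to every (table, port, ids) combination.
def expand_buffer_key(key: str):
    parts = key.split("|")
    table, ports = parts[0], parts[1]
    ids = parts[2] if len(parts) > 2 else None  # profile-list keys have no ids
    return [(table, port, ids) for port in ports.split(",")]

print(expand_buffer_key("BUFFER_PG|Ethernet0,Ethernet4|3-4"))
# [('BUFFER_PG', 'Ethernet0', '3-4'), ('BUFFER_PG', 'Ethernet4', '3-4')]
```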

Accelerate the progress of applying buffer tables to APPL_DB

This is an optimization on top of reclaiming buffer.

  1. Don't apply buffer profiles or buffer objects to APPL_DB before buffer pools are applied when the system is starting.
    This applies the items in order from referenced items to referencing items, to avoid buffer orchagent retries caused by not-yet-present referenced items.
    However, it is still possible that referencing items are handled before referenced items. In that case, there should not be any error message.
  2. [Mellanox] Plugin to calculate buffer pool size:
    Return the buffer pool sizes currently in APPL_DB if the pool sizes cannot be calculated because some information is missing. This typically happens at system start.
    This accelerates pushing tables to APPL_DB.
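The fallback in item 2 can be sketched in Python (the real plugin is a Lua script; the function and parameter names here are hypothetical):

```python
# Sketch of the pool-size fallback: if sizes cannot be calculated yet (e.g.
# port information is missing at system start), return the sizes already in
# APPL_DB so pushing the other tables is not blocked.
def calculate_pool_sizes(port_info_ready, compute, appl_db_sizes):
    if not port_info_ready:
        # Fall back to whatever APPL_DB currently holds, or "0" if nothing yet.
        return dict(appl_db_sizes) if appl_db_sizes else {"ingress_lossless_pool": "0"}
    return compute()

# At startup, before port info is available, the cached APPL_DB value wins:
print(calculate_pool_sizes(False,
                           lambda: {"ingress_lossless_pool": "4194304"},
                           {"ingress_lossless_pool": "2097152"}))
# {'ingress_lossless_pool': '2097152'}
```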

- Port buffer pools
- Management PG
- Headroom for mirror

Signed-off-by: Stephen Sun <[email protected]>
- Remove queue and profile list when the port was admin-down and comes up
- Tolerate the case that the profile list is not fully handled by orchagent
  when port admin status is changed

Signed-off-by: Stephen Sun <[email protected]>
- If there are vendor-specific IDs for queues or PGs to reclaim,
  use them.
- Otherwise, reclaim all queues or PGs supported by the vendor

Signed-off-by: Stephen Sun <[email protected]>
… buffer manager

There are dependencies among buffer tables:
- BUFFER_POOL is referenced by BUFFER_PROFILE
- BUFFER_PROFILE is referenced by all the remaining buffer tables

Originally, dependencies were not handled explicitly; orchagent relied on retries.
However, when the system is starting, these retries incur error messages and consume a lot of time.

We have taken the following steps to handle the dependencies and avoid retries:

1. The Lua plugin will return the value in APPL_DB, or "0", as the pool size in case it cannot calculate it
2. Introduce a buffer-pool-ready flag. It starts as false and becomes true once all buffer pools are calculated.
   Buffer profiles and objects (PGs, queues, profile lists) will be pushed to APPL_DB only if the buffer pools are ready.
3. The buffer-pool-ready flag is typically set the first time the buffer pools' sizes are calculated.
   At the same time, all pending buffer profiles and objects will be pushed to APPL_DB.

By doing so, when APPL_DB receives buffer objects, all pools and profiles should be ready and there will be no retries.
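The ready-flag scheme can be sketched as follows; the class and member names are illustrative, not the actual buffermgrd members:

```python
# Minimal sketch of the buffer-pool-ready flag: profiles and buffer objects
# queue up until all pool sizes are known, then pools are written first and
# the pending items are flushed, preserving referenced-before-referencing order.
class BufferPusher:
    def __init__(self, pool_names):
        self.pool_sizes = dict.fromkeys(pool_names)  # pool -> size (None = unknown)
        self.pools_ready = False
        self.pending = []   # profiles/PGs/queues waiting for the pools
        self.appl_db = []   # stand-in for APPL_DB writes, in order

    def set_pool_size(self, pool, size):
        self.pool_sizes[pool] = size
        if not self.pools_ready and all(v is not None for v in self.pool_sizes.values()):
            self.pools_ready = True
            self.appl_db.extend(self.pool_sizes.items())  # pools first
            self.appl_db.extend(self.pending)             # then everything pending
            self.pending.clear()

    def push(self, item):
        if self.pools_ready:
            self.appl_db.append(item)   # pools ready: write through immediately
        else:
            self.pending.append(item)   # pools not ready: hold back
```

Usage: items pushed before the pools are sized stay pending; once the last pool size arrives, the pools land in APPL_DB ahead of all held-back items.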

Signed-off-by: Stephen Sun <[email protected]>
Originally, egress profiles could be ignored because we focused on the ingress side.
Now a buffer profile is put into the ignored list iff it is in m_bufferProfileLookup,
so the old handling is no longer needed.

Signed-off-by: Stephen Sun <[email protected]>
1. When unloading zero profiles, the zero profile names recorded in buffer pools were not cleared.
   Cause: the pool names of the zero profiles were not recorded.
2. Unable to remove zero queues after a port is brought up.
   Cause: the zero queues are removed only if m_queueIdsToZero is not empty.
   However, if it is not specified for a vendor, zero profiles are applied to all queues,
   and in this case the zero queues were not cleared.

Signed-off-by: Stephen Sun <[email protected]>
- Apply zero profile to all configured items
- Apply zero profile to all supported-but-not-configured items

Signed-off-by: Stephen Sun <[email protected]>
- Apply zero profiles to supported but not configured items (PG, queue)
  when port is admin down
- Remove zero profiles from such items when port is admin up

Signed-off-by: Stephen Sun <[email protected]>
- Don't remove the lossless PG if the profile doesn't align with the port's speed and cable length

Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
- Add some comments
- Rename some variables
- Handle the return value when reclaiming reserved buffer for a port;
  it can fail if the maximum numbers of queues and PGs are not in STATE_DB

Signed-off-by: Stephen Sun <[email protected]>
All callers will pass a profile reference

Signed-off-by: Stephen Sun <[email protected]>
Signed-off-by: Stephen Sun <[email protected]>
In case ingress_lossless_profile doesn't exist,
an empty profile will be created and propagated to APPL_DB,
which causes orchagent to die.

Signed-off-by: Stephen Sun <[email protected]>
@stephenxs (Collaborator, Author)

/azpw run

@mssonicbld (Collaborator)

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


@stephenxs stephenxs requested a review from neethajohn November 26, 2021 15:34
@neethajohn neethajohn merged commit fd887bf into sonic-net:master Nov 29, 2021
neethajohn pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Nov 29, 2021
…les (#8768)

Signed-off-by: Stephen Sun [email protected]

Why I did it

Support zero buffer profiles:
  • Add buffer profiles and pool definitions for zero buffer profiles
  • Support applying zero profiles on INACTIVE PORTS
  • Enable the dynamic buffer manager to load zero pools and profiles from a JSON file

Dependency: depends on sonic-net/sonic-swss#1910 and a submodule-advancing PR once the former is merged.

How I did it

Add buffer profiles and pool definitions for zero buffer profiles:
  • If the buffer model is static:
    • Apply normal buffer profiles to admin-up ports
    • Apply zero buffer profiles to admin-down ports
  • If the buffer model is dynamic:
    • Apply normal buffer profiles to all ports
    • The buffer manager will take care of a port when it is shut down

Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, and ingress/egress profile lists:
  • Originally, all the macros to generate the above buffer objects took only active ports as an argument
  • Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports needs to be added
  • To be backward compatible, a new series of macros is introduced that takes both active and inactive ports as arguments
  • The original version (with active ports only) is checked first. If it is not defined, the extended version is called
  • Only vendors who support zero profiles need to change their buffer templates

Enable the buffer manager to load zero pools and profiles from a JSON file:
  • The JSON file is provided on a per-platform basis
  • It is copied from the platform/<vendor> folder to the /usr/share/sonic/templates folder at compile time and rendered when the swss container is being created

To keep the code clean and reduce redundancy, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs into two common files:
  • One in Mellanox-SN2700-D48C8 for single-ingress-pool mode
  • The other in ACS-MSN2700 for double-ingress-pool mode
  • The corresponding files of all other SKUs are symbolic links to the above files

Update the sonic-cfggen test accordingly:
  • Adjust the example output file of the JSON template for the unit test
  • Add unit tests for Mellanox's new buffer templates.

How to verify it

Regression test, unit test in sonic-cfggen, and manual test.
@stephenxs stephenxs deleted the reclaim-buffer branch November 29, 2021 21:43
bocon13 pushed a commit to pins/sonic-swss-public that referenced this pull request Nov 30, 2021
…c-net#1910)

liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 28, 2021
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086)
a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041)
71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051)
5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071)
8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072)
ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811)
89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064)
8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040)
b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT
ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060)
45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049)
b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054)
d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009)
24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029)
15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897)
ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951)
e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955)
bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997)
f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026)
fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910)
9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992)
3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048)
fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987)
16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907)
9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642)

Signed-off-by: Stephen Sun [email protected]
stephenxs added a commit to stephenxs/sonic-buildimage that referenced this pull request Jan 6, 2022
judyjoseph pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jan 6, 2022
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
preetham-singh pushed a commit to preetham-singh/sonic-swss that referenced this pull request Aug 6, 2022
…c-net#1910)
