Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GCU] Complete RDMA Platform Validation Checks #2791

Merged

Conversation

isabelmsft
Copy link
Contributor

@isabelmsft isabelmsft commented Apr 13, 2023

What I did

Add RDMA validator to ensure GCU is being used on allowable platform and version for RDMA use cases.
The allowable platforms and versions are determined by this table:
image

The last two rows were completed and infrastructure set up in the earlier PR #2619

How I did it

How to verify it

Attempt to use GCU on vs and modify PFC_WD table

Previous command output (if the output of a command-line utility has changed)

GCU patch application improperly goes through

New command output (if the output of a command-line utility has changed)

Failed to apply patch
Usage: config apply-patch [OPTIONS] PATCH_FILE_PATH
Try "config apply-patch -h" for help.

Error: Modification of /PFC_WD table is illegal due to corresponding validating function rdma_config_update_validator returning false

else:
return branch_version >= "20181100"
# PFCWD enable/disable scenario
if "pfc_wd" in field:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest to build a dictionary of the fields and pre-populate it with all the restrictions and write a generic function to evaluate it.
something similar to the eg. below
field = {'pfcwd': {'all': '201811',
'cisco-8000': '202012'
}}
'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I added this to the gcu_field_operation_validators.conf.json file

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Apr 26, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@isabelmsft isabelmsft force-pushed the rdma_platform_enforcement_complete branch from 0f90ed2 to 5031d8b Compare April 26, 2023 05:28
@isabelmsft isabelmsft requested a review from qiluo-msft April 26, 2023 21:06
match = re.search(r'\/([^\/]+)(\/|$)', path) # This matches the table name in the path, eg if path if /PFC_WD/GLOBAL, the match would be PFC_WD
if match is not None:
table = match.group(1)
table = match.group(1).lower()
Copy link
Contributor

@qiluo-msft qiluo-msft Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower()

Why change to lowercases? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it back to default text which would be upper case, and also changed the config file we are matching against to have table names in upper case

asic = "cisco-8000"
elif device_info.get_sonic_version_info()['asic_type'] == 'mellanox':
GET_HWSKU_CMD = "sonic-cfggen -d -v DEVICE_METADATA.localhost.hwsku"
spc1_hwskus = [ 'ACS-MSN2700', 'ACS-MSN2740', 'ACS-MSN2100', 'ACS-MSN2410', 'ACS-MSN2010', 'Mellanox-SN2700', 'Mellanox-SN2700-D48C8' ]
Copy link
Contributor

@qiluo-msft qiluo-msft Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spc1_hwskus

These string are hard-coded so not future proof. Better to move into gcu_field_operation_validators.conf.json. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to gcu_field_operation_validators.conf.json.

if hwsku.lower() in [spc1_hwsku.lower() for spc1_hwsku in spc1_hwskus]:
asic = "spc1"
elif device_info.get_sonic_version_info()['asic_type'] == 'broadcom':
command = ["sudo", "lspci"]
Copy link
Contributor

@qiluo-msft qiluo-msft Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

command

Runtime check is heavy. We could build the mapping offline, and move add the hareware skus into gcu_field_operation_validators.conf.json.

Then in runtime, you just check hwsku. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved mapping to gcu_field_operation_validators.conf.json.

version_info = device_info.get_sonic_version_info()
build_version = version_info.get('build_version')
asic_type = version_info.get('asic_type')
asic = get_asic_name()
path = path.lower()
Copy link
Contributor

@qiluo-msft qiluo-msft Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower

Why change to lowercases? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, because all fields/paths other than the table in the conf file are lowercase. But in ConfigDB, some fields are uppercase (like AZURE_LOSSLESS)

if (asic_type != 'mellanox' and asic_type != 'broadcom' and asic_type != 'cisco-8000'):
# For paths like /BUFFER_PROFILE/pg_lossless_50000_300m_profile/xoff, remove pg_lossless_50000_300m from the path so that we can clearly determine which fields are modifiable
cleaned_path = "/".join([part for part in path.split("/") if not any(char.isdigit() for char in part)])
if asic == "unknown":
Copy link
Contributor

@qiluo-msft qiluo-msft Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

==

You can compare and return earlier. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved comparison to the start of validator function


if (asic_type != 'mellanox' and asic_type != 'broadcom' and asic_type != 'cisco-8000'):
# For paths like /BUFFER_PROFILE/pg_lossless_50000_300m_profile/xoff, remove pg_lossless_50000_300m from the path so that we can clearly determine which fields are modifiable
Copy link
Contributor

@qiluo-msft qiluo-msft Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

determine

Not quite understand why removing help determine. #Closed

Copy link
Contributor Author

@isabelmsft isabelmsft May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed comment to possibly a clearer example: # For paths like /PFC_WD/Ethernet112/action, remove Ethernet112 from the path so that we can clearly determine the relevant field (i.e. action, not Ethernet112)

The conf file only saves action because that's the relevant field. Any part of the path with an integer is not relevant, so I don't save that in the conf file. Alternatively, I could change the conf file to define the regex pattern accepted for paths that include integers.


if (asic_type != 'mellanox' and asic_type != 'broadcom' and asic_type != 'cisco-8000'):
# For paths like /BUFFER_PROFILE/pg_lossless_50000_300m_profile/xoff, remove pg_lossless_50000_300m from the path so that we can clearly determine which fields are modifiable
cleaned_path = "/".join([part for part in path.split("/") if not any(char.isdigit() for char in part)])
Copy link
Contributor

@qiluo-msft qiluo-msft Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split

Manually split or join is dangerous, you do not consider escaping, etc. Suggest leverage some libraries, such as https://pypi.org/project/jsonpointer/. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use jsonpointer library

else:
raise GenericConfigUpdaterError("GCU table modification validators config file not found")

match = re.search(r'\/([^\/]+)(\/|$)', cleaned_path) # This matches the table name in the path, eg if path if /PFC_WD/GLOBAL, the match would be PFC_WD
Copy link
Contributor

@qiluo-msft qiluo-msft Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search

Suggest use library instead of manually parsing. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use jsonpointer library

@isabelmsft isabelmsft requested a review from wen587 May 4, 2023 04:33
qiluo-msft
qiluo-msft previously approved these changes May 17, 2023
wen587
wen587 previously approved these changes May 18, 2023
qiluo-msft pushed a commit to sonic-net/sonic-mgmt that referenced this pull request May 19, 2023
### Description of PR
Summary:
There is a dependency between E2E test improvements PR: #8341 and sonic-utilities GCU RDMA feature PR: sonic-net/sonic-utilities#2791

This PR skips the RDMA tests that will fail when the platform GCU RDMA feature PR is merged. After the GCU RDMA feature PR is merged and backported to the appropriate branch, we will revert this PR to stop skipping the E2E tests, and merge the E2E test improvements PR so that the tests will pass once the feature changes take effect.
"td2": "20181100",
"th": "20181100",
"th2": "20181100",
"td3": "20181100",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realized there is an error in my doc. this should be 202012. td3 is supported only from that release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@isabelmsft isabelmsft dismissed stale reviews from wen587 and qiluo-msft via f595095 May 19, 2023 18:35
@isabelmsft isabelmsft marked this pull request as ready for review May 23, 2023 21:49
@isabelmsft isabelmsft merged commit f258e2a into sonic-net:master May 24, 2023
StormLiangMS added a commit that referenced this pull request May 30, 2023
StormLiangMS added a commit that referenced this pull request May 30, 2023
@StormLiangMS
Copy link
Contributor

@isabelmsft revert this PR since it failed the PR test and blocked the submodule advance of sonic-utilities, pls check the issue and re-submit.

StormLiangMS added a commit that referenced this pull request May 30, 2023
isabelmsft added a commit that referenced this pull request May 31, 2023
pdhruv-marvell pushed a commit to pdhruv-marvell/sonic-utilities that referenced this pull request Aug 23, 2023
pdhruv-marvell pushed a commit to pdhruv-marvell/sonic-utilities that referenced this pull request Aug 23, 2023
mrkcmo pushed a commit to Azarack/sonic-mgmt that referenced this pull request Oct 3, 2023
### Description of PR
Summary:
There is a dependency between E2E test improvements PR: sonic-net#8341 and sonic-utilities GCU RDMA feature PR: sonic-net/sonic-utilities#2791

This PR skips the RDMA tests that will fail when the platform GCU RDMA feature PR is merged. After the GCU RDMA feature PR is merged and backported to the appropriate branch, we will revert this PR to stop skipping the E2E tests, and merge the E2E test improvements PR so that the tests will pass once the feature changes take effect.
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Jan 25, 2024
### Description of PR
Summary:
There is a dependency between E2E test improvements PR: sonic-net#8341 and sonic-utilities GCU RDMA feature PR: sonic-net/sonic-utilities#2791

This PR skips the RDMA tests that will fail when the platform GCU RDMA feature PR is merged. After the GCU RDMA feature PR is merged and backported to the appropriate branch, we will revert this PR to stop skipping the E2E tests, and merge the E2E test improvements PR so that the tests will pass once the feature changes take effect.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants