Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orchagent validates mirror session queue parameter against maximum value from SAI #1957

Merged
merged 3 commits into from
Oct 18, 2021

Conversation

raphaelt-nvidia
Copy link
Contributor

Signed-off-by: Raphael Tryster [email protected]

What I did

On initialization, orchagent queries SAI for the maximum value of the queue parameter in mirroring sessions. When the user configures a session, if the value requested by the user is invalid, it writes an error in the log and does not pass the request to SAI/SDK.

Why I did it

Fixes https://github.com/Azure/sonic-buildimage/issues/8189[mirroring] Missing per-vendor validation of mirror session queue parameter. Without this change, orchagent passes the invalid data to SAI/SDK which returns failure, causing orchagent to exit, effectively a system crash.

How I verified it

Manual and unit test: Create 3 mirror sessions with queue = 0, maximum valid value, lowest invalid value. First two cases, sessions are created. For last case, session not created in state DB and error in log.

Details if related

@@ -39,6 +39,11 @@
#define MIRROR_SESSION_DSCP_MIN 0
#define MIRROR_SESSION_DSCP_MAX 63

// 15 is a typical value, but if vendor's SAI does not supply the maximum value,
// allow all 8-bit numbers, effectively cancelling validation by orchagent.
#define MIRROR_SESSION_DEFAULT_NUM_TC 255
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if 15 is the typical value, I would suggest having 15 as the default limit. By having 255 and if user supply a value like 100, this still can fail in SAI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure of some automatic tests has shown me that the question is more complex than that. Firstly, let me describe our internal discussions. My first inclination was to use 15 as a typical default, but our architect pointed out that if some vendor supports a higher number and does not implement SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES, then a user configuring that higher number will be prevented from doing so. On the other hand, if we set the default high, then a smart user of that vendor's switch could go up to its maximum, and only a dumb user would enter a value that is too high and crash the switch. So I changed it to 255, which would allow all values up to 254. Of course, you may take the position that if a vendor wants to allow use of higher numbered queues, that vendor must implement SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES.

Now I would like to consult about another aspect. Changing to 255 exposed a bug in the test which I missed because internally I ran the wrong test version, but the failed tests in github show it. The vs test uses setReadOnlyAttr to simulate a value of 15 for SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES, expects passing 14 to succeed and 15 to fail. This worked as long as the default in orchagent was 15. But after I changed it to 255, creating a session with queue=15 succeeded, and the test failed. I believe the reason is that for efficiency, orchagent only queries SAI once for SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES, caches the value, and compares against the cached value whenever a session is created. You can see that code in this PR. But that probably means that when the test tries to set the limit with setReadOnlyAttr, it has no effect, because orchagent already completed its initialization, found no implementation of SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES in the vs environment, and set the limit to the default of 255. This theory is supported by additional tests I did today, where 254 succeeds and 255 fails. That means that the test cannot simulate its own limit, and must assume the same default as orchagent.

To sum up:

  1. In the light of the discussion above, do you still think I should change the default back to 15?
  2. Do you agree that I should get rid of setReadOnlyAttr in the test, and align with whatever default orchagent sets for testing the limit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. Approving from my side. Lets wait for @bingwang-ms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I gather you answer to #1 is No. What about #2? Unless I change something in the test, it will continue to fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clarification in #1 makes sense to me. Regarding #2, I think updating the test code is better.

@prsunny prsunny requested a review from bingwang-ms October 13, 2021 21:44
prsunny
prsunny previously approved these changes Oct 14, 2021
@@ -39,6 +39,11 @@
#define MIRROR_SESSION_DSCP_MIN 0
#define MIRROR_SESSION_DSCP_MAX 63

// 15 is a typical value, but if vendor's SAI does not supply the maximum value,
// allow all 8-bit numbers, effectively cancelling validation by orchagent.
#define MIRROR_SESSION_DEFAULT_NUM_TC 255
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. Approving from my side. Lets wait for @bingwang-ms

@prsunny prsunny merged commit 6b15584 into sonic-net:master Oct 18, 2021
volodymyrsamotiy pushed a commit to volodymyrsamotiy/sonic-swss that referenced this pull request Oct 20, 2021
…eter against maximum value from SAI (sonic-net#1957)

Signed-off-by: Raphael Tryster <[email protected]>
judyjoseph pushed a commit that referenced this pull request Oct 21, 2021
…lue from SAI (#1957)

* Orchagent validates mirror session queue parameter against maximum value from SAI
Signed-off-by: Raphael Tryster <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[mirroring] Missing per-vendor validation of mirror session queue parameter
5 participants