Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[autoneg] add support for remote speed advertisement and clarify the expected autoneg behaviors #924

Merged
merged 3 commits into from
May 10, 2022

Conversation

ds952811
Copy link
Contributor

@ds952811 ds952811 commented Jan 7, 2022

Why I did this?

  • Add support for the remote speed advertisement
  • Furtherly clarify the expected AutoNeg behaviors to unify SAI implementation

Related PRs:

Repo PR title State
sonic-sairedis vslib: add support for read-only port capabilities
sonic-swss portsorch: initial support for link-training and remote advertisement
sonic-swss-common selectabletimer: add mutex to start() and stop()
sonic-utilities [autoneg] add support for remote speed advertisement

Signed-off-by: Dante Su [email protected]

@ds952811 ds952811 changed the title [autoneg] add support for the operational states [autoneg] add support for the operational states and advanced controls per transceiver attached Jan 17, 2022
@liat-grozovik
Copy link
Collaborator

@Junchao-Mellanox, @keboliu please review this HLD and be the reviewer of the code once it is available.
Please check that the Twisted Pair model is going to work for any vendor. Same as the QSFP-DD additional support here.

4. If autoneg is enabled and adv_speeds is specified, SAI must advertise only the speeds those are supported by the switch silicon. Hence the operational advertisement could be a subset of the speeds specified in adv_speeds. SAI should return errors if none of desirable advertised speeds is valid.
5. If autoneg is enabled and adv_interface_types is not configured or empty, SAI must advertise it with all supported interface types.
6. If autoneg is enabled and adv_interface_types is specified, SAI must advertise only the interface types those are valid to the attched transceiver and supported by the switch silicon. Hence the operational advertisement could be a subset of the interface types specified in adv_interface_types. SAI should return errors if none of the desirable advertised interface types is valid.
7. If autoneg is enabled, the administrative port speed updates via SAI_PORT_ATTR_SPEED should disable the autoneg.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate this requirement? I suppose user can set autoneg=off and set port speed to enforce the admin speed currently.

Copy link
Contributor Author

@ds952811 ds952811 Feb 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be revised as per the earlier community meeting. The new one is as follow:

7. If autoneg is enabled, the administrative port speed updates via SAI_PORT_ATTR_SPEED should not alter the autoneg config and operational states. The sonic-utilities should also block speed configuration when autoneg is enabled.
8. If autoneg is enabled, the interface type updates via SAI_PORT_ATTR_INTERFACE_TYPE should restart the autoneg. These are requests from pmon#xcvrd triggered by the transceiver insertion when the per-port interface type is specified in the platform-specific media_settings.json.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with point 8. It should have consistency with speed and FEC. If autoneg is enabled and interface type is set it should be blocked at CLI since interface is derived from auto negotiation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed please don't add any checks on FEC when AN is enabled for backward compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, and it's done


| value | mode | Description |
|:-----:|:----:|:---------------------------------------------------------:|
| 2 | auto | Enable autoneg if applicable to the transceiver attached |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SONiC is using a bool attribute SAI_PORT_ATTR_AUTO_NEG_MODE to set autoneg mode. Do you plan to use a new SAI attribute? What is the SAI attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments, but the 'auto' mode will be removed as per the discussion in the earlier meeting

@@ -406,7 +447,7 @@ The port yang model needs to update according to DB schema change. The yang mode
```
leaf autoneg {
type string {
pattern "0|1";
pattern "0|1|2";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be on|off|auto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy that, thanks

updatePortOperStatus(port, status)
if status == SAI_PORT_OPER_STATUS_UP:
updateDbPortOperSpeed(port, speed)
updateDbPortOperRemoteAdvSpeeds(port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove the remote adv speeds when oper status is down?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

@@ -616,6 +685,10 @@ xcvrd just reads the configuration from media_settings.json and set value to APP

However, it is worthy mentioning that: if user have port attributes configured both in CONFIG_DB and media_settings.json, the value in media_settings.json will override the value in CONFIG_DB after rebooting, restarting pmon or re-insert cables. Base on that, if user choose to use media_settings.json, they probably should not use CLI or CONFIG_DB to avoid configuration loss after rebooting, restarting pmon or re-insert cables.

In the case of SFPs, the link training always takes place after speed negotiated, and given that port link training shall not be enabled on certain transceivers. For example, a chip-to-module transceiver. It's better to have pmon#xcvrd provide a hint to the swss#orchagent for the advanced auto negotiation and link training controls. Hence, **xcvr_capabilities** is introduced into APPL_DB for this. swss#orchagent should request syncd to enable port auto negotiation only when **autoneg=1** or (**autoneg=2** and **AN is specified in xcvr_capabilities**).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is xcvr_capabilities from, media_settings.json or EEPROM? Some vendors do not use media_settings.json. And a few questions here:

  1. What if media_settings.json is not available?
  2. Do you plan to have a CLI to configure this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. media_settings.json is optional, when it's not specified the xcvrd will automatically derive the capabilities from transceiver information, the details will be added into HLD
  2. No CLI config is necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a knob to disable this flow (May be in media_settings.json). By default we shouldn't have xcvr capabilities published and have all the error logs, checks. It should be enabled per platform and I think the knob can be present in a platform specific file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed earlier in the meeting, all the xcvr capability checkers are now removed

@@ -616,6 +685,10 @@ xcvrd just reads the configuration from media_settings.json and set value to APP

However, it is worthy mentioning that: if user have port attributes configured both in CONFIG_DB and media_settings.json, the value in media_settings.json will override the value in CONFIG_DB after rebooting, restarting pmon or re-insert cables. Base on that, if user choose to use media_settings.json, they probably should not use CLI or CONFIG_DB to avoid configuration loss after rebooting, restarting pmon or re-insert cables.

In the case of SFPs, the link training always takes place after speed negotiated, and given that port link training shall not be enabled on certain transceivers. For example, a chip-to-module transceiver. It's better to have pmon#xcvrd provide a hint to the swss#orchagent for the advanced auto negotiation and link training controls. Hence, **xcvr_capabilities** is introduced into APPL_DB for this. swss#orchagent should request syncd to enable port auto negotiation only when **autoneg=1** or (**autoneg=2** and **AN is specified in xcvr_capabilities**).

In the case of SFPs, the valid speed list varies from transceivers to transceivers, and it's expected that user may specify a speed in the CONFIG_DB that's beyond the capabilities of the attached transceiver. Hence the **xcvr_speeds** is introduced into APPL_DB, and pmon#xcvrd should dynamically updates this field upon both transceiver insertion and dynamic port breakout operations. swss#orchagent should request syncd to update the port speed advertisement only when the speed is available in both **xcvr_speeds of APPL_DB** and **adv_speeds of CONFIG_DB**.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a cli config interface advertised-speeds, should it be extended to use "xcvr_speeds" to validate user input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will do

@@ -603,7 +672,7 @@ I choose this solution because:

Dynamic port breakout feature also introduces a hwsku.json file to describe the port capability. It defines the default dynamic breakout mode for now. As we won't automatically set auto negotiation attributes for a port after port breakout, it is not necessary to change the hwsku.json in this feature.

#### xcvrd Consideration
#### PMON xcvrd Consideration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide an example of media_settings.json for the new attributes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These capabilities are directly derived from the transceiver info, and yes, we could provide the custom overrides in the media_settings.json, will update the HLD

@@ -616,6 +685,10 @@ xcvrd just reads the configuration from media_settings.json and set value to APP

However, it is worthy mentioning that: if user have port attributes configured both in CONFIG_DB and media_settings.json, the value in media_settings.json will override the value in CONFIG_DB after rebooting, restarting pmon or re-insert cables. Base on that, if user choose to use media_settings.json, they probably should not use CLI or CONFIG_DB to avoid configuration loss after rebooting, restarting pmon or re-insert cables.

In the case of SFPs, the link training always takes place after speed negotiated, and given that port link training shall not be enabled on certain transceivers. For example, a chip-to-module transceiver. It's better to have pmon#xcvrd provide a hint to the swss#orchagent for the advanced auto negotiation and link training controls. Hence, **xcvr_capabilities** is introduced into APPL_DB for this. swss#orchagent should request syncd to enable port auto negotiation only when **autoneg=1** or (**autoneg=2** and **AN is specified in xcvr_capabilities**).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you list down the criteria used to classify transceiver as autoneg capable vs not? If there is a list please capture it in the HLD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, only CR transceivers will be flagged with 'AN,LT' support, and the details will be added into HLD.


| value | mode | Description |
|:-----:|:----:|:---------------------------------------------------------:|
| 2 | auto | Enable autoneg if applicable to the transceiver attached |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove auto mode as discussed in the HLD review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -616,6 +685,10 @@ xcvrd just reads the configuration from media_settings.json and set value to APP

However, it is worthy mentioning that: if user have port attributes configured both in CONFIG_DB and media_settings.json, the value in media_settings.json will override the value in CONFIG_DB after rebooting, restarting pmon or re-insert cables. Base on that, if user choose to use media_settings.json, they probably should not use CLI or CONFIG_DB to avoid configuration loss after rebooting, restarting pmon or re-insert cables.

In the case of SFPs, the link training always takes place after speed negotiated, and given that port link training shall not be enabled on certain transceivers. For example, a chip-to-module transceiver. It's better to have pmon#xcvrd provide a hint to the swss#orchagent for the advanced auto negotiation and link training controls. Hence, **xcvr_capabilities** is introduced into APPL_DB for this. swss#orchagent should request syncd to enable port auto negotiation only when **autoneg=1** or (**autoneg=2** and **AN is specified in xcvr_capabilities**).

In the case of SFPs, the valid speed list varies from transceivers to transceivers, and it's expected that user may specify a speed in the CONFIG_DB that's beyond the capabilities of the attached transceiver. Hence the **xcvr_speeds** is introduced into APPL_DB, and pmon#xcvrd should dynamically updates this field upon both transceiver insertion and dynamic port breakout operations. swss#orchagent should request syncd to update the port speed advertisement only when the speed is available in both **xcvr_speeds of APPL_DB** and **adv_speeds of CONFIG_DB**.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed if there is mismatch between xcvr_speeds and adv_speeds the ideal solution is to log error. Overriding the user configuration is not the right approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy that, and thanks

4. If autoneg is enabled and adv_speeds is specified, SAI must advertise only the speeds those are supported by the switch silicon. Hence the operational advertisement could be a subset of the speeds specified in adv_speeds. SAI should return errors if none of desirable advertised speeds is valid.
5. If autoneg is enabled and adv_interface_types is not configured or empty, SAI must advertise it with all supported interface types.
6. If autoneg is enabled and adv_interface_types is specified, SAI must advertise only the interface types those are valid to the attched transceiver and supported by the switch silicon. Hence the operational advertisement could be a subset of the interface types specified in adv_interface_types. SAI should return errors if none of the desirable advertised interface types is valid.
7. If autoneg is enabled, the administrative port speed updates via SAI_PORT_ATTR_SPEED should disable the autoneg.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the review remove the point 7. It should be replaced by a check in sonic-utilities to block speed configuration when autoneg is enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should also be done for other attributes that are exchanged via autoneg. Some of them I can name are FEC, interface type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy that, the HLD will be updated as follow

7. If autoneg is enabled, the administrative port speed updates (e.g. SAI_PORT_ATTR_SPEED) should not alter the autoneg config and operational states. The sonic-utilities should also block speed configuration when autoneg is enabled.
8. If autoneg is enabled, the administrative port FEC updates (e.g. SAI_PORT_ATTR_FEC_MODE, SAI_PORT_ATTR_FEC_MODE_EXTENDED) should not alter the autoneg config and operational states. The sonic-utilities should also block FEC configuration when autoneg is enabled.

7. If autoneg is enabled, the administrative port speed updates via SAI_PORT_ATTR_SPEED should disable the autoneg.
8. If autoneg is enabled, the interface type updates via SAI_PORT_ATTR_INTERFACE_TYPE should be ignored. These are requests from xcvrd triggered by the transceiver insertion if the per-port interface type is specified in the platform-specific media_settings.json.
9. If autoneg is enabled on a SFP port, SAI should also activate the link-training to dynamically tune the TX FIR (i.e. Clause 73).
10. If autoneg is disabled, the port speed, interface type and TX FIR should be restored to the original values before autoneg activation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the original values before autoneg activation? What if user has not configured these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if autoneg is disabled then if speed/fec/if_type were configured they should be set, if not configured – default values should be set explicitly. This is to ensure the hardware is not programmed with residual values of the autoneg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments, this will be updated as follow
11. If autoneg is disabled, the port speed, FEC, interface type and TX FIR should be restored with "speed", "fec", "interface_type" and ("preemphasis", "idriver", "ipredriver", "pre1", "pre2", "pre3", "main", "post1", "post2", "post3", "attn") values in the APPL_DB. If the corresponding configuration is not specified in the APPL_DB, vendor-specific SAI driver defaults should be applied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"vendor-specific SAI driver defaults should be applied" - This should not be the case. It should be SAI API defaults and not any vendor specific defaults to have a uniformity of the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, the HLD is now updated, please check it out and see if works

- rmt_adv_speeds
Valid value is the same as **adv_speeds**, except that it's reported by the remote peer.
- xcvr_capabilities:
Transceiver capabilities provided by pmon#xcvrd, it's a set of capabilities separated by comma, where `LT` stands for "link training" and `AN` for "auto negotiation". For example: "AN,LT". See detail description in section [PMON xcvrd Consideration](#pmon-xcvrd-consideration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed during HLD review when enabling AN for a transceiver that doesn't support capability we should throw error message during configuration and pass it only if force option is used. Please update the HLD to reflect the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also need to cover the scenario where pmon is not up or pmon comes up later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and it's done

4. If autoneg is enabled and adv_speeds is specified, SAI must advertise only the speeds those are supported by the switch silicon. Hence the operational advertisement could be a subset of the speeds specified in adv_speeds. SAI should return errors if none of desirable advertised speeds is valid.
5. If autoneg is enabled and adv_interface_types is not configured or empty, SAI must advertise it with all supported interface types.
6. If autoneg is enabled and adv_interface_types is specified, SAI must advertise only the interface types those are valid to the attched transceiver and supported by the switch silicon. Hence the operational advertisement could be a subset of the interface types specified in adv_interface_types. SAI should return errors if none of the desirable advertised interface types is valid.
7. If autoneg is enabled, the administrative port speed updates via SAI_PORT_ATTR_SPEED should disable the autoneg.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should also be done for other attributes that are exchanged via autoneg. Some of them I can name are FEC, interface type.

7. If autoneg is enabled, the administrative port speed updates via SAI_PORT_ATTR_SPEED should disable the autoneg.
8. If autoneg is enabled, the interface type updates via SAI_PORT_ATTR_INTERFACE_TYPE should be ignored. These are requests from xcvrd triggered by the transceiver insertion if the per-port interface type is specified in the platform-specific media_settings.json.
9. If autoneg is enabled on a SFP port, SAI should also activate the link-training to dynamically tune the TX FIR (i.e. Clause 73).
10. If autoneg is disabled, the port speed, interface type and TX FIR should be restored to the original values before autoneg activation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if autoneg is disabled then if speed/fec/if_type were configured they should be set, if not configured – default values should be set explicitly. This is to ensure the hardware is not programmed with residual values of the autoneg

- rmt_adv_speeds
Valid value is the same as **adv_speeds**, except that it's reported by the remote peer.
- xcvr_capabilities:
Transceiver capabilities provided by pmon#xcvrd, it's a set of capabilities separated by comma, where `LT` stands for "link training" and `AN` for "auto negotiation". For example: "AN,LT". See detail description in section [PMON xcvrd Consideration](#pmon-xcvrd-consideration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also need to cover the scenario where pmon is not up or pmon comes up later.

@Junchao-Mellanox
Copy link
Contributor

Any plan to add test case to sonic-mgmt? There was a test case at https://github.com/Azure/sonic-mgmt/blob/master/tests/platform_tests/test_auto_negotiation.py

4. If autoneg is enabled and adv_interface_types is not configured or empty, SAI must advertise it with all supported interface types.
5. If autoneg is disabled and interface_type is not configured, SAI must use SAI_PORT_INTERFACE_TYPE_NONE.
4. If autoneg is enabled and adv_interface_types is not configured or empty, SA
I must advertise it with all supported interface types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think an unintended newline. Please correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

5. If autoneg is disabled and interface_type is not configured, SAI must use SAI_PORT_INTERFACE_TYPE_NONE.
4. If autoneg is enabled and adv_interface_types is not configured or empty, SA
I must advertise it with all supported interface types.
5. If autoneg is disabled and interface_type is not configured, SAI must use SA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. This unintended newline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

8. If autoneg is enabled on a SFP/QSFP/QSFPDD port, SAI should also activate the link-training to dynamically tune the TX FIR.
9. If autoneg is transitioned from enabled to disabled, the port speed, FEC, interface type and TX FIR should be restored with the corresponding values in the APPL_DB. (e.g. "speed", "fec", "interface_type" and ("preemphasis", "idriver", "ipredriver", "pre1", "pre2", "pre3", "main", "post1", "post2", "post3", "attn") If the corresponding configuration is not available in the APPL_DB, the following SAI driver defaults should be applied to achieve the best backward compatibility.

|Lane Count | Default Speed | Default FEC | Default Medium | Default Interface |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am again concerned with the defaults set here. SAI interface type if not configured should be set to none. FEC should also be set to None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, The defaults are SAI_PORT_FEC_MODE_NONE and SAI_PORT_INTERFACE_TYPE_NONE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, The defaults are SAI_PORT_FEC_MODE_NONE and SAI_PORT_INTERFACE_TYPE_NONE

| 10G | 1 | None |
| 1G | 1 | None |

11. If autoneg is enabled, the administrative port FEC mode updates (i.e. "fec=rs|fc|none" in CONFIG_DB) should be able to alter the FEC mode of the autoneg advertisement, if the configured FEC is not supported by the hardware, it should fallback as follows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the intended behavior. What currently we have in SONiC is if FEC is configured along with autoneg enabled, it overrides the auto negotiated value but doesn't change the advertisement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this is removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this is removed

ds952811 added a commit to ds952811/sonic-utilities that referenced this pull request Apr 1, 2022
Add support for remote speed advertisement, such that users could easily
identify the connection issues when autoneg is enabled.

HLD: sonic-net/SONiC#924

- What I did
Add support for remote speed advertisement

- How I did it
Implementation is done according to the AutoNeg HLD

Signed-off-by: Dante Su <[email protected]>
…s per transceiver attached

- Why I did this?

1. Add support for the operational states of AutoNeg
2. Furtherly clarify the AutoNeg behaviors for the SAI implementation

Signed-off-by: Dante Su <[email protected]>
ds952811 added a commit to ds952811/sonic-swss that referenced this pull request Apr 1, 2022
…sement

HLD: sonic-net/SONiC#924

- What I did
Add support for remote speed advertisement
Add support for capability checker

- How I did it
Implementation is done according to the AutoNeg HLD

Signed-off-by: Dante Su <[email protected]>
ds952811 added a commit to ds952811/sonic-swss that referenced this pull request Apr 1, 2022
…sement

HLD: sonic-net/SONiC#924

- What I did
Add support for remote speed advertisement
Add support for capability checker

- How I did it
Implementation is done according to the AutoNeg HLD

Signed-off-by: Dante Su <[email protected]>
@ds952811 ds952811 changed the title [autoneg] add support for the operational states and advanced controls per transceiver attached [autoneg] add support for remote speed advertisement and clarify the expected autoneg behaviors Apr 1, 2022
@@ -169,14 +229,16 @@ Configuring advertised speeds takes effect only if auto negotiation is enabled.

```
Format:
config interface advertised-speeds <interface_name> <speed_list>
config interface advertised-speeds <interface_name> <speed_list> [-f]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate the need for -f?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already removed, and it was for the xcvr capability checkers


; Defines information for port state
key = PORT_TABLE:port_name ; state of the port
; field = value
...
supported_speeds = STRING ; supported speed list
speed = STRING ; operational speed
speed = STRING ; operational speed
rmt_adv_speeds = STRING ; advertised speed list of the remote
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remote advertised speed might impact output of existing CLI. This usecase was not discussed in the meetings. Do we need this field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay to not to show remote advertised speed in the CLI, it's only a hint to help users identify the cause of a link failure

Copy link
Contributor Author

@ds952811 ds952811 Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for your reference, while it's unexpected, the link could get up for some time when the local unit is with AN=ON and remote is in fixed speed. For example, a 100M link could be up when local is with AN=OFF and remote is with AN=ON (It's most likely the 100M is somehow the default speed upon AN failure).
Actually, we did hit this issue a few days ago, and having the remote advertisement displayed in CLI could help users identify this issue, if the link is up when AN=ON, and the remote advertisement is NONE/EMPTY, it's this case.

e.g. The 100M link is somehow coming up unexpectedly.

DUT A(100M, AN=OFF) + DUT B (AN=ON, Adv.Speed=1000,100,10)

Unfortunately, although the link is up unexpectedly, the duplex mode is wrong, and caused traffic issues

i.e.
DUT A is at Full-Duplex, while DUT B is at Half-Duplex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Junchao-Mellanox Can you please provide feedback on modifying existing CLI's impact?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for your reference, the CLI output is updated as follows, the column is increated from 90 to 108

Original

  Interface    Auto-Neg Mode    Speed    Adv Speeds    Type    Adv Types    Oper    Admin
-----------  ---------------  -------  ------------  ------  -----------  ------  -------
  Ethernet0          enabled      25G       10G,50G     CR4      CR4,CR2    down       up
 Ethernet32         disabled      40G           all     N/A          all      up       up
Ethernet112              N/A      40G           N/A     N/A          N/A      up       up
Ethernet116              N/A      40G           N/A     N/A          N/A      up       up
Ethernet120              N/A      40G           N/A     N/A          N/A      up       up
Ethernet124              N/A      40G           N/A     N/A          N/A      up       up

Modified

  Interface    Auto-Neg Mode    Speed    Adv Speeds    Rmt Adv Speeds    Type    Adv Types    Oper    Admin
-----------  ---------------  -------  ------------  ----------------  ------  -----------  ------  -------
  Ethernet0          enabled      25G       10G,50G               40G     CR4      CR4,CR2    down       up
 Ethernet32         disabled      40G           all               N/A     N/A          all      up       up
Ethernet112              N/A      40G           N/A               N/A     N/A          N/A      up       up
Ethernet116              N/A      40G           N/A               N/A     N/A          N/A      up       up
Ethernet120              N/A      40G           N/A               N/A     N/A          N/A      up       up
Ethernet124              N/A      40G           N/A               N/A     N/A          N/A      up       up


However, it is worthy mentioning that: if user have port attributes configured both in CONFIG_DB and media_settings.json, the value in media_settings.json will override the value in CONFIG_DB after rebooting, restarting pmon or re-insert cables. Base on that, if user choose to use media_settings.json, they probably should not use CLI or CONFIG_DB to avoid configuration loss after rebooting, restarting pmon or re-insert cables.
If user have port attributes configured both in CONFIG_DB and media_settings.json, the value in media_settings.json will override the value in CONFIG_DB after rebooting, restarting pmon or re-insert cables. Base on that, if user choose to use media_settings.json, they probably should not use CLI or CONFIG_DB to avoid configuration loss after rebooting, restarting pmon or re-insert cables.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it ensured that media_settings.json takes preference over CONFIG_DB? In fact the user configuration should take more priority over any auto generated values.

Copy link
Contributor Author

@ds952811 ds952811 Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the checkers/blockers are there in the current implementation, it's all about the timing when the corresponding fields of APPL_DB are updated, the OA is unable to check the source, and media_setting framework is always directly sending out these requests into APPL_DB, and an improvement to media_settings framework should be discussed in a separated and dedicated PR against Media-based-Port-settings.md.

Hence this statement right here is to help users understand

  1. Interface type should be configured with either CONFIG_DB or media_settings, only one of the approaches should be activated at the same time. Otherwise, the conflict will take place and may lead to unwanted results based on the timing and operation sequences.

  2. While the pre-emphasis and interface type defined in media_setting is for the TX FIR parameters to SFP/QSFP/QSFPDD ports, it's not applicable to the native coppers. And the IEEE standards expect link-training to be activated followed by negotiation, hence link-training is part of the autoneg, and because the TX FIR will be dynamically updated at runtime, the static parameters in media_settings.json will never be taken into process when autoneg is activated. However, the OA did need to make sure the requested static TX FIR will be restored when AutoNeg transitioned from ON to OFF.

  3. In the case that Link-Training is conditionally disabled during the AutoNeg, as far as I know, it's not IEEE standard and is not supported by all the ASICs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please update HLD accordingly. I believe if the user has both configuration as well information defined in media_settings.json it wouldn't be possible to predict which would take precedence as there is no logic giving one priority over other. Hence the user should be aware of not to configure interface type if it is already defined in media_settings.json

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll emphasize this unwanted behavior in the HLD

@@ -511,6 +538,52 @@ else if autoneg == false:
setInterfaceType(port, interface_type)
```

##### Getting Negotiated Results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not add coding snippets to HLD. HLD is to be used by non programmers as well. For explaining the flows please use flow diagrams

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do

@yxieca yxieca force-pushed the master branch 2 times, most recently from 8498931 to 8837dc2 Compare April 15, 2022 16:51
ds952811 added a commit to ds952811/sonic-swss that referenced this pull request Apr 21, 2022
…sement

HLD: sonic-net/SONiC#924

- What I did
Add support for remote speed advertisement
Add support for capability checker

- How I did it
Implementation is done according to the AutoNeg HLD

Signed-off-by: Dante Su <[email protected]>
ds952811 added a commit to ds952811/sonic-utilities that referenced this pull request Apr 22, 2022
Add support for remote speed advertisement, such that users could easily
identify the connection issues when autoneg is enabled.

HLD: sonic-net/SONiC#924

- What I did
Add support for remote speed advertisement

- How I did it
Implementation is done according to the AutoNeg HLD

Signed-off-by: Dante Su <[email protected]>
@adyeung
Copy link
Collaborator

adyeung commented May 5, 2022

@zhangyanzhao @yxieca pls help merge

@@ -511,6 +538,14 @@ else if autoneg == false:
setInterfaceType(port, interface_type)
```

##### Getting Remote Advertisement

A new periodic timer task will be introduced into PortsOrch, it periodically loops through physical ports and update the per-port remote advertisement if autoneg is enabled and the link is down.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why link being DOWN is a requirement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the link gets up, it's not necessary to keep polling the remote advertisement, as a link-down event is expected upon advertisement update and cable replacement.


1. When xcvrd start, it reads the media_setting.json and set pre-emphasis values to APPL_DB for each port
2. When a new cable is inserted, xcvrd uses the value in media_setting.json to set pre-emphasis value to APPL_DB for this port.
While it is possible to use CLI/CONFIG_DB for setting the port auto negotiation attributes, this feature is also available via [media_settings.json](https://github.com/Azure/SONiC/blob/master/doc/media-settings/Media-based-Port-settings.md)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean AN can be configured via media_settings.json?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it can not. This is again emphasise that interface type should be specified by either CLI/CONFIG_DB or media_settings.json, not both.

@@ -86,6 +92,11 @@ Currently, SAI already defines a few port attributes to support port auto negoti
3. If autoneg is enabled and adv_speeds is not configured or empty, SAI must advertise it with all supported speeds.
4. If autoneg is enabled and adv_interface_types is not configured or empty, SAI must advertise it with all supported interface types.
5. If autoneg is disabled and interface_type is not configured, SAI must use SAI_PORT_INTERFACE_TYPE_NONE.
6. If autoneg is enabled, the administrative port speed updates should not disable the autoneg. The configured speed should be cached in swss#orchagent and gets replayed when autoneg is transitioned from enabled to disabled.
7. If autoneg is enabled, while the administrative interface type updates via CONFIG_DB should be blocked, the dynamic interface type updates from pmon#xcvrd via APPL_DB should be delivered to the SAI to update the autoneg advertisement. Please refer to [PMON xcvrd Consideration](#pmon-xcvrd-consideration) for details.
8. If autoneg is enabled on a SFP/QSFP/QSFPDD port, SAI should also activate the link-training to dynamically tune the TX FIR.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought if AN is enabled in the HW, LT would follow automatically, why SW intervention is required?

Copy link
Contributor Author

@ds952811 ds952811 May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of Broadcom switch ASIC, this is true, the LT is always enabled and can't be disabled if AN is enabled.
However, the LT and AN are different hardware component, hence it may or may not be applicable to other vendors.
e.g. If we have LT disabled in the autoneg, the speed, fec negotiation could be possible for the optics, although this is not a standard port mode in the IEEE.

ds952811 added a commit to ds952811/sonic-utilities that referenced this pull request May 10, 2022
Add support for remote speed advertisement, such that users could easily
identify the connection issues when autoneg is enabled.

HLD: sonic-net/SONiC#924

- What I did
Add support for remote speed advertisement

- How I did it
Implementation is done according to the AutoNeg HLD

Signed-off-by: Dante Su <[email protected]>
@prgeor prgeor merged commit bc7b409 into sonic-net:master May 10, 2022
prgeor pushed a commit to sonic-net/sonic-utilities that referenced this pull request May 10, 2022
* [autoneg] add support for remote speed advertisement

Add support for remote speed advertisement, such that users could easily
identify the connection issues when autoneg is enabled.

HLD: sonic-net/SONiC#924

- What I did
Add support for remote speed advertisement

- How I did it
Implementation is done according to the AutoNeg HLD

Signed-off-by: Dante Su <[email protected]>

* fix test failures in dump_state_test.py

Signed-off-by: Dante Su <[email protected]>

* address review comments

Signed-off-by: Dante Su <[email protected]>

* drop PORT_ADV_SPEEDS from state_db_port_status_get()

Signed-off-by: Dante Su <[email protected]>

* address review comments

Signed-off-by: Dante Su <[email protected]>
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
* [autoneg] add support for remote speed advertisement

Add support for remote speed advertisement, such that users could easily
identify the connection issues when autoneg is enabled.

HLD: sonic-net/SONiC#924

- What I did
Add support for remote speed advertisement

- How I did it
Implementation is done according to the AutoNeg HLD

Signed-off-by: Dante Su <[email protected]>

* fix test failures in dump_state_test.py

Signed-off-by: Dante Su <[email protected]>

* address review comments

Signed-off-by: Dante Su <[email protected]>

* drop PORT_ADV_SPEEDS from state_db_port_status_get()

Signed-off-by: Dante Su <[email protected]>

* address review comments

Signed-off-by: Dante Su <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants