Multi ASIC HLD #644

Merged
merged 8 commits into from
Dec 15, 2020

@arlakshm arlakshm commented Jul 1, 2020

No description provided.

arlakshm and others added 5 commits June 26, 2020 12:46
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <[email protected]>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <[email protected]>
Signed-off-by: arlakshm <[email protected]>
Signed-off-by: arlakshm <[email protected]>
Signed-off-by: arlakshm <[email protected]>
```
192.168.199.193 proto 186 src 172.16.132.64 metric 20
nexthop via 10.0.107.12 dev PortChannel4007 weight 1
```
- A packet with destination ip as 192.168.199.193

I'm confused here. Could you review this part and correct the name/IP/asicX? It looks like the description and the picture do not match.


#### 2.3.2.1. BGP on backend ASICs
In this approach:
- BGP instance is running on all the ASICs.

Doesn't this proposal consume more CPU and memory resources?

```
(( N * (N-1))/2) * M
```
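
As a quick sanity check of the expression above, the sketch below evaluates it for a few sample values. The visible excerpt does not define N and M, so the function treats them as plain non-negative integers; reading N as the number of ASICs in a full mesh and M as a per-pair multiplier is an assumption, not something stated here.

```python
def pairwise_count(n: int, m: int) -> int:
    """Evaluate ((N * (N - 1)) / 2) * M using integer arithmetic.

    The meaning of N and M is assumed (see the text above); the function
    only reproduces the arithmetic of the expression.
    """
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    # N * (N - 1) is always even, so the integer division is exact.
    return (n * (n - 1)) // 2 * m


if __name__ == "__main__":
    for n, m in [(2, 1), (4, 1), (6, 2)]:
        print(f"N={n}, M={m} -> {pairwise_count(n, m)}")
```

The count grows quadratically in N, which is the property that matters when comparing the two approaches.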
#### 2.3.2.3. Comparison of both approaches

Which proposal does this design intend to implement? Or do you provide both?

This creates a separate Linux network stack, including routes and network devices, for every ASIC.
The interfaces for a given ASIC are linked to its corresponding namespace.

In a multi-ASIC system, the ASICs are very commonly connected physically in a Clos fabric topology. With the SONiC Docker containers running in a separate namespace for each ASIC, we can model and configure the system as if there were a spine-leaf network topology within the box.
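
To make the per-ASIC namespace model concrete, here is a minimal sketch of how a command could be scoped to one ASIC's network stack with the standard `ip netns exec` wrapper. The namespace naming scheme (`asic` followed by the ASIC index) and the helper names are assumptions for illustration; the excerpt above does not spell them out.

```python
from typing import List

# Assumed naming convention for per-ASIC namespaces (not stated in the excerpt).
NAMESPACE_PREFIX = "asic"


def namespace_for(asic_id: int) -> str:
    """Return the (assumed) network namespace name for an ASIC index."""
    if asic_id < 0:
        raise ValueError("asic_id must be non-negative")
    return f"{NAMESPACE_PREFIX}{asic_id}"


def in_namespace(asic_id: int, command: List[str]) -> List[str]:
    """Prefix a command so that it runs inside the ASIC's network namespace."""
    return ["ip", "netns", "exec", namespace_for(asic_id)] + command


if __name__ == "__main__":
    # Show the routes that belong to ASIC 0's own network stack.
    print(" ".join(in_namespace(0, ["ip", "route", "show"])))
```

Because each namespace has its own routes and devices, the same `ip route show` command returns a different table depending on which ASIC it is scoped to.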

How is a multi-ASIC system configured? Does it have a CLI? How does it configure the different ASICs?

This creates a separate Linux network stack, including routes and network devices, for every ASIC.
The interfaces for a given ASIC are linked to its corresponding namespace.

In a multi-ASIC system, the ASICs are very commonly connected physically in a Clos fabric topology. With the SONiC Docker containers running in a separate namespace for each ASIC, we can model and configure the system as if there were a spine-leaf network topology within the box.

Does multi-ASIC support multi-vendor integration, e.g. MLNX and BRCM ASICs in the same system?

- Introduced ASIC element type for device internal chip
- There is an internal view from the minigraph file. It gives chipset, internal connections and internal routing logic setup for SONiC multi-asic devices.

In a Multi ASIC platform, configuration is generated per ASIC.

How does this design handle warm-reboot?

- Backend ASICs, which have only internal links. These ASICs connect only to Frontend ASICs. In a chassis system, the fabric ASIC is considered a Backend ASIC.

### 2.3.2. Control Plane
Within a Multi ASIC SONiC system, the internal control plane can be set up in the following ways:

Do you have any scaling numbers? Do you see any performance issues (latency and memory footprint)?

@kannankvs kannankvs left a comment

Where is the requirement/design for "downward compatibility" with single-ASIC platforms explained? For example, if a single-ASIC platform with a single "config_db.json" (and frr.conf) runs a previous version of SONiC and is upgraded to a new version that supports multi-ASIC, can we assume that no changes are required for such platforms? The same applies to all config and show commands that might be heavily used in automation.


#### 2.4.3.1. Minigraph

minigraph.xml is used to generate the initial configuration of a multi asic platform. There is a single minigraph.xml consisting of internal ASIC connectivity view, where each ASIC is modelled as a device.

Is minigraph mandatory to configure multi-asic?

- The database container running on the Linux host will be called the _globalDB_ container on a multi-ASIC platform. The database tables populated here hold system-wide attributes such as SYSLOG, AAA, TACACS, and the Mgmt interface, and resources such as fan, psu, and thermal.
- The ***database{n}*** container will be started in each namespace, where _n_ denotes the ASIC_ID. These databases will be used and updated by applications in that namespace.

Each database container will have its own **"/var/run/redis{n}/"** directory, which contains the database_config.json file, the redis unix socket, etc. The database_config.json will no longer be a static build-time file.
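
The naming scheme described above can be sketched as a pair of small helpers that map an ASIC_ID to its database container name and its runtime directory. The function names are hypothetical, and the sketch covers only the per-ASIC `database{n}` containers, since the excerpt does not give the path used by the host-level container.

```python
import posixpath


def database_container(asic_id: int) -> str:
    """Container name for the database instance of a given ASIC_ID."""
    if asic_id < 0:
        raise ValueError("asic_id must be non-negative")
    return f"database{asic_id}"


def redis_run_dir(asic_id: int) -> str:
    """Runtime directory holding that instance's config file and unix socket."""
    if asic_id < 0:
        raise ValueError("asic_id must be non-negative")
    return f"/var/run/redis{asic_id}/"


def database_config_path(asic_id: int) -> str:
    """Location of the per-instance database_config.json."""
    return posixpath.join(redis_run_dir(asic_id), "database_config.json")


if __name__ == "__main__":
    for asic_id in range(2):
        print(database_container(asic_id), database_config_path(asic_id))
```

Keeping one directory per instance is what lets database_config.json be generated at runtime for each ASIC instead of being a single build-time file.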

From SONiC's point of view, why not use a single database container to configure a multi-ASIC system? What is the rationale behind using several database containers?

```
}
```
**Sample database_config.json for ASIC0**

Does the design consider multi-DB support? Does this template account for multiple DB instances?

<neighbor_ip> applies to either internal/external BGP sessions as per the user input.
* config interface : Added an optional argument to specify the namespace [ -n namespace ]. On Multi-ASIC devices, the namespace is taken from the user input or, if not provided, derived from the interface name.
* config vlan : Added an optional argument to specify the namespace [ -n namespace ]. On Multi-ASIC devices, the namespace parameter is mandatory when adding or deleting a VLAN or a VLAN member interface.
* config portchannel : Added an optional argument to specify the namespace [ -n namespace ]. On Multi-ASIC devices, the namespace parameter is mandatory when adding or deleting a portchannel or a portchannel member interface.
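
The namespace resolution described for `config interface` (use the `-n` value if given, otherwise derive it from the interface name) could look roughly like the sketch below. The port-to-namespace table, the namespace names, and the function name are all hypothetical; how the real CLI performs the lookup is not shown in this excerpt.

```python
from typing import Dict, Optional

# Hypothetical mapping from front-panel interface name to owning namespace.
PORT_TO_NAMESPACE: Dict[str, str] = {
    "Ethernet0": "asic0",
    "Ethernet4": "asic0",
    "Ethernet64": "asic1",
}


def resolve_namespace(
    interface: str,
    namespace: Optional[str] = None,
    port_map: Optional[Dict[str, str]] = None,
) -> str:
    """Return the namespace a `config interface` call should act on."""
    if namespace is not None:
        # An explicit -n argument from the user takes precedence.
        return namespace
    table = PORT_TO_NAMESPACE if port_map is None else port_map
    if interface not in table:
        raise ValueError(f"cannot derive a namespace for interface {interface!r}")
    return table[interface]


if __name__ == "__main__":
    print(resolve_namespace("Ethernet64"))          # derived from the name
    print(resolve_namespace("Ethernet0", "asic1"))  # explicit user input
```

For `config vlan` and `config portchannel` there is no interface name to derive from at creation time, which is consistent with the namespace being mandatory for those commands.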

Do all members of a portchannel need to be from the same ASIC? That is, can a portchannel have member ports from both ASIC0 and ASIC1?
Please document it.

* config bgp : No additional namespace argument is added to the bgp commands. The bgp startup/shutdown all commands are applied to the external BGP sessions (where the BGP neighbors are external routers). Commands of the form bgp <startup/shutdown/remove> <neighbor_ip> apply to either internal or external BGP sessions, depending on the user input.
* config interface : Added an optional argument to specify the namespace [ -n namespace ]. On Multi-ASIC devices, the namespace is taken from the user input or, if not provided, derived from the interface name.
* config vlan : Added an optional argument to specify the namespace [ -n namespace ]. On Multi-ASIC devices, the namespace parameter is mandatory when adding or deleting a VLAN or a VLAN member interface.

Can the VLAN member ports of a given VLAN be from different ASICs?

If yes, and we need to send an ARP request on the VLAN interface, how will this be achieved?

Ext representing internal or external port respectively.

Signed-off-by: SuvarnaMeenakshi <[email protected]>
wangxin pushed a commit to sonic-net/sonic-mgmt that referenced this pull request Oct 30, 2020
PR sonic-net/SONiC#644 introduced the HLD to support multi ASIC. In the future, multi DUT or Chassis will be supported by SONiC as well. The test infrastructure and some of the customized ansible modules need to be updated to support testing of the upcoming new architectures. This PR implements PR 2347, which proposed how to improve the current test infrastructure to support multi-DUT and multi-ASIC systems. The goal is to ensure that the existing test scripts are not broken and that the tests can be updated incrementally.

This change is the implementation of PR 2347 - Add proposal for multi-DUT and multi-ASIC testing support
- Added the classes described in the PR:
  - SonicAsic - represents an asic, and implements the asic/namespace related operations to hide the complexity of handling the asic/namespace specific details.
      - For now, have added bgp_facts as an example to add 'instance_id' to the bgp_facts module call on a SonicHost.
  - MultiAsicSonicHost - a host with one or more SonicAsics.
  - DutHosts - represents all the DUTs in a testbed.
      - has 'nodes' list to represent each DUT in the testbed.

- Update duthosts fixture to return an instance of DutHosts instead of a list of SonicHosts
- Modify duthost fixture to return a MultiAsicSonicHost from duthosts.nodes
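
The relationship between the three classes listed above can be sketched as follows. This is only a structural illustration under stated assumptions, not the sonic-mgmt implementation: apart from `nodes` and `instance_id`, every attribute and method name here (`asic_index`, `hostname`, `asics`, `bgp_facts_args`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SonicAsic:
    """One ASIC; hides the asic/namespace-specific details from tests."""
    asic_index: int  # hypothetical attribute name

    def bgp_facts_args(self) -> Dict[str, int]:
        # Arguments a bgp_facts call would carry for this ASIC instance.
        return {"instance_id": self.asic_index}


@dataclass
class MultiAsicSonicHost:
    """A host with one or more SonicAsics."""
    hostname: str
    asics: List[SonicAsic] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.asics:
            raise ValueError("a host needs at least one SonicAsic")


@dataclass
class DutHosts:
    """All DUTs in a testbed, exposed through the 'nodes' list."""
    nodes: List[MultiAsicSonicHost] = field(default_factory=list)


if __name__ == "__main__":
    duthosts = DutHosts(nodes=[
        MultiAsicSonicHost("dut1", [SonicAsic(0), SonicAsic(1)]),
    ])
    duthost = duthosts.nodes[0]  # what a duthost fixture could hand to a test
    print([asic.bgp_facts_args() for asic in duthost.asics])
```

Modelling a single-ASIC device as a host with exactly one SonicAsic keeps existing tests on the same code path, which matches the stated goal of not breaking current test scripts.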
@rlhui rlhui merged commit 2f89785 into sonic-net:master Dec 15, 2020