Commit 32d2dab

Merge pull request sonic-net#211 from BRCM-SONIC/pcs_link_monitor
PHY pcs link monitor feature
2 parents 77d6fc1 + 837697b commit 32d2dab

2 files changed

+295
-61
lines changed

system/pcs-error-counter.HLD

-61
This file was deleted.

system/phy-link-monitor.HLD

+295
@@ -0,0 +1,295 @@
1+
# Feature Name
2+
3+
PHY PCS/PMD status monitoring
4+
5+
# High Level Design Document
6+
7+
#### Rev 0.1
8+
9+
# Table of Contents
10+
11+
* [List of Tables](#list-of-tables)
12+
13+
* [Revision](#revision)
14+
15+
* [About This Manual](#about-this-manual)
16+
17+
* [Scope](#scope)
18+
19+
* [Definition/Abbreviation](#definitionabbreviation)
20+
21+
# List of Tables
22+
23+
[Table 1: Abbreviations](#table-1-abbreviations)
24+
25+
# Revision
26+
27+
| Rev | Date | Author | Change Description |
28+
| ----- | ---------- | ------------- | -------------------- |
29+
| 0.1 | 04/14/2021 | Steven Lu | Initial proposal |
30+
| 0.2 | 05/21/2021 | Vishnu Shetty | Phase 1 and 2 design |
31+
32+
# About this Manual
33+
34+
This document describes the PCS/PMD error and status monitoring feature.
35+
36+
# Scope
37+
38+
This document captures the feature requirements and the high-level design.
39+
40+
# Definition/Abbreviation
41+
42+
### Table 1: Abbreviations
43+
44+
| Term | Meaning |
45+
| ---- | ---- |
46+
| PHY | Physical Layer |
47+
| PCS | Physical Coding Sublayer |
48+
| PMD | Physical Medium Dependent |
49+
| PMA | Physical Medium Attachment |
50+
| BIP | Bit Interleaved Parity |
51+
| BER | Bit Error Rate |
52+
| AM | Alignment Marker |
53+
| AMPS | Alignment Marker Payload Sequence |
54+
55+
# 1 Feature Overview
56+
57+
This feature lets users monitor SerDes link quality and debug the reason for link-down events.
58+
59+
## 1.1 Requirements
60+
61+
High Level Requirements:
62+
63+
Collect PCS error stats and status at a regular interval. The interval should default to 60 seconds and be configurable. The timestamp of the last non-zero count and of the last "nok" status should be recorded along with the last poll timestamp. The port must be in diagnostic mode to collect some of the PCS status items.
64+
65+
Stats:
66+
BER count
67+
Errored block count
68+
BIP error count
69+
FEC correctable errors
70+
FEC uncorrectable errors
71+
FEC symbol errors
72+
73+
PMD status:
74+
PMD Signal Detect
75+
PMD CDR Lock
76+
77+
PCS status (phase-2, diag mode only):
78+
PCS Sync state
79+
PCS Link state
80+
PCS Local fault
81+
PCS Remote fault
82+
PCS AM lock
83+
PCS AMPS lock
84+
PCS Deskew state
85+
PCS HI_BER state
86+
87+
Phase-1 support: Falcon-16 (TH2) and Blackhawk-7 (TH3)
88+
Phase-2 support: Remaining SerDes cores (Merlin, Eagle, etc.)
89+
90+
### 1.1.1 Functional (detail) Requirements
91+
92+
### 1.1.2 Scalability Requirements
93+
94+
### 1.1.3 Warm Boot Requirements
95+
96+
The STATE_DB and COUNTERS_DB entries should be preserved across warm reboot.
97+
98+
## 1.2 Design Overview
99+
100+
## 2.1 Target Deployment Use Cases
101+
102+
## 2.2 Functional Description
103+
104+
# 3 Design
105+
106+
## 3.1 Overview
107+
108+
The orchagent initialises PCS/PMD status and counter collection. The syncd container polls counters and status from SAI at the configured interval and updates COUNTERS_DB and STATE_DB. An extra poll is triggered on every link state change. For each object, the timestamp of the last non-zero count or "nok" status is recorded. The stats give an indication of link quality.
109+
110+
Some of the PCS/PMD objects listed above under phase 2 cannot be fetched without stopping the link scan thread (an ASIC limitation). This part of the feature is primarily needed to debug the reason for a link going down. A new command will be available to put a port into diagnostic mode. In diagnostic mode, the status is polled and updated in STATE_DB.
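
The bookkeeping described in this overview (keep the last poll time, and separately the time of the last non-zero count or "nok" status) could look roughly like the following. This is a minimal sketch, not the syncd implementation: the class and method names (`MonitoredObject`, `LinkMonitor`, `record_counter`, `record_status`) are hypothetical, and timestamps are plain floats for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MonitoredObject:
    """One counter or status object (hypothetical structure)."""
    value: object = None                # last non-zero count, or last status
    last_poll: Optional[float] = None   # time of the most recent poll
    last_event: Optional[float] = None  # time of last non-zero count or "nok"


@dataclass
class LinkMonitor:
    objects: Dict[str, MonitoredObject] = field(default_factory=dict)

    def record_counter(self, name: str, count: int, now: float) -> None:
        obj = self.objects.setdefault(name, MonitoredObject())
        obj.last_poll = now
        if count != 0:
            # Only a non-zero reading replaces the stored count and timestamp.
            obj.value = count
            obj.last_event = now

    def record_status(self, name: str, status: str, now: float) -> None:
        obj = self.objects.setdefault(name, MonitoredObject())
        obj.last_poll = now
        obj.value = status
        if status == "nok":
            # Remember when the status was last seen as "nok".
            obj.last_event = now
```

With this shape, a zero reading after a non-zero one updates only `last_poll`, so the last error count and its time remain visible to the show commands.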
111+
112+
## 3.2 DB Changes
113+
114+
### 3.2.1 CONFIG DB
115+
116+
**PORT_TABLE:**
117+
118+
In phase 2:
119+
Field:
120+
"diag_mode"
121+
Value:
122+
"on"
123+
"off"
124+
125+
### 3.2.2 APP DB
126+
127+
**PORT_TABLE:**
128+
129+
In phase 2:
130+
Field:
131+
"diag_mode"
132+
Value:
133+
"on"
134+
"off"
135+
136+
### 3.2.3 STATE DB
137+
138+
**PORT_TABLE:**
139+
140+
Phase 1:
141+
142+
Field:
143+
PMD_SIGNAL_DETECT_STATUS/TIMESTAMP
144+
PMD_CDR_LOCK_STATUS/TIMESTAMP
145+
146+
Value:
147+
"ok"
148+
"nok"
149+
150+
Phase 2:
151+
152+
Field:
153+
PCS_SYNC_STATUS/TIMESTAMP
154+
PCS_LINK_STATUS/TIMESTAMP
155+
PCS_LOCAL_FAULT_STATUS/TIMESTAMP
156+
PCS_REMOTE_FAULT_STATUS/TIMESTAMP
157+
PCS_AM_LOCK_STATUS/TIMESTAMP
158+
PCS_AMPS_LOCK_STATUS/TIMESTAMP
159+
PCS_DESKEW_STATUS/TIMESTAMP
160+
PCS_HI_BER_STATUS/TIMESTAMP
161+
162+
Value:
163+
"ok"
164+
"nok"
165+
166+
### 3.2.4 ASIC DB
167+
168+
In phase 2:
169+
SAI_PORT_ATTR_DIAG_MODE
170+
"true"/"false"
171+
172+
### 3.2.5 COUNTER DB
173+
174+
SAI_PORT_STAT_IF_IN_BER_COUNT/TIMESTAMP
175+
SAI_PORT_STAT_IF_IN_ERROR_BLOCK_COUNT/TIMESTAMP
176+
SAI_PORT_STAT_IF_IN_BIP_ERROR_COUNT/TIMESTAMP
177+
SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES/TIMESTAMP
178+
SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES/TIMESTAMP
179+
SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS/TIMESTAMP
180+
181+
### 3.2.6 FLEX_COUNTER DB
182+
183+
FLEX_COUNTER_GROUP_TABLE:PORT_STAT_COUNTER
184+
"PCS_POLL_INTERVAL"
185+
"60"
186+
187+
## 3.3 Switch State Service Design
188+
189+
### 3.3.1 Orchestration Agent
190+
191+
Enables the flex counter thread to poll the counters and status.
192+
193+
## 3.4 SyncD
194+
195+
Syncd periodically polls SAI to collect the counters and status. In phase 2, PCS status will be collected only if the port is in diag mode.
196+
The stats will be updated in COUNTERS_DB with the timestamp of the last non-zero occurrence. The PCS/PMD status will be updated in STATE_DB with the timestamp of the last "nok" occurrence.
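
The diag-mode gating above can be illustrated with a small selection helper. This is a sketch under stated assumptions: the attribute names are the ones listed in section 3.5, the function name `status_attrs_to_poll` is hypothetical, and it assumes the PMD status attributes are polled regardless of diag mode while the PCS status attributes are polled only in diag mode.

```python
# Status attributes from section 3.5, grouped by when they can be read.
PMD_STATUS_ATTRS = (
    "SAI_PORT_ATTR_PMD_SIGNAL_DETECT_STATUS",
    "SAI_PORT_ATTR_PMD_CDR_LOCK_STATUS",
)
PCS_STATUS_ATTRS = (
    "SAI_PORT_ATTR_PCS_SYNC_STATUS",
    "SAI_PORT_ATTR_PCS_LINK_STATUS",
    "SAI_PORT_ATTR_PCS_LOCAL_FAULT_STATUS",
    "SAI_PORT_ATTR_PCS_REMOTE_FAULT_STATUS",
    "SAI_PORT_ATTR_PCS_AM_LOCK_STATUS",
    "SAI_PORT_ATTR_PCS_AMPS_LOCK_STATUS",
    "SAI_PORT_ATTR_PCS_DESKEW_STATUS",
    "SAI_PORT_ATTR_PCS_HI_BER_STATUS",
)


def status_attrs_to_poll(diag_mode_on: bool) -> tuple:
    """Return the status attributes to fetch for one port in one poll cycle."""
    if diag_mode_on:
        # Assumption: in diag mode both PMD and PCS status are collected.
        return PMD_STATUS_ATTRS + PCS_STATUS_ATTRS
    # Outside diag mode, skip the PCS status attributes.
    return PMD_STATUS_ATTRS
```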
197+
198+
## 3.5 SAI
199+
200+
The following attributes and stats will be part of the SAI extension.
201+
202+
SAI_PORT_ATTR_PMD_SIGNAL_DETECT_STATUS
203+
SAI_PORT_ATTR_PMD_CDR_LOCK_STATUS
204+
SAI_PORT_ATTR_PCS_SYNC_STATUS
205+
SAI_PORT_ATTR_PCS_LINK_STATUS
206+
SAI_PORT_ATTR_PCS_LOCAL_FAULT_STATUS
207+
SAI_PORT_ATTR_PCS_REMOTE_FAULT_STATUS
208+
SAI_PORT_ATTR_PCS_AM_LOCK_STATUS
209+
SAI_PORT_ATTR_PCS_AMPS_LOCK_STATUS
210+
SAI_PORT_ATTR_PCS_DESKEW_STATUS
211+
SAI_PORT_ATTR_PCS_HI_BER_STATUS
212+
213+
SAI_PORT_STAT_IF_IN_BER_COUNT
214+
SAI_PORT_STAT_IF_IN_ERROR_BLOCK_COUNT
215+
SAI_PORT_STAT_IF_IN_BIP_ERROR_COUNT
216+
217+
## 3.6 User Interface
218+
219+
### 3.6.1 Data Models
220+
221+
### 3.6.2 CLI
222+
223+
#### 3.6.2.1 Configuration Commands
224+
225+
**config interface Ethernet X diag_mode <enable/disable>**
226+
227+
```
228+
sonic-cli# configure terminal
229+
sonic-cli(config)# interface Ethernet 0
230+
sonic(conf-if-Ethernet0)# diag-mode enable
231+
sonic(conf-if-Ethernet0)# no diag-mode enable
232+
sonic(conf-if-Ethernet0)# diag-mode ?
233+
enable enable port diag mode
234+
```
235+
236+
PCS status polling is enabled when the port is in diag mode. While the port is in diag mode, its operational status is frozen; status updates resume after diag mode is disabled.
237+
238+
**config pcs polling-interval <0-999>**
239+
240+
```
241+
sonic-cli# configure terminal
242+
sonic-cli(config)# pcs polling-interval
243+
0-999 interval in sec, 0 disable
244+
```
245+
246+
By default, the polling interval is 60 seconds. In port diag mode, reduce the polling interval to get close to real-time status.
247+
An interval between 1 and 5 seconds is suggested.
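
The interval rules above (range 0-999, 0 disables polling, default 60 seconds) can be captured in a small validation helper. This is only a sketch; the function name `parse_poll_interval` and the constant name are hypothetical and not part of the SONiC CLI.

```python
from typing import Optional

DEFAULT_PCS_POLL_INTERVAL_SEC = 60  # default stated in this document


def parse_poll_interval(text: Optional[str] = None) -> Optional[int]:
    """Return the interval in seconds, or None when polling is disabled."""
    if text is None:
        # No value configured: fall back to the default.
        return DEFAULT_PCS_POLL_INTERVAL_SEC
    value = int(text)  # raises ValueError for non-numeric input
    if not 0 <= value <= 999:
        raise ValueError("polling interval must be in the range 0-999")
    # 0 means polling is disabled.
    return None if value == 0 else value
```

For example, `parse_poll_interval("5")` yields a 5-second interval, which falls in the range suggested for diag mode.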
248+
249+
#### 3.6.2.2 Show Commands
250+
251+
**show interface pcs status**
252+
```
253+
show interface pcs status [EthernetX]
254+
Last update time : May-05-2021 16:49:12
255+
PMD signal detect : nok since May-05-2021 16:49:12
256+
PMD cdr lock : ok since May-04-2021 10:40:00
257+
```
258+
259+
The timestamp shows when the last "nok" status was recorded.
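
As an illustration of the output format shown above, the following sketch renders one status line from a label, a status, and a timestamp. The helper names are hypothetical, and the single-space layout around the colon simply mirrors the sample output; the real command may align columns differently.

```python
from datetime import datetime

# Fixed month abbreviations so the output does not depend on the locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_timestamp(ts: datetime) -> str:
    """Format a timestamp as in the sample output, e.g. May-05-2021 16:49:12."""
    return f"{_MONTHS[ts.month - 1]}-{ts.day:02d}-{ts.year} {ts:%H:%M:%S}"


def format_status_line(label: str, status: str, since: datetime) -> str:
    """Build one line of the status display."""
    return f"{label} : {status} since {format_timestamp(since)}"
```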
260+
261+
**show interface pcs counters**
262+
```
263+
show interface pcs counter [EthernetX]
264+
Last update time : May-27-2021 16:49:12
265+
BER : 1234 at May-05-2021 16:49:12
266+
Error Block : 1234 at May-04-2021 10:40:00
267+
BIP error : 1234 at May-04-2021 10:40:00
268+
FEC correctable : 1234 at May-05-2021 16:49:12
269+
FEC non correctable : 1234 at May-05-2021 16:49:12
270+
FEC symbol errors : 1234 at May-05-2021 16:49:12
271+
```
272+
273+
Shows the last non-zero value and its timestamp. A future enhancement is needed to count the number of non-zero updates and to keep a cumulative counter.
274+
275+
#### 3.6.2.3 Debug Commands
276+
277+
#### 3.6.2.4 IS-CLI Compliance
278+
279+
### 3.6.3 REST API Support
280+
281+
### 3.6.4 Service and Docker Management
282+
283+
# 4 Flow Diagrams
284+
285+
# 5 Error Handling
286+
287+
# 6 Serviceability and Debug
288+
289+
# 7 Warm Boot Support
290+
291+
# 8 Scalability
292+
293+
# 9 Unit Test
294+
295+
# 10 Internal Design Information
