
Commit 695f108

nrockershousen authored and GitHub Enterprise committed
TECHPUBS-4588: Spell and style check fixes (#99)
* TECHPUBS-4588: minor fixes, spellcheck, and md linting
* TECHPUBS-4588: fixed review comments
1 parent fdfce7c commit 695f108

27 files changed, +115 -65 lines changed

.github/config/markdown_style.yaml

+2-2
Original file line number | Diff line number | Diff line change
@@ -56,11 +56,11 @@ MD006: true
5656
# MD007/ul-indent - Unordered list indentation
5757
MD007:
5858
# Spaces for indent
59-
indent: 4
59+
indent: 2
6060
# Whether to indent the first level of the list
6161
start_indented: false
6262
# Spaces for first level indent (when start_indented is set)
63-
start_indent: 4
63+
start_indent: 2
6464

6565
# MD009/no-trailing-spaces - Trailing spaces
6666
MD009:

.spelling

+49-3
@@ -6,6 +6,12 @@
66
#
77
# General
88
#
9+
0x82
10+
0x83
11+
0x84
12+
0x85
13+
0x86
14+
0x87
915
1G
1016
100Gbps
1117
1-port
@@ -15,6 +21,7 @@
1521
200Gbps
1622
2-port
1723
802.1Q
24+
adminStatus
1825
AMA
1926
AMAs
2027
amdgpu
@@ -46,6 +53,7 @@ Basesystem
4653
BER
4754
BERs
4855
behavior
56+
behaviors
4957
benchmarking
5058
bisectional
5159
Bond0
@@ -75,6 +83,7 @@ COS Base
7583
CN
7684
CNs
7785
CPE
86+
CPTs
7887
Cray
7988
cray
8089
cray-diags-fabric
@@ -118,7 +127,13 @@ debugfs
118127
default.yml
119128
defragmented
120129
dgnettest
130+
diags
121131
diskless
132+
disk-ful
133+
DIS_HOST
134+
DIS_ETH
135+
DIS_SMB
136+
DIS_USB
122137
DITA-OT
123138
ditamap
124139
ditamaps
@@ -166,6 +181,7 @@ FM
166181
FMN
167182
GbE
168183
Gbps
184+
Gen4
169185
gc_thresh
170186
Git
171187
Gitea
@@ -183,6 +199,7 @@ heatsink
183199
highpriority
184200
hodagd
185201
honor
202+
honoring
186203
hostname
187204
hostnames
188205
HPCM
@@ -215,6 +232,7 @@ IPs
215232
IPv4
216233
ISO
217234
ISOs
235+
journald
218236
JSON
219237
jwt
220238
JWT
@@ -252,6 +270,11 @@ LNM
252270
Loadbalance
253271
loadbalance
254272
localtime
273+
LOG_DEBUG
274+
LogLevelMax
275+
LOG_NOTICE
276+
LOG_INFO
277+
LOG_WARN
255278
Loopback
256279
loopback
257280
low-noise-mode
@@ -288,6 +311,7 @@ munged*
288311
multisocket
289312
n00
290313
n01
314+
NACKs
291315
nameserver
292316
NAPI
293317
ncn-personalization
@@ -315,6 +339,8 @@ ncn-w004
315339
Neighbor
316340
neighbor
317341
neighbors
342+
neighboring
343+
Netlink
318344
nfsserv
319345
NID
320346
NIDs
@@ -327,6 +353,8 @@ node-identity
327353
nodename
328354
non-CFS
329355
non-VLAN
356+
nonprivileged
357+
Nonprivileged
330358
nodelist
331359
NUMA
332360
Nvidia
@@ -348,6 +376,7 @@ pdsh
348376
Perftest
349377
perftest
350378
perf
379+
PEs
351380
pfc_fifo_oflw
352381
PKTBUF_ERROR
353382
playbook
@@ -375,6 +404,8 @@ Redfish
375404
RedHat
376405
release-rpms
377406
repo
407+
requeue
408+
Requeue
378409
RHEL-based
379410
RoCE
380411
roce_perf_check_loopback
@@ -396,17 +427,21 @@ rxe
396427
rxe0
397428
rxe1
398429
rxtx
430+
SBL_ASYNC_ALERT_LINK_DOWN
399431
SBL_ASYNC_ALERT_SERDES_FW_CORRUPTION
432+
SCTs
400433
SD-DAEMON
401-
SD_NOTICE
402-
SD_INFO
403-
SD_DEBUG
434+
_SD-DAEMON_
435+
_SD_NOTICE_
436+
_SD_INFO_
437+
_SD_DEBUG_
404438
SDK
405439
sdk
406440
SEL
407441
SELinux
408442
SerDes
409443
serdes
444+
SEQUENCE_ERROR
410445
shadow
411446
SharePoint
412447
SHS
@@ -422,14 +457,18 @@ SLE
422457
SLES
423458
SLE15_sp4
424459
SLE15-SP4
460+
SLES15-SP4
425461
socklnd
426462
Soft-RoCE
427463
Standalone
428464
standalone
429465
StatsFS
466+
SPTs
430467
subcommand
431468
subcommands
469+
substep
432470
subrole
471+
SU-leader
433472
superblock
434473
Supercomputing
435474
SuSE
@@ -439,9 +478,11 @@ sysctl
439478
syslog
440479
systemctl
441480
systemd
481+
Taskset
442482
tcp
443483
tcp1
444484
tcs
485+
TCTs
445486
TEMPLATE_NAME
446487
tmpfs
447488
TLV
@@ -458,6 +499,7 @@ UANs
458499
uC
459500
UDEV
460501
udev
502+
uint128
461503
uncomment
462504
untagged_eth_pcp
463505
untar
@@ -479,7 +521,11 @@ wildcard
479521
wildcards
480522
WLM
481523
WLMs
524+
WP_NIC0
525+
WP_NIC1
482526
writable
527+
x100x
528+
x16
483529
x86
484530
x86_64
485531
xname

docs/portal/developer-portal/HPE_Slingshot_Host_Software_Troubleshooting_Guide.ditamap

+1-1
@@ -67,7 +67,7 @@
6767
<topicref href="troubleshoot/compute/troubleshoot_bond0_not_up.md" format="mdita"/>
6868
</topicref>
6969
<topicref href="troubleshoot/cassini/HPE_Slingshot_200Gbps_NIC.md" format="mdita">
70-
<topicref href="troubleshoot/cassini/HPE_Slingshot_200Gpbs_NIC_troubleshooting.md"
70+
<topicref href="troubleshoot/cassini/HPE_Slingshot_200Gbps_NIC_troubleshooting.md"
7171
format="mdita">
7272
<topicref
7373
href="troubleshoot/cassini/check_and_fix_misconfigured_nonvlan_tagged_ethernet_pcp_settings.md"

docs/portal/developer-portal/install/Soft_RoCE_on_HPE_Slingshot_200Gbps_NICs.md

+1-1
@@ -1,4 +1,4 @@
1-
# Soft-RoCE on HPE Slingshot 200Gpbs NICs
1+
# Soft-RoCE on HPE Slingshot 200Gbps NICs
22

33
Remote direct memory access (RDMA) over Converged Ethernet (RoCE) is a network protocol that enables RDMA over an Ethernet network.
44
RoCE can be implemented both in the hardware and in the software. Soft-RoCE is the software implementation of the RDMA transport.

docs/portal/developer-portal/install/cxi_core_driver.md

+2-2
@@ -7,7 +7,7 @@ GPU Direct RDMA allows a PCIe device (the HPE Slingshot 200GbE NIC in this case)
77
## Vendors supported
88

99
- AMD - ROCm library, amdgpu driver
10-
- Nvidia - Cuda library, nvidia driver
10+
- NVIDIA - Cuda library, NVIDIA driver
1111
- Intel - Level Zero library, dmabuf kernel interface
1212

1313
## Special considerations
@@ -16,7 +16,7 @@ ___NVIDIA driver___
1616

1717
The NVIDIA driver contains a feature called Persistent Memory. It does not release pinned pages when device memory is freed unless explicitly directed by the NIC driver or upon job completion.
1818

19-
A cxi-ss1 parameter `nv_p2p_persistent` is used to enable Persistent Memory. The default is enabled.
19+
A `cxi-ss1` parameter `nv_p2p_persistent` is used to enable Persistent Memory. The default is enabled.
2020

2121
The `nv_p2p_persistent` parameter can be disabled by setting it to 0 in the `modprobe cxi-ss1` command.
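
As a minimal sketch of the statement above, the following builds the `modprobe` invocation that sets `nv_p2p_persistent` to 0. The helper name `build_modprobe_cmd` is hypothetical and not part of SHS. The script only prints the command, because actually loading `cxi-ss1` requires root privileges and an installed driver.

```shell
#!/bin/sh
# Hypothetical helper (not part of SHS): print a modprobe command line
# that loads a module with one parameter set to a given value.
build_modprobe_cmd() {
    module=$1
    param=$2
    value=$3
    printf 'modprobe %s %s=%s\n' "$module" "$param" "$value"
}

# Disable Persistent Memory for the cxi-ss1 driver (printed, not executed).
build_modprobe_cmd cxi-ss1 nv_p2p_persistent 0
```

Module parameters passed this way are applied when the module is loaded, so a driver that is already loaded would need to be reloaded for the new value to take effect.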
2222

docs/portal/developer-portal/install/install_or_upgrade_compute_nodes.md

+2-2
@@ -1,6 +1,6 @@
11
# Install compute nodes
22

3-
Perform this procedure to install SHS on compute nodes. This procedure can be used for systems that use either Mellanox NICs or HPE Slingshot 200Gpbs NICs.
3+
Perform this procedure to install SHS on compute nodes. This procedure can be used for systems that use either Mellanox NICs or HPE Slingshot 200Gbps NICs.
44

55
The installation method will depend on what type of NIC is installed on the system.
66
Select one of the following procedures depending on the NIC in use:
@@ -39,7 +39,7 @@ NOTE: The upgrade process is nearly identical to the installation, and the proce
3939
4040
a. The RPMs should be copied or moved to a location accessible to one or more hosts where the RPMs will be installed. This can be a network file share, a physically backed location such as a disk drive on the host, or a remotely accessible location such as a web server that hosts the RPMs.
4141
42-
b. The host or host OS image should be modified to add a repository for the newly downloaded RPMs for the package manager used in the OS distribution. Select the RPMs from the distribution file for your environment (slingshot_compute_cos-2.4... for COS 2.4, slingshot_compute_sle15_sp4 for SLE15_sp4, and so on)
42+
b. The host or host OS image should be modified to add a repository for the newly downloaded RPMs for the package manager used in the OS distribution. Select the RPMs from the distribution file for your environment (`slingshot_compute_cos-2.4...` for COS 2.4, `slingshot_compute_sle15_sp4` for SLE15_sp4, and so on)
4343
For SLE 15, `zypper` is used as the package manager for the host. A Zypper repository should be added which provides the path to the RPMs are hosted. An example for this could be the following:
4444
4545
Assume that the RPMs were downloaded and added to a web server that is external to the host,

docs/portal/developer-portal/install/softroce_on_HPE_Slingshot_200Gbps.md

+1-1
@@ -1,4 +1,4 @@
1-
# Configure Soft-RoCE on HPE Slingshot 200Gpbs NICs
1+
# Configure Soft-RoCE on HPE Slingshot 200Gbps NICs
22

33
## Prerequisites
44

docs/portal/developer-portal/operations/Cassini_Retry_Handler_cxi_rh.md

+1-1
@@ -13,4 +13,4 @@ The retry handler identifies these scenarios as "Timeouts" or "NACKs" respective
1313
- **NACKs**: Indicates that the target NIC observed some issue with a packet it received.
1414
A lack of space to land the packet could result in various NACKs being sent back to the source (depending on which resource was lacking).
1515
The most common NACK that is typically seen is a SEQUENCE_ERROR NACK. This simply indicates that a packet with an incorrect sequence number arrived. This is not an unusual situation.
16-
A prior packet being lost (say sequence number X) will lead to subsequent packets (all with sequence numbers greater than X) getting a SEQUENCE_ERROR NACK in response.
16+
A prior packet being lost (say sequence number X) will lead to subsequent packets (all with sequence numbers greater than X) getting a SEQUENCE_ERROR NACK in response.
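
The cascade described above can be illustrated with a toy model. This sketch assumes a receiver that accepts only the next expected sequence number and rejects everything else. The function name `simulate_receiver` and the output wording are illustrative only and do not reflect the actual NIC or retry handler implementation.

```shell
#!/bin/sh
# Toy model (not the real NIC logic): the first argument is the sequence
# number of the lost packet; the remaining arguments are the sequence
# numbers sent by the source, in order.
simulate_receiver() {
    lost=$1
    shift
    next_seq=1
    for seq in "$@"; do
        # The lost packet never reaches the target, so it gets no response.
        if [ "$seq" -eq "$lost" ]; then
            continue
        fi
        if [ "$seq" -eq "$next_seq" ]; then
            echo "seq $seq: accepted"
            next_seq=$((next_seq + 1))
        else
            # Anything after the gap arrives out of sequence.
            echo "seq $seq: SEQUENCE_ERROR NACK"
        fi
    done
}

# Packets 1 through 5 are sent; packet 3 is lost in transit.
simulate_receiver 3 1 2 3 4 5
```

In this model, packets 4 and 5 are each answered with a SEQUENCE_ERROR NACK because the target is still waiting for packet 3.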

docs/portal/developer-portal/operations/cassini_retry_handler_logging.md

+1-1
@@ -13,7 +13,7 @@ journalctl -u cxi_rh@cxi0
1313

1414
## Log levels
1515

16-
As of SHS 11.1, the RH primarily uses four log Levels. Some messages have been moved to different levels as compared to previous releases.
16+
As of the SHS 11.1 release, the RH primarily uses four log Levels. Some messages have been moved to different levels as compared to previous releases.
1717

1818
- **LOG_WARN**: Cancellation Related Messages, Config Messages, Connection Level Messages (SCTs, TCTs), Retrying a Connection.
1919
- **LOG_NOTICE**: SPT Timeouts, Retry Completes for Timed out Packets.

docs/portal/developer-portal/operations/compute_node_configuration.md

+1-1
@@ -26,7 +26,7 @@ The `sat bootprep` input file should contain sections similar to the following t
2626
For the examples below,
2727

2828
- Replace `<version>` with the version of SHS desired
29-
- Replace `<playbook>` with the SHS ansible playbook that should be used
29+
- Replace `<playbook>` with the SHS Ansible playbook that should be used
3030
- Replace `ims_require_dkms: true` with `ims_require_dkms: false` if pre-built kernel binaries should be used instead of DKMS kernel packages. NOTE: This setting only exists with CSM 1.5 and later deployments.
3131

3232
**Note:** `shs_mellanox_install.yml` should be used if the Mellanox NIC is installed. `shs_cassini_install.yml` should be used if the HPE Slingshot 200Gbps NIC is installed.

docs/portal/developer-portal/operations/configure_qos_shs.md

+5-5
@@ -1,6 +1,6 @@
11
# Configure QoS for Slingshot Host Software (SHS)
22

3-
The cxi-driver includes multiple QoS profiles for SHS. This includes PCP to DSCP mappings and other settings that must match the Rosetta side configs, as well as which internal HPE Slingshot 200Gbps NIC resources are made available to each traffic class in a profile.
3+
The cxi-driver includes multiple QoS profiles for SHS. This includes PCP to DSCP mappings and other settings that must match the Rosetta side configurations, as well as which internal HPE Slingshot 200Gbps NIC resources are made available to each traffic class in a profile.
44

55
An admin will be able to choose from one of the profiles that is made available. See the following subsections for guidance on viewing and selecting QoS profiles on the host.
66

@@ -10,7 +10,7 @@ For general information on QoS outside the context of SHS, see "Configure Qualit
1010

1111
QoS profile names on the host match those on the switch. On the host there will be an integer value associated with each QoS Profile. This value is used to select the QoS Profile that the driver should load.
1212

13-
Starting in the Slingshot 2.2 release, the following profiles will be supported on the host:
13+
Starting in the HPE Slingshot 2.2 release, the following profiles will be supported on the host:
1414

1515
- 1 - HPC
1616
- 2 - LL_BE_BD_ET
@@ -34,7 +34,7 @@ parm: active_qos_profile:QoS Profile to load. Must match fabric QoS Pr
3434

3535
## Select QoS profile on the host
3636

37-
The `active_qos_profile` module parameter to the cxi-ss1 driver allows admins to choose a QoS profile. As with any module parameter, there are multiple ways for an admin to apply the change, such as the following:
37+
The `active_qos_profile` module parameter to the `cxi-ss1` driver allows admins to choose a QoS profile. As with any module parameter, there are multiple ways for an admin to apply the change, such as the following:
3838

3939
- Directly via `insmod`/`modprobe`
4040
- Kernel Command Line
@@ -49,7 +49,7 @@ For example, to load the LL_BE_BD_ET profile via `modprobe`:
4949
Important notes:
5050

5151
- All nodes _must_ use the same QoS Profile on a particular fabric. See "Configure Quality of Service (QoS)" in the _HPE Slingshot Installation Guide_ for the environment in use.
52-
- QoS Profile change cannot be done "live", as the cxi-ss1 driver must be reloaded. To change profiles, reboot nodes with the desired QoS profile specified.
52+
- QoS Profile change cannot be done "live", as the `cxi-ss1` driver must be reloaded. To change profiles, reboot nodes with the desired QoS profile specified.
5353

5454
## Query QoS information on the host
5555

@@ -100,7 +100,7 @@ The following error message on the host can be reported if the 200Gbps NIC and H
100100

101101
**Note:** The above errors, specifically `pfc_fifo_oflw` errors, can also occur if the Fabric Manager is not configured with 200Gbps NIC QoS settings.
102102

103-
The PCP to utilize for non-VLAN tagged Ethernet frames is defined in a QoS profile. The CXI Driver (cxi-ss1) defines a kernel module parameter, `untagged_eth_pcp`, to optionally change this value. The default value of -1 means the value defined in the QoS profile will be used.
103+
The PCP to utilize for non-VLAN tagged Ethernet frames is defined in a QoS profile. The CXI Driver (`cxi-ss1`) defines a kernel module parameter, `untagged_eth_pcp`, to optionally change this value. The default value of -1 means the value defined in the QoS profile will be used.
104104

105105
The following is an example of how to override the value defined in the profile via modprobe:
106106

docs/portal/developer-portal/operations/workflow_decisions.md

+6-6
@@ -34,7 +34,7 @@ At this point, some workflow decisions must be made. These decisions depend on r
3434
ncn-m001# OLD_IMPORT_BRANCH_REF=refs/remotes/origin/cray/slingshot-host-software/${OLD_RELEASE}
3535
```
3636
37-
- Else if there are no `integration-*` branches, but there is an integration branch with no `-<RELEASE>` suffix, determine what the release integration was based on by running the `git log` command.This finds the newest commit in the output (the commit closest to the top), which contains a message similar to "Import of 'slingshot-host-software' product version `<OLD-RELEASE>`".
37+
- Else if there are no `integration-*` branches, but there is an integration branch with no `-<RELEASE>` suffix, determine what the release integration was based on by running the `git log` command. This finds the newest commit in the output (the commit closest to the top), which contains a message similar to "Import of 'slingshot-host-software' product version `<OLD-RELEASE>`".
3838
3939
```screen
4040
ncn-m001# git log --topo-order refs/remotes/origin/integration | less
@@ -110,11 +110,11 @@ Failure to define any of the three variables above may result in install, upgrad
110110

111111
If group variable files are used, then a file must be defined for each target node type. Three groups of nodes are supported:
112112

113-
| Node Type | Product | Target Kernel Distribution | Group Variable File Name |
114-
| ------------------ | ------- | ----------------------------------------------- | ----------------------------- |
115-
| Compute | COS | COS (see COS installation for target OS kernel) | Compute/default.yml |
116-
| User Access/Login | UAN | COS (see COS installation for target OS kernel) | Application/default.yml |
117-
| Non-compute Worker | CSM | CSM (see CSM installation for target OS kernel) | Management_Worker/default.yml |
113+
| Node Type | Product | Target Kernel Distribution | Group Variable File Name |
114+
|--------------------|---------|-------------------------------------------------|---------------------------------|
115+
| Compute | COS | COS (see COS installation for target OS kernel) | `Compute/default.yml` |
116+
| User Access/Login | UAN | COS (see COS installation for target OS kernel) | `Application/default.yml` |
117+
| Non-compute Worker | CSM | CSM (see CSM installation for target OS kernel) | `Management_Worker/default.yml` |
118118

119119
An example configuration for a Compute node (`ansible/group_vars/Compute/default.yml`) on HPE Cray EX System Software 1.5 using COS 2.4 and CSM 1.3 might be the following:
120120

docs/portal/developer-portal/troubleshoot/cassini/HPE_Slingshot_200Gbps_NIC_troubleshooting.md

+1
@@ -0,0 +1 @@
1+
# HPE Slingshot 200Gbps NIC troubleshooting

docs/portal/developer-portal/troubleshoot/cassini/HPE_Slingshot_200Gpbs_NIC_troubleshooting.md

-1
This file was deleted.

docs/portal/developer-portal/troubleshoot/cassini/RDMA_interface_troubleshooting.md

+2-2
@@ -34,9 +34,9 @@ The availability of a CXI interface indicates several key signs of health. An in
3434
- The interface retry handler is running
3535
- A matching L2 interface is available
3636
- The L1 interface has a temporary, locally administered, unicast address assigned to it. This is presumed to be an AMA applied by the fabric manager.
37-
- The L1 link state is reported if verbosity is enabled. L1 link state reported by `fi_info` will match the state reported by the L2 device through the ip tool.
37+
- The L1 link state is reported if verbosity is enabled. L1 link state reported by `fi_info` will match the state reported by the L2 device through the `ip` tool.
3838

39-
All these checks together make `fi_inf0` an excellent first tool to use to check the general health of 200Gbps NIC RDMA interfaces.
39+
All these checks together make `fi_info` an excellent first tool to use to check the general health of 200Gbps NIC RDMA interfaces.
4040

4141
**fi_pingpong:**
4242
