# third-party/README.md
## Prerequisites
Hardware requirements:
- GPUs inside one node need to be connected by NVLink
- GPUs across different nodes need to be connected by RDMA devices, see the [GPUDirect RDMA Documentation](https://docs.nvidia.com/cuda/gpudirect-rdma/)
- InfiniBand GPUDirect Async (IBGDA) support, see [IBGDA Overview](https://developer.nvidia.com/blog/improving-network-performance-of-hpc-systems-using-nvidia-magnum-io-nvshmem-and-gpudirect-async/)
- For more detailed requirements, see [NVSHMEM Hardware Specifications](https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html#hardware-requirements)
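A quick way to check these requirements on a node is the standard CUDA and rdma-core tooling (a sketch; tool availability depends on your environment):

```shell
# Show the GPU interconnect topology; NVLink-connected GPU pairs
# appear as NV1/NV2/... entries in the matrix.
nvidia-smi topo -m

# List RDMA-capable NICs (requires the rdma-core / ibverbs utilities).
ibv_devinfo
```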
## Installation procedure
### 1. Acquiring NVSHMEM source code
Download NVSHMEM v3.2.5 from the [NVIDIA NVSHMEM OPEN SOURCE PACKAGES](https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz).
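For example (the extraction directory name is an assumption; check the tarball's actual layout):

```shell
# Download the pinned NVSHMEM release and unpack it.
wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz
tar -xf nvshmem_src_3.2.5-1.txz
cd nvshmem_src   # directory name may differ per release
```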
### 2. Apply our custom patch
Navigate to your NVSHMEM source directory and apply our provided patch:
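A sketch of this step, assuming the patch ships as `nvshmem.patch` in this `third-party/` directory (the paths are placeholders; adjust them to your checkout):

```shell
cd nvshmem_src                                      # your NVSHMEM source directory
git apply /path/to/DeepEP/third-party/nvshmem.patch # hypothetical patch path; adjust
```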
### 3. Configure NVIDIA driver (required for inter-node communication)
Enable IBGDA by modifying `/etc/modprobe.d/nvidia.conf`:
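The settings below are the IBGDA-related module options described in the NVSHMEM install guide; treat the exact values as an assumption to verify against your driver version:

```shell
# Append the IBGDA options to the NVIDIA driver configuration.
echo 'options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"' \
  | sudo tee -a /etc/modprobe.d/nvidia.conf

# On Debian-style systems, regenerate the initramfs so the options
# take effect, then reboot.
sudo update-initramfs -u
sudo reboot
```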
For more detailed configurations, please refer to the [NVSHMEM Installation Guide](https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html).
### 4. Build and installation
DeepEP uses NVLink for intra-node communication and IBGDA for inter-node communication. All other features are disabled to reduce dependencies.
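One plausible build invocation under those constraints, using NVSHMEM's environment-variable build toggles (the exact flag set and the install prefix are assumptions to verify against the NVSHMEM install guide):

```shell
# Enable IBGDA only; disable the transports and bootstraps DeepEP does not need.
CUDA_HOME=/usr/local/cuda \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_PMIX_SUPPORT=0 \
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/path/to/install   # hypothetical prefix
cmake --build build --target install -j
```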