Update quickstart.md

microsoft · Jul 5, 2023 · 6c96b97
1 parent 1201c9c
commit 6c96b97
Showing 1 changed file with 62 additions and 8 deletions.
diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -15,24 +15,78 @@
 
 ## Build from Source
 
-```
+CMake 3.26 or later is required.
+
+```bash
 $ git clone https://github.com/microsoft/mscclpp.git
 $ mkdir -p mscclpp/build && cd mscclpp/build
-$ cmake ..
+$ cmake -DCMAKE_BUILD_TYPE=Release ..
 $ make -j
 ```
 
 ## Install from Source
 
+```bash
+# Install the generated headers and binaries to /usr/local/mscclpp
+$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local/mscclpp ..
+$ make -j mscclpp
+$ sudo make install/fast
+```
+
+## Docker Images
+
+Our base image installs all prerequisites for MSCCL++.
+
+```bash
+$ docker pull ghcr.io/microsoft/mscclpp/mscclpp:base-cuda12.1
+```
+
+## Unit Tests
+
+`unit_tests` requires one GPU on the system and only tests the operation of basic components.
+
+```bash
+$ make -j unit_tests
+$ ./test/unit_tests
 ```
-# Install the generated headers and binaries to /usr/local
-$ cmake --install . --prefix /usr/local
+
+For thorough testing of MSCCL++ features, use `mp_unit_tests`, which requires at least two GPUs and an MPI installation on the system. For example, the following commands run `mp_unit_tests` with two processes (two GPUs). To use a different number of GPUs, change the number of processes.
+
+```bash
+$ make -j mp_unit_tests
+$ mpirun -np 2 ./test/mp_unit_tests
 ```
 
-## Install from Package
+## mscclpp-test
 
-TBU
+mscclpp-test is a set of performance benchmarks for MSCCL++. It requires MPI to be installed on the system.
+
+```bash
+$ make -j sendrecv_test_perf allgather_test_perf allreduce_test_perf alltoall_test_perf
+```
 
-## (Optional) Unit Tests
+For example, the following command runs the AllReduce benchmark on 8 GPUs with message sizes from 3MB to 48MB, doubling the message size at each step.
 
-TBU
+```bash
+$ mpirun -np 8 ./test/mscclpp-test/allreduce_test_perf -b 3m -e 48m -G 100 -n 100 -w 20 -f 2 -k 4
+```
+
+Check the help message for more details.
+
+```bash
+$ ./test/mscclpp-test/allreduce_test_perf --help
+USAGE: allreduce_test_perf 
+        [-b,--minbytes <min size in bytes>] 
+        [-e,--maxbytes <max size in bytes>] 
+        [-i,--stepbytes <increment size>] 
+        [-f,--stepfactor <increment factor>] 
+        [-n,--iters <iteration count>] 
+        [-w,--warmup_iters <warmup iteration count>] 
+        [-c,--check <0/1>] 
+        [-T,--timeout <time in seconds>] 
+        [-G,--cudagraph <num graph launches>] 
+        [-a,--average <0/1/2/3> report average iteration time <0=RANK0/1=AVG/2=MIN/3=MAX>] 
+        [-k,--kernel_num <kernel number of commnication primitive>] 
+        [-o, --output_file <output file name>] 
+        [-h,--help]
+```
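
As a quick sanity check on the AllReduce example in this change (`-b 3m -e 48m -f 2`), the sketch below lists the message sizes such a sweep would visit. It is not part of mscclpp-test: `sweep_sizes` is a hypothetical helper, and it assumes the sweep starts at the minimum size, multiplies by the step factor each time, and includes the maximum size when it is hit exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MSCCL++): list the sizes of a
# geometric sweep from min_mb to max_mb, multiplying by factor each step.
# Assumption: the upper bound is inclusive.
sweep_sizes() {
  local min_mb=$1 max_mb=$2 factor=$3
  local size=$min_mb
  while [ "$size" -le "$max_mb" ]; do
    printf '%sMB\n' "$size"
    size=$((size * factor))
  done
}

# Mirrors -b 3m -e 48m -f 2 from the example command.
sweep_sizes 3 48 2
```

Under these assumptions the sweep covers five sizes: 3MB, 6MB, 12MB, 24MB, and 48MB.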