Update quickstart.md
chhwang committed Jul 5, 2023
1 parent 1201c9c commit 6c96b97
Showing 1 changed file with 62 additions and 8 deletions.
70 changes: 62 additions & 8 deletions docs/quickstart.md
@@ -15,24 +15,78 @@

## Build from Source

CMake 3.26 or later is required.

```bash
$ git clone https://github.com/microsoft/mscclpp.git
$ mkdir -p mscclpp/build && cd mscclpp/build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j
```
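Before configuring, it can help to confirm that the installed CMake meets the 3.26 minimum stated above. The following is a minimal sketch, not part of MSCCL++: the helper name `version_ge` is hypothetical, and the comparison assumes `sort -V` (version sort, as in GNU coreutils) is available.

```bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum CMake version stated in this guide.
required="3.26"

if command -v cmake >/dev/null 2>&1; then
  # `cmake --version` prints "cmake version X.Y.Z" on its first line.
  found="$(cmake --version | head -n 1 | awk '{print $3}')"
  if version_ge "$found" "$required"; then
    echo "CMake $found is new enough"
  else
    echo "CMake $found is older than $required"
  fi
else
  echo "cmake not found in PATH"
fi
```

Version sort is used instead of a plain string comparison so that a release such as 3.100 is ordered after 3.26.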

## Install from Source

```bash
# Install the generated headers and binaries to /usr/local/mscclpp
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local/mscclpp ..
$ make -j mscclpp
$ sudo make install/fast
```

## Docker Images

Our base image installs all prerequisites for MSCCL++.

```bash
$ docker pull ghcr.io/microsoft/mscclpp/mscclpp:base-cuda12.1
```

## Unit Tests

`unit_tests` requires one GPU on the system. It only tests the operation of basic components.

```bash
$ make -j unit_tests
$ ./test/unit_tests
```

For thorough testing of MSCCL++ features, use `mp_unit_tests`, which requires at least two GPUs and an MPI installation on the system. For example, the following commands run `mp_unit_tests` with two processes (two GPUs). To use a different number of GPUs, change the number of processes.

```bash
$ make -j mp_unit_tests
$ mpirun -np 2 ./test/mp_unit_tests
```

## mscclpp-test

mscclpp-test is a set of performance benchmarks for MSCCL++. It requires MPI to be installed on the system.

```bash
$ make -j sendrecv_test_perf allgather_test_perf allreduce_test_perf alltoall_test_perf
```

For example, the following command runs the AllReduce benchmark with 8 GPUs on message sizes from 3 MB to 48 MB, doubling the message size at each step.

```bash
$ mpirun -np 8 ./test/mscclpp-test/allreduce_test_perf -b 3m -e 48m -G 100 -n 100 -w 20 -f 2 -k 4
```
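To make the sweep concrete, the sketch below lists the message sizes visited when going from 3 MB to 48 MB with a step factor of 2. It is an illustration only and does not call the benchmark: the function name `sweep_sizes` is hypothetical, and it assumes that the `m` suffix means 1024 * 1024 bytes and that the end size is included in the sweep.

```bash
# Hypothetical helper: print the sizes (in bytes) visited when sweeping from
# a minimum to a maximum size, multiplying by a fixed factor at each step.
sweep_sizes() {
  local size="$1" max="$2" factor="$3"
  # A factor of 1 or less would never reach the maximum, so reject it.
  [ "$factor" -gt 1 ] || return 1
  while [ "$size" -le "$max" ]; do
    echo "$size"
    size=$((size * factor))
  done
}

# Assumption: the "m" suffix denotes 1024 * 1024 bytes.
mib=$((1024 * 1024))
sweep_sizes $((3 * mib)) $((48 * mib)) 2
```

Under these assumptions the sweep covers five sizes: 3, 6, 12, 24, and 48 MB.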

Check the help message for more details.

```bash
$ ./test/mscclpp-test/allreduce_test_perf --help
USAGE: allreduce_test_perf
[-b,--minbytes <min size in bytes>]
[-e,--maxbytes <max size in bytes>]
[-i,--stepbytes <increment size>]
[-f,--stepfactor <increment factor>]
[-n,--iters <iteration count>]
[-w,--warmup_iters <warmup iteration count>]
[-c,--check <0/1>]
[-T,--timeout <time in seconds>]
[-G,--cudagraph <num graph launches>]
[-a,--average <0/1/2/3> report average iteration time <0=RANK0/1=AVG/2=MIN/3=MAX>]
[-k,--kernel_num <kernel number of commnication primitive>]
[-o, --output_file <output file name>]
[-h,--help]
```
