docs: add step-by-step tutorial
ymjiang committed Jun 27, 2019
1 parent c0922b0 commit 910a898
Showing 2 changed files with 181 additions and 18 deletions.
20 changes: 2 additions & 18 deletions README.md
@@ -46,25 +46,9 @@ python setup.py install
```
Note: you may set `BYTEPS_USE_RDMA=1` to install with RDMA support.
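
For example, a minimal sketch of an RDMA-enabled install (hedged: it assumes the RDMA userspace libraries are already present on the build machine):

```
BYTEPS_USE_RDMA=1 python setup.py install
```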

Now you can try our [examples](example). Let's say you are using MXNet and want to try a Resnet50 training benchmark:
We provide a [step-by-step tutorial](docs/step-by-step-tutorials) for running benchmark training tasks.

```
export NVIDIA_VISIBLE_DEVICES=0,1 \
DMLC_NUM_WORKER=1 \
DMLC_NUM_SERVER=1 \
DMLC_WORKER_ID=0 \
DMLC_ROLE=worker \
DMLC_PS_ROOT_URI=10.0.0.1 \
DMLC_PS_ROOT_PORT=1234 \
DMLC_INTERFACE=eth0
python byteps/launcher/launch.py byteps/example/mxnet/train_imagenet_byteps.py --benchmark 1 --batch-size=32
```

For distributed training, you also need to build a server image. We provide [Dockerfiles](docker) as examples.
You may use the same images for the scheduler and the servers.

Refer to [Documentations](docs) for how to [launch distributed jobs](docs/running.md) and more [detailed configurations](docs/env.md).
Also refer to [Documentations](docs) for how to [launch distributed jobs](docs/running.md) and more [detailed configurations](docs/env.md).

## Use BytePS in Your Code

179 changes: 179 additions & 0 deletions docs/step-by-step-tutorials.md
@@ -0,0 +1,179 @@
# A Step-by-Step Tutorial

The goal of this tutorial is to help you run BytePS quickly. To avoid problems with your system environment, we recommend that you use our provided Docker images, at least for your first run.
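
Before pulling any image, it helps to confirm that Docker and the NVIDIA driver are working on the host. A minimal sanity check (assuming Docker and `nvidia-docker` are already installed) could be:

```
# confirm Docker and the GPU driver are visible on the host
docker --version
nvidia-smi
```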


## Single Machine Training

### TensorFlow
```
docker pull bytepsimage/worker_tensorflow
nvidia-docker run --shm-size=32768m -it bytepsimage/worker_tensorflow bash
# now you are in docker environment
export NVIDIA_VISIBLE_DEVICES=0,1,2,3 # say you have 4 GPUs
export DMLC_WORKER_ID=0 # your worker id
export DMLC_NUM_WORKER=1 # you only have one worker
export DMLC_ROLE=worker # your role is worker
# the following values do not matter for non-distributed jobs
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.0.0.1
export DMLC_PS_ROOT_PORT=1234
# can also try: export EVAL_TYPE=mnist
export EVAL_TYPE=benchmark
python /usr/local/byteps/launcher/launch.py \
/usr/local/byteps/example/tensorflow/run_tensorflow_byteps.sh \
--model ResNet50 --num-iters 1000
```
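
As the `EVAL_TYPE=mnist` comment above hints, the same launcher can run the bundled MNIST example instead of the synthetic benchmark. A minimal variant, reusing the environment variables already exported above and assuming the MNIST example needs no extra flags:

```
# switch the workload; the DMLC_* variables stay the same as above
export EVAL_TYPE=mnist
python /usr/local/byteps/launcher/launch.py \
       /usr/local/byteps/example/tensorflow/run_tensorflow_byteps.sh
```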

### PyTorch


```
docker pull bytepsimage/worker_pytorch
nvidia-docker run --shm-size=32768m -it bytepsimage/worker_pytorch bash
# now you are in docker environment
export NVIDIA_VISIBLE_DEVICES=0,1,2,3 # say you have 4 GPUs
export DMLC_WORKER_ID=0 # your worker id
export DMLC_NUM_WORKER=1 # you only have one worker
export DMLC_ROLE=worker # your role is worker
# the following values do not matter for non-distributed jobs
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.0.0.1
export DMLC_PS_ROOT_PORT=1234
export EVAL_TYPE=benchmark
python /usr/local/byteps/launcher/launch.py \
/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh \
--model resnet50 --num-iters 1000
```

### MXNet

```
docker pull bytepsimage/worker_mxnet
nvidia-docker run --shm-size=32768m -it bytepsimage/worker_mxnet bash
# now you are in docker environment
export NVIDIA_VISIBLE_DEVICES=0,1,2,3 # say you have 4 GPUs
export DMLC_WORKER_ID=0 # your worker id
export DMLC_NUM_WORKER=1 # you only have one worker
export DMLC_ROLE=worker # your role is worker
# the following values do not matter for non-distributed jobs
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.0.0.1
export DMLC_PS_ROOT_PORT=1234
export EVAL_TYPE=benchmark
python /usr/local/byteps/launcher/launch.py \
/usr/local/byteps/example/mxnet/start_mxnet_byteps.sh \
--benchmark 1 --batch-size=32
```
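
If a run fails early, a quick way to check that the BytePS Python package is importable inside the worker container is a one-line import test. This is only a sanity-check sketch; the module shown is for the MXNet image, and `byteps.tensorflow` / `byteps.torch` would be the analogues for the other images:

```
# verify the BytePS package inside the MXNet worker container
python -c "import byteps.mxnet as bps; print(bps.__file__)"
```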

## Distributed Training

Let's say you have two workers, each with 4 GPUs. For simplicity, we use one server.

The way to launch the scheduler and the server is the same for any framework.

For the scheduler:
```
# scheduler can use the same image as servers
docker pull bytepsimage/byteps_server
docker run -it bytepsimage/byteps_server bash
# now you are in docker environment
export DMLC_NUM_WORKER=2
export DMLC_ROLE=scheduler
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.0.0.1 # the scheduler IP
export DMLC_PS_ROOT_PORT=1234 # the scheduler port
python /usr/local/byteps/launcher/launch.py
```

For the server:
```
docker pull bytepsimage/byteps_server
docker run -it bytepsimage/byteps_server bash
# now you are in docker environment
export DMLC_NUM_WORKER=2
export DMLC_ROLE=server
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.0.0.1 # the scheduler IP
export DMLC_PS_ROOT_PORT=1234 # the scheduler port
python /usr/local/byteps/launcher/launch.py
```
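
If workers later hang at startup, a quick connectivity check from a worker machine to the scheduler address can help. This is only a hedged troubleshooting sketch and assumes `nc` (netcat) is available in the container:

```
# check that the scheduler endpoint is reachable from a worker
nc -zv 10.0.0.1 1234
```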

For the workers, you need to pay attention to `DMLC_WORKER_ID`. This is the main difference compared to single-machine jobs. Let's say the two workers use MXNet.

For worker-0:
```
docker pull bytepsimage/worker_mxnet
nvidia-docker run --shm-size=32768m -it bytepsimage/worker_mxnet bash
# now you are in docker environment
export NVIDIA_VISIBLE_DEVICES=0,1,2,3 # say you have 4 GPUs
export DMLC_WORKER_ID=0 # worker-0
export DMLC_NUM_WORKER=2 # 2 workers
export DMLC_ROLE=worker # your role is worker
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.0.0.1 # the scheduler IP
export DMLC_PS_ROOT_PORT=1234 # the scheduler port
export EVAL_TYPE=benchmark
python /usr/local/byteps/launcher/launch.py \
/usr/local/byteps/example/mxnet/start_mxnet_byteps.sh \
--benchmark 1 --batch-size=32
```

For worker-1:

```
docker pull bytepsimage/worker_mxnet
nvidia-docker run --shm-size=32768m -it bytepsimage/worker_mxnet bash
# now you are in docker environment
export NVIDIA_VISIBLE_DEVICES=0,1,2,3 # say you have 4 GPUs
export DMLC_WORKER_ID=1 # worker-1
export DMLC_NUM_WORKER=2 # 2 workers
export DMLC_ROLE=worker # your role is worker
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.0.0.1 # the scheduler IP
export DMLC_PS_ROOT_PORT=1234 # the scheduler port
export EVAL_TYPE=benchmark
python /usr/local/byteps/launcher/launch.py \
/usr/local/byteps/example/mxnet/start_mxnet_byteps.sh \
--benchmark 1 --batch-size=32
```
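
Since the worker-0 and worker-1 examples above differ only in `DMLC_WORKER_ID`, you could wrap the per-worker setup in a small script. The script below is a hypothetical convenience sketch (the file name and argument are made up, not part of BytePS):

```
#!/bin/bash
# hypothetical helper: run_worker.sh <worker_id>
set -e
export NVIDIA_VISIBLE_DEVICES=0,1,2,3  # say you have 4 GPUs
export DMLC_WORKER_ID=$1               # the only per-worker difference
export DMLC_NUM_WORKER=2
export DMLC_ROLE=worker
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.0.0.1       # the scheduler IP
export DMLC_PS_ROOT_PORT=1234          # the scheduler port
export EVAL_TYPE=benchmark
python /usr/local/byteps/launcher/launch.py \
       /usr/local/byteps/example/mxnet/start_mxnet_byteps.sh \
       --benchmark 1 --batch-size=32
```

On worker-0 you would run it as `bash run_worker.sh 0` inside the container, and on worker-1 as `bash run_worker.sh 1`.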

If your workers use TensorFlow, change the image name to `bytepsimage/worker_tensorflow` and replace the launch command with
```
python /usr/local/byteps/launcher/launch.py \
/usr/local/byteps/example/tensorflow/run_tensorflow_byteps.sh \
--model ResNet50 --num-iters 1000
```

If your workers use PyTorch, change the image name to `bytepsimage/worker_pytorch` and replace the launch command with

```
python /usr/local/byteps/launcher/launch.py \
/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh \
--model resnet50 --num-iters 1000
```
