Training on multiple nodes #13
Thanks for your great work. If there are two machines (each with 8 V100 GPUs) connected over Ethernet, without Slurm management, how do I run the code with your stated 16 V100 config?
I am not sure how to run it through Ethernet-connected nodes.
@jingli9111 Thanks very much for your reply. My concern, though, is that it might be too slow to train on a single machine.
@shoaibahmed: @d-li14 The provided main.py script internally uses multiprocessing. In order to use two nodes without Slurm, the best approach is to get rid of the multiprocessing.spawn call in main and merge the main function with main_worker. I have attached the kind of main function I use for your reference at the bottom. Once you have that code structure, use multiproc.py from NVIDIA to execute the code (https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Classification/ConvNets/multiproc.py). An alternative is to use the internal PyTorch launcher, i.e. torch.distributed.launch. It takes an --nnodes parameter, which should be set to 2, as well as --nproc_per_node, which should be set to 8. Since we need a way for the two nodes to communicate, you have to designate one node as the master and note its IP address. Once you have the IP address, just pass it via --master_addr on both nodes. So the final command will look something like:
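A rough sketch of that command (the IP address, port, and training arguments here are placeholders; set --node_rank to 0 on the master node and to 1 on the second node):

```bash
# Run on the master node (node_rank 0); on the second node change --node_rank to 1
# and keep --master_addr pointing at the master node's IP address.
python -m torch.distributed.launch \
    --nnodes=2 --node_rank=0 --nproc_per_node=8 \
    --master_addr="192.168.1.1" --master_port=29500 \
    main.py <your usual training arguments>
```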
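For reference, a minimal sketch of the merged main structure described above (illustrative only, not the exact attached code; the model/data pieces are left as comments, and helpers like build_model are hypothetical):

```python
# Illustrative sketch: main() merged with main_worker(), with the
# multiprocessing.spawn call removed, so that torch.distributed.launch
# can start one process per GPU itself.
import argparse

import torch
import torch.distributed as dist


def main():
    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank to every process it spawns
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()

    # MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are exported by the
    # launcher, so the process group can be initialized from the environment.
    dist.init_process_group(backend="nccl", init_method="env://")
    torch.cuda.set_device(args.local_rank)

    # --- everything that used to live in main_worker goes here ---
    # model = build_model().cuda(args.local_rank)   # hypothetical helper
    # model = torch.nn.parallel.DistributedDataParallel(
    #     model, device_ids=[args.local_rank])
    # sampler = torch.utils.data.distributed.DistributedSampler(dataset)
    # ... build the DataLoader with the sampler and run the training loop ...


if __name__ == "__main__":
    main()
```

With this structure, the launcher starts 8 processes per node (16 in total), and each process only has to initialize the process group and pin itself to one GPU.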
@shoaibahmed Thanks a lot for your reply and detailed instructions! I will try it.
Does anyone know how much GPU memory it consumes with a batch size of 1024? I am running out of memory even with a batch size of 1024 on 8 V100s.
Me too. The largest batch size I was able to fit on 8 V100s (with 16GB of memory each) was 512, which used a little over 11GB of memory per GPU.