Multi-GPU training fails using strategy='ddp' #1201
dudeperf3ct started this conversation in General
Replies: 2 comments, 8 replies
-
@rohitgr7 I was advised to ask the question in the Lightning forum, since the issue comes from the PL library rather than from Flash.
Originally posted by @ethanwharris in #1188 (comment)
-
what's the output for
-
Running video classification using Flash
Linked issue: #1188
Code Sample
I ran `python -c 'import torch; print(torch.cuda.is_available())'` and PyTorch is able to detect all GPUs, and CUDA is available at the start of the script. Only when I run the Flash trainer with the ddp strategy do I get the above error. The script runs fine when using 1 GPU.
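For reference, a minimal sketch of the kind of multi-GPU Flash video-classification run described above. This is not the author's actual script: the data paths, backbone, batch size, and GPU count are placeholders, and the Flash/Lightning API is assumed to match the 0.8 / 1.5 releases listed in the configuration below.

```python
# Hypothetical reconstruction of the failing setup, NOT the original script.
# Paths, backbone, batch size, and GPU count below are placeholders.
import torch
import flash
from flash.video import VideoClassificationData, VideoClassifier

# Same sanity check as above: CUDA and all GPUs are visible at script start.
print(torch.cuda.is_available(), torch.cuda.device_count())

datamodule = VideoClassificationData.from_folders(
    train_folder="data/train",  # placeholder path
    val_folder="data/val",      # placeholder path
    batch_size=4,
)

model = VideoClassifier(
    backbone="x3d_xs",  # placeholder backbone
    num_classes=datamodule.num_classes,
)

# Runs fine with a single GPU; the error reported above appears only when
# launching across multiple GPUs with strategy="ddp".
trainer = flash.Trainer(max_epochs=1, gpus=8, strategy="ddp")
trainer.finetune(model, datamodule=datamodule, strategy="freeze")
```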
Configurations:
I am running the PyTorch Lightning NGC container with the `--gpus all` and `--shm-size=1g` flags.
pytorch / lightning / flash versions: 1.9.0a0 / 1.5.10 / 0.8.0dev
I have 8x V100 GPUs with driver version 418.67 and CUDA version 10.1.