'RuntimeError: No rendezvous handler for env://' with multi-gpu #5358
Comments
Hi! Thanks for your contribution! Great first issue!
Also, I don't know if it is related, but when I check GPU usage during training (with gpus = 1) in Windows Task Manager, I see only 1-2% utilization on the GPU and 45-50% on the CPU. Is this normal behaviour?
@costantinoai mind sharing which PL version you are using? Also, do you have a full example to reproduce?
Hi @Borda, PL version is 1.1.2. I do have an example of the full code on Colab, but I would rather not post it publicly. How can I share it with you?
Hi, you can ping me on Slack if you want. It's probably an issue with passing the argument gpus=-1 to the subprocess script. I bet if you set gpus=n where n is the number of GPUs, it will work. We just have to support -1 for ddp.
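For anyone who wants to try this suggestion without hard-coding the count, here is a minimal sketch that turns -1 into an explicit number before it reaches the Trainer. `resolve_gpus` is a hypothetical helper, not part of PyTorch Lightning, and the number of available devices is passed in as an argument (for example from `torch.cuda.device_count()`).

```python
def resolve_gpus(requested: int, available: int) -> int:
    """Hypothetical helper: map gpus=-1 ("use all") to an explicit count."""
    if requested == -1:
        return available
    if requested < 0:
        raise ValueError(f"invalid gpus value: {requested}")
    if requested > available:
        raise ValueError(f"requested {requested} GPUs, only {available} available")
    return requested


# Assumed usage with the PL 1.1-era Trainer API (not executed here):
#   import torch
#   from pytorch_lightning import Trainer
#   trainer = Trainer(gpus=resolve_gpus(-1, torch.cuda.device_count()), accelerator="ddp")
```

This only addresses the gpus=-1 hypothesis; it does not change which distributed backend is used.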
Ok, thanks. I’ll try setting n and see what happens. I’ll send you the Colab link on slack if I still have issues.
Really appreciated!
@awaelchli I still get the same problem after setting gpus = 2. I reached out to you on Twitter (I don't have a Slack account). Thanks!
In summary after private conversation with @costantinoai
if these requirements are not met, we see the |
Hi, I also get this error after adding a second GPU to my machine: RuntimeError: No rendezvous handler for env://. Please advise how to fix or work around it.
'RuntimeError: No rendezvous handler for env://' on its own is not much information, but one possibility is that you are on Windows.
I am on Windows and saw this error. Changing the accelerator to 'dp' works.
Windows 10 user here. This worked for me.
I am on PyTorch Lightning 1.2.1 and I still run into the issue on Windows if I set accelerator to "dp". I am training on 1 GPU.
@carlomarxdk is DeepSpeed supported on Windows? I can't find any mention of it, so probably not.
I ran into this issue on Windows 10.
Changing the accelerator to dp on Windows 10, as suggested by @awaelchli and @mdja, solved my issue.
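To make the workaround reported above concrete, here is a minimal sketch that picks the accelerator by platform. `trainer_kwargs` is a hypothetical helper, not a PyTorch Lightning API. It assumes the PL 1.1/1.2-era `Trainer(gpus=..., accelerator=...)` arguments and assumes, based on the reports in this thread, that "ddp" is what triggers the error on Windows.

```python
import platform
from typing import Optional


def trainer_kwargs(num_gpus: int, system: Optional[str] = None) -> dict:
    """Hypothetical helper: build Trainer keyword arguments for multi-GPU runs."""
    system = system or platform.system()
    kwargs = {"gpus": num_gpus}
    if num_gpus > 1:
        # Assumption from this thread: use "dp" on Windows, "ddp" elsewhere.
        kwargs["accelerator"] = "dp" if system == "Windows" else "ddp"
    return kwargs


# Assumed usage (not executed here):
#   from pytorch_lightning import Trainer
#   trainer = Trainer(**trainer_kwargs(2))
#   trainer.fit(model)
```

Note that one commenter above still saw the error on PL 1.2.1 with "dp", so this may not cover every setup.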
🐛 Bug
I get the error
'RuntimeError: No rendezvous handler for env://'
when I run my model with multiple GPUs.
Below are the code and the traceback:
trainer.fit(model)
The error is not present if I set
gpus = 1
Expected behavior
Environment
Installation method (conda, pip, source): conda