Initialising modules with different weights with DDP is dangerously buried in the doc notes #6013
🐛 Bug

More a discussion than a real bug. The docs for DDP say:

"In the most common settings, it is considered bad practice to initialise the modules with different parameters according to the GPU."

To me - and this is a personal opinion - it does not make sense in theory either.

The reason I posted this as a bug and not as a docs issue is that this behaviour is really dangerous if you don't know about it. A note in the docs does not help either; I don't assume a user reads the entire docs before using a library.
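To make the failure mode concrete, here is a minimal sketch in plain PyTorch (my illustration, not code from the issue; the layer shape and seed value are arbitrary). Without a shared seed, each process starts from different weights; seeding every rank identically makes the initializations match:

```python
import torch
import torch.nn as nn

def build_model(seed=None):
    # With the same seed on every rank, initialization is identical;
    # with no seed, each rank's RNG state differs and so do the weights.
    if seed is not None:
        torch.manual_seed(seed)
    return nn.Linear(10, 10)

# Simulate two "ranks" inside one process:
m0, m1 = build_model(), build_model()
print(torch.allclose(m0.weight, m1.weight))  # False: diverged from the start

m0, m1 = build_model(seed=42), build_model(seed=42)
print(torch.allclose(m0.weight, m1.weight))  # True: identical initialization
```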
Comments

So a solution would be to broadcast the weights from process 0 to all others before starting training. Is this correct? Then setting the seed is optional.

I am not sure what the solution might be, as my knowledge of Lightning is limited.

We can't know the seed unless the user sets it, right?

@justusschock @tchaton @SeanNaren any thoughts/concerns here?

@awaelchli You're right, but luckily there is nothing left to do for us besides removing this line from the docs. Since your refactor bases it back on plain torch.nn.parallel.DistributedDataParallel, it does that automatically (see here, here, and …).

Oh, you're right, that's awesome. It only makes sense that PyTorch DDP would take care of this. I sent a PR to remove the outdated info.
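The first comment in the thread proposes exactly this manual fix. A minimal sketch of what it could look like with plain torch.distributed (the helper name, the gloo backend, and the torchrun launch are my assumptions, not details from the thread):

```python
import torch.distributed as dist
import torch.nn as nn

def sync_initial_state(model: nn.Module) -> None:
    # Hypothetical helper: overwrite every rank's parameters and buffers
    # with rank 0's values, making a shared seed unnecessary.
    for tensor in list(model.parameters()) + list(model.buffers()):
        dist.broadcast(tensor.data, src=0)

def main():
    dist.init_process_group("gloo")  # e.g. launched via torchrun
    model = nn.Linear(10, 10)        # possibly initialized differently per rank
    sync_initial_state(model)        # all ranks now hold rank 0's weights
    # ... training would start here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```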
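And the resolution the thread converges on: plain torch.nn.parallel.DistributedDataParallel already does this broadcast itself; its constructor syncs rank 0's parameters and buffers to all other ranks before training starts. A minimal sketch demonstrating that (assuming a single-node gloo setup launched with torchrun):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("gloo")  # torchrun supplies RANK/WORLD_SIZE etc.
    rank = dist.get_rank()

    torch.manual_seed(rank)          # deliberately different init per rank
    model = nn.Linear(10, 10)

    ddp_model = DDP(model)           # constructor broadcasts rank 0's state
    # Every rank now prints the same values, despite the different seeds:
    print(f"rank {rank}:", ddp_model.module.weight.flatten()[:3])

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Run it with, for example, torchrun --nproc_per_node=2 script.py: each rank prints identical weights even though each seeded its RNG differently, which is why the docs note could simply be removed.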