
Initialising modules with different weights with DDP is dangerously buried in the doc notes #6013

Closed · epignatelli opened this issue Feb 16, 2021 · 6 comments · Fixed by #6032
Labels: bug (Something isn't working) · docs (Documentation related) · help wanted (Open to be worked on)

@epignatelli commented Feb 16, 2021

🐛 Bug

More a discussion than a real bug.

The docs for DDP say:

> Make sure to set the random seed before the instantiation of a Trainer() so that each model initializes with the same weights.

In most common settings, initialising the modules with different parameters depending on the GPU is considered bad practice.
To me (and this is a personal opinion) it does not make sense in theory either.

The reason I posted this as a bug and not as a docs issue is that it is really dangerous if you don't know about it.
A note in the docs does not help much either; I don't assume a user reads the entire documentation before using a library.
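For reference, the workaround the current docs describe would look roughly like this (a sketch; MyModel is a placeholder for any LightningModule, and the Trainer flags are the Lightning 1.2-era ones):

```python
import pytorch_lightning as pl

from my_project import MyModel  # placeholder: any LightningModule


def main():
    # Seed every process identically *before* creating the Trainer,
    # so each DDP worker draws the same random initial weights.
    pl.seed_everything(42)

    model = MyModel()
    trainer = pl.Trainer(gpus=2, accelerator="ddp")
    trainer.fit(model)


if __name__ == "__main__":
    main()
```

Forget the seed_everything() call and each process silently trains a different model.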

@epignatelli added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Feb 16, 2021
@edenlightning added the docs (Documentation related) label on Feb 16, 2021
@awaelchli (Contributor) commented
So a solution would be to broadcast the weights from process 0 to all the others before training starts. Is this correct? Setting the seed would then be optional.
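Roughly like the following, say (a sketch using plain torch.distributed, assuming the process group is already initialised; broadcast_initial_state is a hypothetical helper, not an existing Lightning API):

```python
import torch
import torch.distributed as dist


def broadcast_initial_state(model: torch.nn.Module, src: int = 0) -> None:
    """Overwrite every rank's parameters and buffers with rank `src`'s values.

    Assumes dist.init_process_group(...) has already been called.
    """
    for tensor in list(model.parameters()) + list(model.buffers()):
        # broadcast() is in-place: rank `src` sends, every other rank receives.
        dist.broadcast(tensor.data, src=src)
```

All processes would then start from identical, yet still random, weights regardless of how each one was seeded.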

@epignatelli (Author) commented
I am not sure what the solution might be, as my knowledge of Lightning is limited.
At a high level, I'd guess the seed is the minimal information PyTorch needs to coordinate the initialisations.

@epignatelli changed the title from "Initialising modules with different weights is dangerously buried in the doc notes" to "Initialising modules with different weights with DDP is dangerously buried in the doc notes" on Feb 16, 2021
@awaelchli (Contributor) commented
> At a high level, I'd guess the seed is the minimal information PyTorch needs to coordinate the initialisations.

We can't know the seed unless the user sets it, right?
So I believe my suggestion is the only way to allow random initial conditions while keeping the same state across processes.

@awaelchli (Contributor) commented
@justusschock @tchaton @SeanNaren any thoughts/concerns here?

@justusschock (Member) commented Feb 17, 2021

@awaelchli You're right, but luckily there is nothing left for us to do besides removing this line from the docs. Since your refactor bases it on plain torch.nn.parallel.DistributedDataParallel, it does this automatically (see here, here, and here).
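For anyone reading along, the construction-time sync is easy to check with a standalone script (a sketch; the launch command and script name are illustrative):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with e.g. `torchrun --nproc_per_node=2 check_sync.py`.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# No shared seed, so each rank draws different random weights here...
model = torch.nn.Linear(8, 8).to(local_rank)

# ...but the DDP constructor broadcasts rank 0's parameters and buffers
# to all other ranks, so every replica ends up identical.
ddp_model = DDP(model, device_ids=[local_rank])
print(dist.get_rank(), ddp_model.module.weight.flatten()[0].item())
```

Every rank prints the same value, even though no seed was ever set.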

@awaelchli (Contributor) commented
Oh, you're right, that's awesome. It makes sense that PyTorch DDP takes care of this. I sent a PR to remove the outdated info.
