I haven't tried disabling enable_pl_optimizer. @tchaton, is there anything I should take care of when setting enable_pl_optimizer=False? I use DDP to train on a Slurm cluster.
🐛 Bug
enable_pl_optimizer (the default!) causes optimizers not to be restored properly from the checkpoint specified by resume_from_checkpoint.

BoringModel Colab Reproduction
The model is trained for 3 epochs and saved to a checkpoint. The checkpoint is then restored and trained for one further epoch (with different values of enable_pl_optimizer); the training loss is printed at each step.

The setup with enable_pl_optimizer=True shows a huge loss spike after the first optimizer step, suggesting that the optimizer state is not restored properly.

https://colab.research.google.com/drive/1lHYXm4MpnmXwPZTcPem4D4wwwU5vJhHc?usp=sharing
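For reference, here is a minimal plain-PyTorch sketch (illustrative only, not the Lightning internals) of what a correct resume looks like: both the model's and the optimizer's state dicts are checkpointed and reloaded, so Adam's step counts and moment buffers carry over and the next update matches an uninterrupted run.

```python
import torch
from torch import nn

# Minimal plain-PyTorch sketch of resuming with correctly restored optimizer
# state (illustrative only; Lightning is expected to do the equivalent of
# this internally when resuming from a checkpoint).
torch.manual_seed(0)
model = nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# A few training steps so the optimizer accumulates state
# (Adam's step counts and moment estimates).
for _ in range(3):
    loss = model(torch.randn(8, 4)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# "Checkpoint": save BOTH the model and the optimizer state dicts.
ckpt = {"model": model.state_dict(), "optimizer": opt.state_dict()}

# "Resume": build fresh objects, then restore both state dicts.
model2 = nn.Linear(4, 1)
opt2 = torch.optim.Adam(model2.parameters(), lr=1e-2)
model2.load_state_dict(ckpt["model"])
opt2.load_state_dict(ckpt["optimizer"])

# The restored optimizer continues from the same step count, so training
# picks up where it left off instead of spiking.
assert opt2.state_dict()["state"][0]["step"] == opt.state_dict()["state"][0]["step"]
```

If only the model weights were restored (and the optimizer state silently reinitialized), Adam would restart with zeroed moment estimates, which is exactly the kind of loss spike described above.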
Expected behavior
PL optimizers are restored such that there is no huge loss spike after resuming, just as when enable_pl_optimizer=False.

Environment

See the Colab notebook.