Learning rate finder fails for optimizer with internal state #3340
Labels: bug, help wanted, waiting on author
🐛 Bug
DISCLAIMER: I think this is solved in the new `Trainer.tune()` approach. Apparently it's not. Still writing this to raise awareness and in case someone looks for this.

When using the `auto_lr_find` flag, I noticed that it fails when used with `torch.optim.Adagrad`. Upon closer inspection, I realized that this is due to the internal state of the optimizer.
When the optimizer is first initialized, it populates a `state` attribute with tensors allocated like the model parameters (in `Trainer.fit()`, line 1011, v0.9.0). However, the model parameters are only moved to the correct device afterwards (in `Trainer.fit()`, line 1030+ depending on the backend, v0.9.0). This results in an error, since the optimizer state is still on `cpu` while the model has been moved to `cuda`.
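The root cause can be demonstrated with plain PyTorch, independent of Lightning: `torch.optim.Adagrad` allocates its accumulator state eagerly at construction time, and `Module.to()` never touches optimizer state. The sketch below uses a dtype conversion as a CPU-only stand-in for the cpu-to-cuda move; the mechanism is the same:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
# Adagrad populates its per-parameter state eagerly, at construction time
opt = torch.optim.Adagrad(model.parameters(), lr=0.01)

p = next(model.parameters())
sum_state = opt.state[p]["sum"]  # accumulator tensor, allocated like p

# Stand-in for model.to("cuda"): the parameters change, the state does not
model.to(torch.float64)

print(p.dtype)          # the parameter has been converted
print(sum_state.dtype)  # the accumulator has not -> opt.step() will fail
```

With a real device move the mismatch is identical: the parameters end up on `cuda` while `state["sum"]` stays on `cpu`, and the first `opt.step()` raises.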
Error message in here
Code sample
I used the doc example and fitted it with a random dataset for this minimal example.
Expected behavior
The `Trainer` should first correctly set up the model (i.e., move it to the target device) before initializing the optimizers when using the learning rate finder.
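Until the ordering is fixed in the Trainer, a possible workaround is to move the optimizer's state tensors by hand after the model has been transferred. `move_optimizer_state` below is a hypothetical helper, not a Lightning API; a dtype conversion again stands in for the device move so the sketch runs on CPU:

```python
import torch
import torch.nn as nn

def move_optimizer_state(opt, *args, **kwargs):
    """Forward .to() arguments (device and/or dtype) to every state tensor."""
    for state in opt.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(*args, **kwargs)

model = nn.Linear(4, 2)
opt = torch.optim.Adagrad(model.parameters(), lr=0.01)

model.to(torch.float64)                  # in the real scenario: model.to("cuda")
move_optimizer_state(opt, torch.float64)  # keep the state in sync

p = next(model.parameters())
print(opt.state[p]["sum"].dtype)  # now matches the parameters
```

Alternatively, constructing the optimizer only after the model has been moved avoids the problem entirely, which is essentially what the expected Trainer behavior above amounts to.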
Environment
- GPU:
- GeForce GTX 1050
- available: True
- version: 10.2
- numpy: 1.19.1
- pyTorch_debug: False
- pyTorch_version: 1.6.0
- pytorch-lightning: 0.9.0
- tensorboard: 2.2.0
- tqdm: 4.48.2
- OS: Windows
- architecture:
- 64bit
- WindowsPE
- processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
- python: 3.8.5
- version: 10.0.18362