Trainer "optimizers" attribute is None when saving checkpoint and callbacks list is not empty #2936

Closed · Fixed by #3892
import-antigravity opened this issue Aug 12, 2020 · 10 comments
Labels: bug, checkpointing, help wanted, waiting on author
@import-antigravity commented Aug 12, 2020

🐛 Bug

I'm training a GAN and running a few custom callbacks as well. When the trainer attempts to save a checkpoint at the end of the first epoch, it crashes. Here's the strange part: the exact same code in a Jupyter notebook doesn't produce the error.

To Reproduce

Steps to reproduce the behavior:

The bug does not occur when the callbacks list passed to the trainer is empty. None of the callbacks I'm using has anything to do with saving checkpoints; they all just log information about the model. Enabling any one of them causes the error, while running the exact same code in Jupyter produces no crash (a self-contained sketch follows).
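For reference, a self-contained sketch of the pattern that triggers it (hypothetical minimal module and no-op callback, not the actual GAN code from this report; it may or may not reproduce on every version):

import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks import Callback

class NoopCallback(Callback):
    """Stand-in for the custom logging callbacks: does nothing."""
    pass

class TinyModule(LightningModule):
    """Hypothetical minimal module, not the GAN from this report."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return {'loss': torch.nn.functional.mse_loss(self.layer(x), y)}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        data = TensorDataset(torch.randn(8, 4), torch.randn(8, 1))
        return DataLoader(data, batch_size=4)

# With callbacks=[] the run completes; with any callback present,
# the end-of-epoch checkpoint save crashes as in the trace below.
trainer = Trainer(max_epochs=1, callbacks=[NoopCallback()])
trainer.fit(TinyModule())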

Stack trace:

Traceback (most recent call last):
  File "mnist-dense-gan-convergence.py", line 55, in <module>
    main(args)
  File "mnist-dense-gan-convergence.py", line 45, in main
    trainer.fit(gan)
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1044, in fit
    results = self.run_pretrain_routine(model)
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1213, in run_pretrain_routine
    self.train()
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 370, in train
    self.run_training_epoch()
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 502, in run_training_epoch
    self.check_checkpoint_callback(should_check_val)
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 513, in check_checkpoint_callback
    [c.on_validation_end(self, self.get_model()) for c in checkpoint_callbacks]
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 513, in <listcomp>
    [c.on_validation_end(self, self.get_model()) for c in checkpoint_callbacks]
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 12, in wrapped_fn
    return fn(*args, **kwargs)
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 309, in on_validation_end
    self._do_check_save(filepath, current, epoch)
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 346, in _do_check_save
    self._save_model(filepath)
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 168, in _save_model
    self.save_function(filepath, self.save_weights_only)
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/trainer/training_io.py", line 268, in save_checkpoint
    checkpoint = self.dump_checkpoint(weights_only)
  File "/Users/robbie/.conda/envs/ganresearch/lib/python3.7/site-packages/pytorch_lightning/trainer/training_io.py", line 350, in dump_checkpoint
    for i, optimizer in enumerate(self.optimizers):
TypeError: 'NoneType' object is not iterable
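The failing line iterates self.optimizers directly, so a trainer whose optimizers were never populated (or were reset to None) crashes on save. As a sketch of the guard pattern (treat a missing or None attribute as "no optimizers"), here is a hypothetical helper, not the actual library code:

def safe_optimizer_states(trainer):
    """Collect optimizer state dicts, tolerating trainer.optimizers
    being None; iteration over [] is then a harmless no-op."""
    optimizers = getattr(trainer, 'optimizers', None) or []
    return [opt.state_dict() for opt in optimizers]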

Code sample

Here is the relevant part of my setup code:

inception_callback = GANInceptionScorer(classifier, logits=True, sample_size=1000, input_shape=(-1, 1, 28, 28))

log_dir = os.path.abspath('../logs/mnist-dense-gan-convergence')

params = ParameterMatrixCallback()

callbacks = [
    GANProgressBar(),
    GANTensorboardImageView(),
    params,
    inception_callback
]

trainer_args = {
    'max_epochs': 100,
    'default_root_dir': log_dir,
    'callbacks': callbacks,
    'progress_bar_refresh_rate': 0
}

print(log_dir)
try:
    trainer = Trainer(gpus=1, **trainer_args)
except MisconfigurationException:
    trainer = Trainer(**trainer_args)

trainer.fit(gan)

and the same code in Jupyter:

inception_callback = GANInceptionScorer(classifier, logits=True, sample_size=1000, input_shape=(-1, 1, 28, 28))

log_dir = os.path.abspath('../logs/mnist-gan-dense')

params = ParameterMatrixCallback()

trainer_args = {
    'max_epochs': 200, 
    'callbacks': [GANProgressBar(), GANTensorboardImageView(n=4), params, inception_callback],
    'progress_bar_refresh_rate': 0, 
    'default_root_dir': log_dir
}

t = Trainer(**trainer_args)

Expected behavior

Checkpoint saving should succeed at the end of the epoch; passing unrelated logging callbacks to the Trainer should not affect it.

Environment

  • PyTorch Version (e.g., 1.0): 1.3.1
  • OS (e.g., Linux): macOS
  • How you installed PyTorch (conda, pip, source): conda
  • Python version: 3.7
  • Any other relevant information: pytorch-lightning 0.8.5

import-antigravity added the bug and help wanted labels Aug 12, 2020

@import-antigravity (Author)

Additional info: here are the relevant methods in my GAN class:

from abc import ABC, abstractmethod

from torch import optim
from torch.optim import Optimizer
from pytorch_lightning import LightningModule

class GAN(LightningModule, ABC):
    ...

    @abstractmethod
    def g_optimizer(self) -> Optimizer:
        pass

    @abstractmethod
    def d_optimizer(self) -> Optimizer:
        pass

    def configure_optimizers(self):
        return self.g_optimizer(), self.d_optimizer()

class MnistGanDense(GAN):
    ...

    def g_optimizer(self) -> Optimizer:
        return optim.RMSprop(self.G.parameters(), self.hparams['learning_rate'])

    def d_optimizer(self) -> Optimizer:
        return optim.RMSprop(self.D.parameters(), self.hparams['learning_rate'])
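
For what it's worth, configure_optimizers may return a single optimizer, a list/tuple of optimizers, or a two-tuple of (optimizers, schedulers), so the tuple return above is a supported form. An equivalent, more explicit variant (sketch only):

    def configure_optimizers(self):
        # Equivalent list form: one optimizer each for the generator
        # and the discriminator, no schedulers.
        return [self.g_optimizer(), self.d_optimizer()]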

@williamFalcon (Contributor)

Could you try 0.9.0rc12?

@import-antigravity (Author)

Is there a way to do that with conda?

@justusschock (Member)

Inside your conda environment you can also install it with pip.

@williamFalcon (Contributor)

Inside conda you can always install with pip:

pip install pytorch-lightning==0.9.0rc13

If this is still an issue, happy to reopen

@deekshadangwal commented Sep 1, 2020

This is still a problem for me. I updated to 0.9.1rc1 and still get this error. Here is my trace:

Traceback (most recent call last):
  File "train_unet.py", line 270, in <module>
    trainer.save_checkpoint(args.save_checkpoint_path)
  File "/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/training_io.py", line 275, in save_checkpoint
    checkpoint = self.dump_checkpoint(weights_only)
  File "/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/training_io.py", line 360, in dump_checkpoint
    for i, optimizer in enumerate(self.optimizers):
TypeError: 'NoneType' object is not iterable
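Note that this trace comes from a direct trainer.save_checkpoint(...) call in the training script rather than from a checkpoint callback; if the trainer's optimizers were never set up by that point, the same loop fails. A hedged user-side guard (trainer and args.save_checkpoint_path are from the trace above; model is assumed to be the script's LightningModule):

import torch

# Only ask the trainer to dump optimizer state if it actually has
# optimizers; otherwise fall back to saving raw weights.
if getattr(trainer, 'optimizers', None):
    trainer.save_checkpoint(args.save_checkpoint_path)
else:
    torch.save(model.state_dict(), args.save_checkpoint_path)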

@import-antigravity (Author)

@williamFalcon could you open this again? I'm still getting the error as well

edenlightning reopened this Sep 8, 2020
edenlightning added this to the 0.9.x milestone Sep 8, 2020
edenlightning added the checkpointing and priority: 0 (high priority) labels Sep 16, 2020
@awaelchli (Contributor)

@rohitgr7 didn't we recently make optimizers init to an empty list instead of None? I think this should solve the problem. Could you check?
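
The change being referred to is, roughly, initializing the attribute to an empty list so that iterating it becomes a no-op instead of a crash. A hypothetical reduction (not the exact diff):

class TrainerSketch:
    """Hypothetical reduction of the referenced change."""
    def __init__(self):
        # Empty lists make `for opt in self.optimizers` a safe no-op
        # (previously these attributes were initialized to None).
        self.optimizers = []
        self.lr_schedulers = []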

@rohitgr7 (Contributor)

@awaelchli yes, it's an empty list now. But the LightningModule defined above does define its optimizers, so I'm not sure yet what the issue is there.

@import-antigravity mind checking this on master?

@Borda (Member) commented Oct 2, 2020

@deekshadangwal mind sharing a full code sample so we can reproduce your issue?

Borda added the waiting on author label and removed the priority: 0 label Oct 2, 2020
edenlightning modified the milestone: 0.9.x → 1.0 Oct 4, 2020
williamFalcon self-assigned this Oct 6, 2020
williamFalcon added commits that referenced this issue Oct 6, 2020