WIP - do not merge: documenting the clash between WandB logger and LightningCLI #7675

oplatek commented May 24, 2021

This PR documents a clash between the default filename used by LightningCLI.save_config_callback
and the WandB logger, using the basic_examples autoencoder.

See run-example-wandb-save-config-callback-clash-workaournd.sh, autoencoder.{py,yml}
I used the main environment.yml to document the dependencies.

The example below reproduces the error when the default LightningCLI is used together with the WandB logger.
However, if you change just this one line in basic_examples/autoencoder.yml

# save_config_callback_filename: ''   # leaves LightningCLI untouched (no workaround applied); this breaks
# to
save_config_callback_filename: 'another-name-config.yaml'   # sets different filename in save_config_callback

It runs fine.

If you do not change anything you will be able to reproduce the error.
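
To make the mechanism concrete, here is a minimal stdlib-only sketch of the clash. It assumes, as the traceback below suggests, that the run directory already contains a config.yaml when the config is saved, and that the save step refuses to overwrite. `save_config` and `demo` are hypothetical stand-ins, not the jsonargparse or Lightning code.

```python
import os
import tempfile


def save_config(text, path):
    """Hypothetical stand-in for a config writer that refuses to overwrite."""
    if os.path.exists(path):
        raise ValueError("Refusing to overwrite existing file: " + path)
    with open(path, "w") as f:
        f.write(text)


def demo():
    """Return (clashed, files) for a simulated run directory."""
    with tempfile.TemporaryDirectory() as run_dir:
        # Assumption: the logger has already written its own config.yaml here.
        with open(os.path.join(run_dir, "config.yaml"), "w") as f:
            f.write("placeholder: true\n")

        # Default filename: collides with the existing file.
        try:
            save_config("seed: 1234\n", os.path.join(run_dir, "config.yaml"))
            clashed = False
        except ValueError:
            clashed = True

        # Workaround: a different filename is written without complaint.
        save_config("seed: 1234\n", os.path.join(run_dir, "another-name-config.yaml"))
        files = sorted(os.listdir(run_dir))
    return clashed, files


if __name__ == "__main__":
    print(demo())
```

The only thing that changes between the failing and the working case is the filename, which is the same change the yml line above makes.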

$ ./run-example-wandb-save-config-callback-clash-workaournd.sh   # inside the pytorch-lightning/pl_examples directory
...
Traceback (most recent call last):
  File "basic_examples/autoencoder.py", line 139, in <module>
    cli_main()
  File "basic_examples/autoencoder.py", line 132, in cli_main
    cli = WandBandSafeConfigCallBackFixCLI(LitAutoEncoder, MyDataModule, seed_everything_default=1234)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/utilities/cli.py", line 173, in __init__
    self.fit()
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/utilities/cli.py", line 256, in fit
    self.trainer.fit(**self.fit_kwargs)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 855, in run_train
    self.train_loop.on_train_start()
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 101, in on_train_start
    self.trainer.call_hook("on_train_start")
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1223, in call_hook
    trainer_hook(*args, **kwargs)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/trainer/callback_hook.py", line 152, in on_train_start
    callback.on_train_start(self, self.lightning_module)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/pytorch_lightning/utilities/cli.py", line 87, in on_train_start
    self.parser.save(self.config, config_path, skip_none=False)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/jsonargparse/core.py", line 811, in save
    check_overwrite(path_fc)
  File "/lnet/work/people/oplatek/pytorch-lightning/env/lib/python3.8/site-packages/jsonargparse/core.py", line 808, in check_overwrite
    raise ValueError('Refusing to overwrite existing file: '+path())
ValueError: Refusing to overwrite existing file: /lnet/work/people/oplatek/pytorch-lightning/pl_examples/wandb/run-20210524_145129-1rtepmq4/files/config.yaml

wandb: Waiting for W&B process to finish, PID 8111
wandb: Program failed with code 1.  Press ctrl-c to abort syncing.
wandb:                                                                                
wandb: Find user logs for this run at: /lnet/work/people/oplatek/pytorch-lightning/pl_examples/wandb/run-20210524_145129-1rtepmq4/logs/debug.log
wandb: Find internal logs for this run at: /lnet/work/people/oplatek/pytorch-lightning/pl_examples/wandb/run-20210524_145129-1rtepmq4/logs/debug-internal.log
wandb: Synced 7 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)

I think it would be useful to pass the filename as an argument to the callback's constructor, so that one could change the filename through a LightningCLI argument: https://github.com/PyTorchLightning/pytorch-lightning/blob/2103b5efc98669f86aafa0fa98490df2e13142b7/pytorch_lightning/utilities/cli.py#244
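
A rough sketch of what I mean, kept independent of Lightning so it runs on its own. `SaveConfigSketch`, its `config_filename` parameter and `run_with_filename` are hypothetical names for illustration only; the real callback's signature and hook arguments differ.

```python
import os
import tempfile


class SaveConfigSketch:
    """Hypothetical callback-like class; not the Lightning implementation."""

    def __init__(self, config_text, config_filename="config.yaml"):
        self.config_text = config_text
        # The proposal: the filename is a constructor argument, not a constant.
        self.config_filename = config_filename

    def on_train_start(self, log_dir):
        path = os.path.join(log_dir, self.config_filename)
        if os.path.exists(path):
            raise ValueError("Refusing to overwrite existing file: " + path)
        with open(path, "w") as f:
            f.write(self.config_text)
        return path


def run_with_filename(filename):
    """Simulate a log dir that already holds config.yaml, then save the config."""
    with tempfile.TemporaryDirectory() as log_dir:
        # Assumption: another component already wrote config.yaml here.
        with open(os.path.join(log_dir, "config.yaml"), "w") as f:
            f.write("placeholder: true\n")
        callback = SaveConfigSketch("seed: 1234\n", config_filename=filename)
        return os.path.basename(callback.on_train_start(log_dir))
```

With the default name the save raises the same ValueError as in the traceback above; with any other name it succeeds, and the CLI would only need to forward one extra argument.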

This commit documents the LightningCLI.save_config_callback default filename clash
with the WandB logger on the basic_examples autoencoder.
See run-example-wandb-save-config-callback-clash-workaournd.sh, autoencoder.{py,yml}
I used the main environment.yml to document the dependencies I used.

pep8speaks commented May 24, 2021

Hello @oplatek! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-24 13:08:53 UTC


codecov bot commented May 24, 2021

Codecov Report

Merging #7675 (ebd5e3f) into master (01109cd) will decrease coverage by 5%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #7675    +/-   ##
=======================================
- Coverage      93%     88%    -5%     
=======================================
  Files         198     200     +2     
  Lines       12848   12962   +114     
=======================================
- Hits        11888   11378   -510     
- Misses        960    1584   +624     


oplatek commented May 24, 2021

Converted this to issue #7679

oplatek closed this May 24, 2021