checkpoint save dir is not correctly set when _save_dir is given by wandb logger #2527
Comments
@pableeto I don't think we want to make this part of the code even more complicated. I think it is a left-over from a recent refactor.
Well, I guess that would be fine :)
I just found that if we remove this block, the pipeline crashes at line 391 of pytorch_lightning/trainer/training_io.py:
@pableeto try:
We can re-open if it is not fixed.
@williamFalcon Just tried 0.8.5rc1, the problem still exists.
Fixed here #2681
🐛 Bug
When using ModelCheckpoint with default parameters and WandbLogger with save_dir set to some directory,
the checkpoint is still dumped to os.getcwd().
To Reproduce
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger
........
logger = WandbLogger(save_dir='/path/to/experiment')
trainer = Trainer.from_argparse_args(other_args, logger=logger)
Expected behavior
The checkpoint should be saved under /path/to/experiment, the directory given by the WandbLogger's save_dir argument.
Additional context
The PL version I am using was installed via pip, i.e. 0.8.4.
I think the problem is related to the logic in the on_train_start() function in model_checkpoint.py.
Unfortunately, the default of "weights_save_path" is not None; it is set to default_root_dir, which is os.getcwd() (see pytorch_lightning/trainer/callback_config.py, line 57).
Thus, ckpt_path is always set to weights_save_path instead of the logger's save_dir.
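To illustrate, a simplified paraphrase of the resolution behaviour described above (not the exact pytorch_lightning source; variable names are mine):

# Paraphrase of the ckpt_path resolution described above, not a verbatim
# copy of the library code. The logger's save_dir is only meant to be used
# when weights_save_path is unset, but weights_save_path is never None
# because it falls back to default_root_dir, i.e. os.getcwd().
if trainer.weights_save_path is not None:          # always True in practice
    ckpt_path = trainer.weights_save_path          # -> os.getcwd()
elif trainer.logger is not None and trainer.logger.save_dir:
    ckpt_path = trainer.logger.save_dir            # never reached
else:
    ckpt_path = trainer.default_root_dir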
Fix
A quick patch for this might be as follows:
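Something along these lines (a rough sketch, not a verbatim diff against model_checkpoint.py / callback_config.py; names are paraphrased): only honour weights_save_path when it was explicitly set to something other than the default_root_dir fallback, and otherwise prefer the logger's save_dir.

# Hypothetical patch sketch, assuming access to the trainer attributes
# mentioned in this issue.
save_dir = getattr(trainer.logger, 'save_dir', None) if trainer.logger else None

if trainer.weights_save_path != trainer.default_root_dir:
    # the user explicitly set weights_save_path -> honour it
    ckpt_path = trainer.weights_save_path
elif save_dir:
    # otherwise fall back to the logger's save_dir
    ckpt_path = save_dir
else:
    ckpt_path = trainer.default_root_dir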
I would be happy to fork the code and submit a PR, btw.