Checkpoint naming broken #775

Closed
neelr11 opened this issue Jan 31, 2020 · 7 comments · Fixed by #1016

Labels: bug (Something isn't working), help wanted (Open to be worked on)

Comments

neelr11 commented Jan 31, 2020

🐛 Bug

I would like to be able to save checkpoints with custom names that include the value of my val_loss, e.g., path/epoch_2-val_loss_0.2.hdf5. The documentation for ModelCheckpoint suggests that this is possible using the filepath argument. This does not appear to be the case, since the source code calls os.makedirs(filepath). I have also tried using the prefix argument, but it doesn't seem to be possible to pass it a format string containing a variable.

Expected behavior

The documentation claims that filepath='{epoch:02d}-{val_loss:.2f}.hdf5' will save a checkpoint at /path/epoch_2-val_loss_0.2.hdf5. Instead, it saves a checkpoint at {epoch:02d}-{val_loss:.2f}.hdf5/_ckpt_epoch_1.ckpt.

The issues in the documentation are two-fold:
-- It suggests that filepath can contain the directory + name of the checkpoint, when it seems like it should only contain the directory specifying where to save.
-- It suggests that it can 'contain named formatting options to be auto-filled', which also doesn't seem to be the case.

Is it possible to achieve this functionality with the prefix argument instead? If so, how?
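For reference, the "named formatting options to be auto-filled" behaviour the documentation describes maps naturally onto Python's str.format. The sketch below uses a hypothetical helper, resolve_checkpoint_path (not part of PyTorch Lightning), and assumes the logged metrics are available as plain floats in a dict. It expands the template and creates only the parent directory, instead of passing the whole template to os.makedirs.

```python
import os


def resolve_checkpoint_path(template: str, epoch: int, metrics: dict) -> str:
    """Hypothetical helper: expand a checkpoint path template.

    Assumes `metrics` maps metric names (e.g. "val_loss") to plain floats.
    """
    # str.format fills named fields such as {epoch:02d} and {val_loss:.2f}.
    path = template.format(epoch=epoch, **metrics)
    # Create only the parent directory, never a directory named after the file.
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    return path
```

Note that with plain str.format, the template '{epoch:02d}-{val_loss:.2f}.hdf5' with epoch=2 and val_loss=0.2 expands to 02-0.20.hdf5, not epoch_2-val_loss_0.2.hdf5, so the file name in the documented example would not match that template even if formatting worked.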

neelr11 added the bug label Jan 31, 2020
williamFalcon (Contributor) commented Jan 31, 2020

That’s already possible:

https://pytorch-lightning.readthedocs.io/en/latest/callbacks.html

from pytorch_lightning.callbacks import ModelCheckpoint

# save epoch and val_loss in name
ModelCheckpoint(filepath='{epoch:02d}-{val_loss:.2f}.hdf5')
# saves file like: /path/epoch_2-val_loss_0.2.hdf5

But, as you mentioned, if that’s broken, it would be better to submit a PR to fix it :)

williamFalcon reopened this Jan 31, 2020
Borda added the help wanted label Feb 3, 2020
williamFalcon (Contributor) commented:

@neelr11 did this work for you? Is this still a bug?

williamFalcon (Contributor) commented:

@neelr11 @Borda any updates on this?

williamFalcon added this to the 0.6.1 milestone Feb 11, 2020
neelr11 (Author) commented Feb 11, 2020

@williamFalcon this is still a bug. I can submit a PR this weekend.

versatran01 commented:

I suggest following pytorch-ignite and simply adding the monitored metric's name and value to the saved checkpoint's file name.
The filepath would then just be a directory, and the saved checkpoint would be named something like {prefix}ckpt_epoch{epoch}{monitor}{monitor_value}.ckpt
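A minimal sketch of this suggestion, assuming filepath is treated purely as a directory. monitored_checkpoint_path is a hypothetical name (not part of pytorch-ignite or PyTorch Lightning), and the underscore separators and four-decimal formatting of the metric value are choices made here for readability.

```python
import os


def monitored_checkpoint_path(directory: str, prefix: str, epoch: int,
                              monitor: str, monitor_value: float) -> str:
    """Hypothetical helper sketching the ignite-style naming idea.

    `directory` is only a directory; the epoch and the monitored metric are
    encoded in the file name. Underscore separators and 4-decimal rounding
    are assumptions made for this sketch.
    """
    filename = f"{prefix}_ckpt_epoch_{epoch}_{monitor}_{monitor_value:.4f}.ckpt"
    return os.path.join(directory, filename)
```

With an empty prefix, epoch 3 and val_loss 0.25, this yields ckpts/_ckpt_epoch_3_val_loss_0.2500.ckpt, which extends the existing _ckpt_epoch_1.ckpt naming rather than replacing it.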

Borda (Member) commented Feb 18, 2020

@neelr11 are you still interested in sending a PR?

davinnovation (Contributor) commented:

As neelr11 mentioned, the ModelCheckpoint callback in 0.6.0 doesn't work as documented: it does not support dynamic checkpoint names such as filepath='{epoch:02d}-{val_loss:.2f}.hdf5'. Waiting for an update.
