save_best,-save_all,-save_last ++ bug fix #416

ceyzaguirre4 · 2019-10-23T01:25:23Z

Before submitting

Was this discussed/approved via a GitHub issue? Yes: "Checkpoint should support two types of checkpoints, last and best" #285
Did you read the contributor guideline? YES
Did you make sure to update the docs? YES

What does this PR do?

Implements #285.

Borda

The bool paradigm makes the parameter list too long, and the naming convention for saved models also needs discussion.

pytorch_lightning/callbacks/pt_callbacks.py

Borda · 2019-10-24T15:38:47Z

pytorch_lightning/callbacks/pt_callbacks.py

+            if self.save_all or self.save_last:
+                overwrite = not self.save_all
+                filepath = '{}/{}_ckpt_epoch_{}.ckpt'.format(self.filepath, self.prefix, epoch + 1)
+                self.save_model(filepath, overwrite=overwrite, prefix=self.prefix)


Why is the prefix used twice, here and also in the file name?

ModelCheckpoint.save_model had a bug (#394) where the whole directory was wiped. Passing self.prefix to the method lets me identify exactly which files need deleting, using re (re.match(r'{}_ckpt_epoch_\d+.ckpt'.format(prefix), filename)).
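The reply above describes deleting only the files that belong to one checkpoint prefix instead of wiping the whole directory. A minimal sketch of that idea follows. `remove_prefixed_checkpoints` is a hypothetical helper, not the actual `ModelCheckpoint.save_model` code, and it assumes the `<prefix>_ckpt_epoch_<N>.ckpt` naming from the diff. It escapes the prefix and the literal dot and uses `re.fullmatch`, which is stricter than the quoted `re.match` pattern (where the unescaped `.` matches any character and trailing text is allowed).

```python
import os
import re
import tempfile


def remove_prefixed_checkpoints(dirpath, prefix):
    """Delete only files named '<prefix>_ckpt_epoch_<N>.ckpt' in dirpath.

    Hypothetical helper (not the pytorch_lightning API): unrelated files and
    checkpoints with other prefixes are left untouched.
    """
    # re.escape guards against regex metacharacters in the prefix;
    # the dot before 'ckpt' is escaped and fullmatch anchors both ends.
    pattern = re.compile(re.escape(prefix) + r"_ckpt_epoch_\d+\.ckpt")
    removed = []
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        if os.path.isfile(path) and pattern.fullmatch(name):
            os.remove(path)
            removed.append(name)
    return removed


# Usage: only the "exp" epoch checkpoints are deleted.
with tempfile.TemporaryDirectory() as d:
    for name in ("exp_ckpt_epoch_1.ckpt", "exp_ckpt_epoch_2.ckpt",
                 "other_ckpt_epoch_1.ckpt", "notes.txt"):
        open(os.path.join(d, name), "w").close()
    print(remove_prefixed_checkpoints(d, "exp"))
    # ['exp_ckpt_epoch_1.ckpt', 'exp_ckpt_epoch_2.ckpt']
    print(sorted(os.listdir(d)))
    # ['notes.txt', 'other_ckpt_epoch_1.ckpt']
```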

Borda · 2019-10-25T09:07:09Z

see #128

ceyzaguirre4 · 2019-10-27T21:33:05Z

OK, yes, I saw it. Three options:

1. We merge this PR now, and merge "change Checkpoint callback's save_best_only to save_top_k #128" as well once it is ready.
2. I wait until "change Checkpoint callback's save_best_only to save_top_k #128" is ready (next month, hopefully), then rebase and add my changes on top of it.
3. (I rebase onto the current state of "change Checkpoint callback's save_best_only to save_top_k #128" and try to fix the bugs there.)

Borda · 2019-10-27T21:49:11Z

@williamFalcon which option do you prefer?
It is not about who is faster, but about which philosophy to adopt:

[yours] separately saving the best/all/last models, differentiated by name
[other] saving the N last/best models, which looks more general

williamFalcon · 2019-11-03T10:28:35Z

@ceyzaguirre4 this is awesome. I do agree that the supported cases should be:

n last
n best

Do we really need "all"? Wouldn't people just set n=huge_number?

Borda · 2019-11-03T11:45:19Z

I would prefer N best...

williamFalcon · 2019-11-03T12:28:22Z

n last may not be the n best

Borda · 2019-11-03T12:31:04Z

I know, but in what case would you want to keep the n last even if they are quite bad?
I understand keeping all metrics, but not bad models... :)

williamFalcon · 2019-11-03T13:26:42Z

This is more of a research question. I can't predict how people will use this, so it needs to be general.

There's at least one case I know of where you need the n last: e.g., some model that looks at progress over time, or if you want to plot how some output changes over time. Then you need those checkpoints.
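The two retention policies discussed above, n last and n best, can be sketched as small bookkeeping objects that only decide which epochs to delete. This is a hypothetical illustration, not the ModelCheckpoint code from this PR or from #128; the class names `LastK` and `BestK` and the "lower monitored value is better" assumption (e.g. a validation loss) are ours.

```python
import heapq
from collections import deque


class LastK:
    """Keep the k most recent checkpoints (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        self.kept = deque()

    def update(self, epoch):
        """Register a new checkpoint; return the epochs that should be deleted."""
        self.kept.append(epoch)
        evicted = []
        while len(self.kept) > self.k:
            evicted.append(self.kept.popleft())  # oldest goes first
        return evicted


class BestK:
    """Keep the k checkpoints with the lowest monitored value (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        # Scores are negated so the heap's smallest item is the worst kept checkpoint.
        self.heap = []

    def update(self, epoch, score):
        """Register a new checkpoint; return the epochs that should be deleted."""
        heapq.heappush(self.heap, (-score, epoch))
        evicted = []
        while len(self.heap) > self.k:
            _, worst_epoch = heapq.heappop(self.heap)
            evicted.append(worst_epoch)
        return evicted


last, best = LastK(2), BestK(2)
for epoch, loss in enumerate([0.9, 0.5, 0.7, 0.4], start=1):
    print(epoch, last.update(epoch), best.update(epoch, loss))
# 1 [] []
# 2 [] []
# 3 [1] [1]
# 4 [2] [3]
print(list(last.kept), sorted(e for _, e in best.heap))  # [3, 4] [2, 4]
```

The final state illustrates the point that the n last are not necessarily the n best: the two policies keep different epochs. Setting a very large k in `LastK` keeps every checkpoint, which corresponds to the "n=huge_number" suggestion. If a new checkpoint is worse than everything `BestK` already holds, `update` returns its own epoch, so in practice one would skip writing it.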

Ir1d · 2019-11-04T05:15:12Z

@williamFalcon Hi, sorry for being absent the past few days. Should I get #128 ready ASAP, or should I wait for this PR to land?
Should I revert the edits to the save_model function in my PR?

williamFalcon · 2019-11-05T14:21:07Z

@ceyzaguirre4 sorry for the delay. Could you resolve the conflicts so we can merge into this release?

williamFalcon · 2019-11-05T14:31:36Z

@Ir1d let's get this one merged and you can finish yours based on this.

Ir1d · 2019-11-05T15:58:41Z

@williamFalcon Shall we merge #128 first? It has been ready since yesterday, so it could catch the next release.

Borda · 2019-11-06T00:22:32Z

@williamFalcon It is probably up to you alone to do the rebase, #320 (comment).
As far as I know, only the PR author (@ceyzaguirre4), the repo owner (@williamFalcon), or maintainers can push to the PR... :)

ceyzaguirre4 · 2019-11-08T19:53:33Z

> @ceyzaguirre4 sorry for the delay. Could you resolve the conflicts so we can merge into this release?

Sorry for being unavailable. I'm presenting a paper at CVPR next week and won't have much time until after that.

> @williamFalcon shall we merge 128 first? since it has been ready since yesterday to catch up the next release

I suggest doing this, so you are not held up by my deadlines.

> I know, but what would it be the case you want to keep n last even they are quite bad?
> I understand to keel all metrics abut not bad models... :)

@Borda I find it crucial to at least offer the option to save the very last model, to minimize the damage from, for example, a power outage. Saving the last model is an emergency precaution rather than a legitimate way of selecting the best model.
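Keeping the very last model as a crash precaution also depends on how it is written: if the process dies in the middle of an in-place overwrite, the only "last" checkpoint can end up truncated. A minimal sketch, assuming a single hypothetical `last.ckpt` file overwritten every epoch and using `pickle` as a stand-in for `torch.save`, writes to a temporary file in the same directory and then swaps it in with `os.replace`, which Python documents as atomic on POSIX when it succeeds. `save_last_atomically` is a hypothetical name, not part of the PR.

```python
import os
import pickle
import tempfile


def save_last_atomically(state, path):
    """Overwrite `path` with `state` without ever exposing a half-written file.

    Hypothetical helper; pickle stands in for torch.save here.
    """
    dirpath = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the final rename stays
    # on one filesystem and can be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dirpath, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomically swap in the new checkpoint
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temp file, keep the old checkpoint
        raise


# Usage: overwrite the same "last" checkpoint every epoch.
with tempfile.TemporaryDirectory() as d:
    last = os.path.join(d, "last.ckpt")
    for epoch in range(1, 4):
        save_last_atomically({"epoch": epoch}, last)
    with open(last, "rb") as f:
        print(pickle.load(f))  # {'epoch': 3}
    print(os.listdir(d))  # ['last.ckpt']
```

On a power loss the previous `last.ckpt` stays intact until the rename; a stricter version would also fsync the containing directory so the rename itself is durable.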

ceyzaguirre4 · 2019-11-08T19:58:12Z

@williamFalcon In theory, the conflicts are resolved.
However, further testing (the tests I usually run locally) will have to wait until after the CVPR paper deadline.

ceyzaguirre4 added 2 commits October 22, 2019 22:27

save_best,-save_all,-save_last ++ bug fix

ac2f349

updated docs

e5d9d52

ceyzaguirre4 marked this pull request as ready for review October 24, 2019 02:10

Borda requested changes Oct 24, 2019

ceyzaguirre4 added 4 commits October 24, 2019 15:34

requested changes

550f633

Merge remote-tracking branch 'upstream/master' into save_best,-save_all,-save_last

2781d18

updated tests and added additional ones

7ddfa2a

minor flake8 changes

4348f54

Ir1d mentioned this pull request Nov 4, 2019

change Checkpoint callback's save_best_only to save_top_k #128

Merged

williamFalcon added 2 commits November 5, 2019 09:34

Update pt_callbacks.py

c1935c6

Update test_a_restore_models.py

321f18e

Merge branch 'master' into save_best,-save_all,-save_last

d227d7b

removed duplicate requirement

bbedbf0

ceyzaguirre4 closed this Dec 2, 2019
