Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

save_best,-save_all,-save_last ++ bug fix #416

Closed
wants to merge 10 commits into from
Closed

save_best,-save_all,-save_last ++ bug fix #416

wants to merge 10 commits into from

Conversation

ceyzaguirre4
Copy link
Contributor

@ceyzaguirre4 ceyzaguirre4 commented Oct 23, 2019

Before submitting

What does this PR do?

Implements #285.

@ceyzaguirre4 ceyzaguirre4 marked this pull request as ready for review October 24, 2019 02:10
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bool paradigm makes the parameter list too long and also the name convention of saved models needs discussion

if self.save_all or self.save_last:
overwrite = not self.save_all
filepath = '{}/{}_ckpt_epoch_{}.ckpt'.format(self.filepath, self.prefix, epoch + 1)
self.save_model(filepath, overwrite=overwrite, prefix=self.prefix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the prefix used twice - here and also in the file name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModelCheckpoint.save_model had a bug (#394) where the whole directory is wiped. Passing self.prefix to the method allows me to then identify the files that need deletion by using re (re.match(r'{}_ckpt_epoch_\d+.ckpt'.format(prefix), filename))

@Borda
Copy link
Member

Borda commented Oct 25, 2019

see #128

@ceyzaguirre4
Copy link
Contributor Author

Ok, yes I saw it
Three options:

@Borda
Copy link
Member

Borda commented Oct 27, 2019

@williamFalcon which option do you prefer?
it is not about who is faster, but which philosophy to adopt

  • [yours] separate saving best/all/last with name differentiation
  • [other] saving N last/best models which look more general

@williamFalcon
Copy link
Contributor

williamFalcon commented Nov 3, 2019

@ceyzaguirre4 this is awesome. I do agree that the supported cases should be:

  1. n last
  2. n best

Do we really need all? wouldn't people just set n=huge_number?

@Borda
Copy link
Member

Borda commented Nov 3, 2019

I would prefer N best...

@williamFalcon
Copy link
Contributor

n last may not be the n best

@Borda
Copy link
Member

Borda commented Nov 3, 2019

I know, but what would it be the case you want to keep n last even they are quite bad?
I understand to keel all metrics abut not bad models... :)

@williamFalcon
Copy link
Contributor

this is more of a research question. i can’t predict how people will use this so it needs to be general.

there’s at least one case i know where you need the n last: ie some mode that looks at progress over time or if you want to plot the change in some output over time... then you need those checkpoints.

@Ir1d
Copy link
Contributor

Ir1d commented Nov 4, 2019

@williamFalcon Hi sorry for being absent in the previous days. Should I get #128 ready asap or should I wait for this PR to land?
Should I revert the edits to save_model function in my PR?

@williamFalcon
Copy link
Contributor

@ceyzaguirre4 sorry for the delay. Could you resolve the conflicts so we can merge into this release?

@williamFalcon
Copy link
Contributor

williamFalcon commented Nov 5, 2019

@Ir1d let's get this one merged and you can finish yours based on this.

@Ir1d
Copy link
Contributor

Ir1d commented Nov 5, 2019

@williamFalcon shall we merge 128 first? since it has been ready since yesterday to catch up the next release

@Borda
Copy link
Member

Borda commented Nov 6, 2019

@williamFalcon it is probably only up to you to do the rebase, #320 (comment)
so far I know only PR author (@ceyzaguirre4), repo owner (@williamFalcon) or maintainers have access to the PR to push... :)

@ceyzaguirre4
Copy link
Contributor Author

@ceyzaguirre4 sorry for the delay. Could you resolve the conflicts so we can merge into this release?

Sorry for being unavailable, I'm presenting a paper at CVPR next week and wont have much time until after then.


@williamFalcon shall we merge 128 first? since it has been ready since yesterday to catch up the next release

A: I suggest doing this so you are not held up by my deadlines.


I know, but what would it be the case you want to keep n last even they are quite bad?
I understand to keel all metrics abut not bad models... :)

@Borda I find it crucial to at least offer the option to save the very last model. This is so as to reduce as much as possible the negative effects of (for example) an energy outage. Saving the last model is more of an emergency precaution than a legitimate way of selecting the best model.

@ceyzaguirre4
Copy link
Contributor Author

@williamFalcon in theory the conflicts are solved
However, more tests (those I generally run locally) will have to wait until after the call for papers of CVPR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants