save_best,-save_all,-save_last ++ bug fix #416
The bool paradigm makes the parameter list too long, and the naming convention for saved models needs discussion.
if self.save_all or self.save_last:
    overwrite = not self.save_all
    filepath = '{}/{}_ckpt_epoch_{}.ckpt'.format(self.filepath, self.prefix, epoch + 1)
    self.save_model(filepath, overwrite=overwrite, prefix=self.prefix)
why is the prefix used twice - here and also in the file name?
ModelCheckpoint.save_model had a bug (#394) where the whole directory is wiped. Passing self.prefix to the method allows me to then identify the files that need deletion by using re (re.match(r'{}_ckpt_epoch_\d+.ckpt'.format(prefix), filename)). See #128.
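For readers following along, the prefix-based cleanup described above could look roughly like the sketch below. It is not the code from this PR: the helper name `remove_old_checkpoints` and its signature are assumptions made for illustration. Unlike the pattern quoted in the comment, this version escapes the dot before `ckpt`, escapes the prefix, and matches the whole filename, so that only exact checkpoint names are deleted.

```python
import os
import re


def remove_old_checkpoints(dirpath, prefix):
    """Delete only files named '<prefix>_ckpt_epoch_<N>.ckpt' in dirpath.

    Hypothetical helper for illustration; not the PR's actual code.
    Returns the list of removed filenames.
    """
    # re.escape guards against regex metacharacters in the prefix, and the
    # dot before 'ckpt' is escaped so that it matches a literal '.'.
    pattern = re.compile(r'{}_ckpt_epoch_\d+\.ckpt'.format(re.escape(prefix)))
    removed = []
    for filename in sorted(os.listdir(dirpath)):
        # fullmatch leaves files with extra suffixes (e.g. '.bak') alone.
        if pattern.fullmatch(filename):
            os.remove(os.path.join(dirpath, filename))
            removed.append(filename)
    return removed
```

Files belonging to another prefix, or anything else stored in the same directory, are left untouched, which is the behaviour the fix for #394 is after.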
Ok, yes I saw it
@williamFalcon which option do you prefer?
@ceyzaguirre4 this is awesome. I do agree that the supported cases should be:
Do we really need all? Wouldn't people just set n=huge_number?
I would prefer N best...
n last may not be the n best
I know, but in what case would you want to keep the n last even if they are quite bad?
this is more of a research question. i can’t predict how people will use this so it needs to be general. there’s at least one case i know where you need the n last: i.e. some mode that looks at progress over time, or if you want to plot the change in some output over time... then you need those checkpoints.
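To make the distinction between the two policies concrete, here is a toy sketch. The function names and the validation losses are made up for illustration and do not come from this PR.

```python
def n_last(scores, n):
    """Epochs kept when retaining the n most recent checkpoints."""
    epochs = sorted(scores)
    # max(...) keeps the slice correct for n == 0 and n > len(epochs).
    return epochs[max(len(epochs) - n, 0):]


def n_best(scores, n):
    """Epochs kept when retaining the n lowest-loss checkpoints."""
    ranked = sorted(scores, key=lambda epoch: scores[epoch])
    return sorted(ranked[:n])


# Hypothetical validation loss per epoch (lower is better).
val_loss = {1: 0.90, 2: 0.40, 3: 0.35, 4: 0.50, 5: 0.60}

print(n_last(val_loss, 2))  # [4, 5]
print(n_best(val_loss, 2))  # [2, 3]
```

With these numbers the two policies keep disjoint sets of checkpoints, so neither option can stand in for the other.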
@williamFalcon Hi, sorry for being absent in the previous days. Should I get #128 ready asap or should I wait for this PR to land?
@ceyzaguirre4 sorry for the delay. Could you resolve the conflicts so we can merge into this release?
@Ir1d let's get this one merged and you can finish yours based on this.
@williamFalcon shall we merge #128 first, since it has been ready since yesterday, to catch the next release?
@williamFalcon it is probably only up to you to do the rebase, #320 (comment)
Sorry for being unavailable, I'm presenting a paper at CVPR next week and won't have much time until after then.
A: I suggest doing this so you are not held up by my deadlines.
@Borda I find it crucial to at least offer the option to save the very last model. This is so as to reduce as much as possible the negative effects of (for example) an energy outage. Saving the last model is more of an emergency precaution than a legitimate way of selecting the best model.
@williamFalcon in theory the conflicts are solved
Before submitting
What does this PR do?
Implements #285.