
Checkpoint stateful handlers and metrics #966

Open
amatsukawa opened this issue Apr 22, 2020 · 3 comments

Comments

@amatsukawa
Contributor

🚀 Feature

Things that are attached to the Engine might have state that would ideally be checkpointed and restored as part of the Engine's state_dict.

An example is a Checkpoint handler with a score_function. Currently, the priorities that the Checkpoint class stores are not saved anywhere, so it cannot recover gracefully from a failure without manual intervention: parsing the checkpoint path names and directly setting the internals of the class.
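To make the failure mode concrete, here is a minimal sketch of a score-based checkpoint handler whose priorities survive a restart via state_dict/load_state_dict. `ScoredCheckpointer` is a hypothetical stand-in written for illustration, not Ignite's `Checkpoint` class; it only tracks (score, filename) pairs and reports which files should be deleted.

```python
import heapq
from typing import Any, Dict, List, Tuple


class ScoredCheckpointer:
    """Hypothetical stand-in for a score-based checkpoint handler.

    Keeps the ``n_saved`` best (score, filename) pairs in a min-heap, so the
    lowest-scoring checkpoint is the first one evicted.
    """

    def __init__(self, n_saved: int = 2) -> None:
        self.n_saved = n_saved
        self._saved: List[Tuple[float, str]] = []

    def save(self, score: float, filename: str) -> List[str]:
        """Record a new checkpoint; return the filenames that should be deleted."""
        removed: List[str] = []
        if len(self._saved) < self.n_saved:
            heapq.heappush(self._saved, (score, filename))
        elif score > self._saved[0][0]:
            # Replace the current worst checkpoint with the new, better one.
            _, worst = heapq.heapreplace(self._saved, (score, filename))
            removed.append(worst)
        else:
            # The new checkpoint is not good enough to keep.
            removed.append(filename)
        return removed

    def state_dict(self) -> Dict[str, Any]:
        # Persist the priorities so a restarted run knows what is already on disk.
        return {"saved": list(self._saved)}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self._saved = [tuple(item) for item in state["saved"]]
        heapq.heapify(self._saved)


ckpt = ScoredCheckpointer(n_saved=2)
ckpt.save(0.70, "model_0.70.pt")
ckpt.save(0.80, "model_0.80.pt")
state = ckpt.state_dict()

# Simulate a crash and restart: a fresh handler restored from the saved state.
resumed = ScoredCheckpointer(n_saved=2)
resumed.load_state_dict(state)
removed_after_resume = resumed.save(0.75, "model_0.75.pt")
print(removed_after_resume)  # ['model_0.70.pt']
```

Without the `load_state_dict` call, the resumed handler would start with an empty heap and keep `model_0.75.pt` while never deleting the older files, which is the manual-cleanup situation described above.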

Handlers and Metrics should have state_dict and load_state_dict methods (empty by default), and I think it should be possible for their state to be automatically included in, and restored from, the Engine's state_dict when they are attached to an Engine.
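One way the proposal could look is sketched below, assuming a toy engine (`MiniEngine`) and a mixin (`Stateful`) that are both hypothetical and not part of Ignite's API. The mixin provides empty state_dict/load_state_dict defaults, and the engine nests the state of every attached object under a name chosen at attach time.

```python
from typing import Any, Dict


class Stateful:
    """Hypothetical mixin: state_dict/load_state_dict are empty by default."""

    def state_dict(self) -> Dict[str, Any]:
        return {}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        pass


class Counter(Stateful):
    """Example handler that counts how many times it has been called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> None:
        self.calls += 1

    def state_dict(self) -> Dict[str, Any]:
        return {"calls": self.calls}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.calls = state["calls"]


class MiniEngine:
    """Toy engine: the state of attached objects rides along in its state_dict."""

    def __init__(self) -> None:
        self.iteration = 0
        self._attached: Dict[str, Stateful] = {}

    def attach(self, name: str, obj: Stateful) -> None:
        self._attached[name] = obj

    def state_dict(self) -> Dict[str, Any]:
        return {
            "iteration": self.iteration,
            "attached": {n: o.state_dict() for n, o in self._attached.items()},
        }

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.iteration = state["iteration"]
        for name, sub_state in state["attached"].items():
            self._attached[name].load_state_dict(sub_state)


engine = MiniEngine()
counter = Counter()
engine.attach("counter", counter)
for _ in range(3):
    engine.iteration += 1
    counter()
snapshot = engine.state_dict()  # {'iteration': 3, 'attached': {'counter': {'calls': 3}}}

# Restore into a fresh engine: objects must be re-attached under the same names first.
restored_engine = MiniEngine()
restored_counter = Counter()
restored_engine.attach("counter", restored_counter)
restored_engine.load_state_dict(snapshot)
```

Keying attached objects by an explicit name is one design choice among several; it keeps restoration unambiguous when two instances of the same handler class are attached.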

@vfdev-5
Collaborator

vfdev-5 commented Apr 22, 2020

@amatsukawa thanks for FR! Yes, it definitely makes sense for handlers and metrics with internal state 👍

@amatsukawa
Contributor Author

FWIW, I'm completely happy with #1156 and the way things work now.

Given complications I didn't think about, e.g., some handlers needing to go on the validation engine and handlers needing to run in a specific order (Checkpoint needs to run last, after all other stateful handlers), automatically adding things to the engine's checkpoints is perhaps not trivial and might make things harder to reason about.
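The ordering problem can be illustrated with a toy sketch, assuming handlers fire in the order they are registered. `EventBus`, `BestScoreTracker`, and `make_saver` are hypothetical names; the saver snapshots an explicit dict of stateful objects, and the only difference between the two runs is whether it is registered before or after the stateful handler.

```python
from typing import Any, Callable, Dict, List


class EventBus:
    """Toy event dispatcher: handlers run in the order they were added."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[], None]] = []

    def add_handler(self, fn: Callable[[], None]) -> None:
        self._handlers.append(fn)

    def fire(self) -> None:
        for fn in self._handlers:
            fn()


class BestScoreTracker:
    """Stateful handler: remembers the best score seen so far."""

    def __init__(self, scores: List[float]) -> None:
        self._scores = iter(scores)
        self.best = float("-inf")

    def __call__(self) -> None:
        self.best = max(self.best, next(self._scores))

    def state_dict(self) -> Dict[str, Any]:
        return {"best": self.best}


def make_saver(objects: Dict[str, Any], store: List[Dict[str, Any]]) -> Callable[[], None]:
    """Checkpoint-like handler that snapshots an explicit dict of stateful objects."""

    def save() -> None:
        store.append({name: obj.state_dict() for name, obj in objects.items()})

    return save


def run(saver_last: bool) -> List[Dict[str, Any]]:
    tracker = BestScoreTracker([0.5, 0.9])
    snapshots: List[Dict[str, Any]] = []
    saver = make_saver({"tracker": tracker}, snapshots)
    bus = EventBus()
    handlers = [tracker, saver] if saver_last else [saver, tracker]
    for handler in handlers:
        bus.add_handler(handler)
    for _ in range(2):
        bus.fire()
    return snapshots


early_snapshots = run(saver_last=False)  # each snapshot lags one event behind
late_snapshots = run(saver_last=True)  # each snapshot reflects the latest state
```

When the saver runs first, every checkpoint holds the tracker's state from the previous event, so resuming from it would silently replay one update. Passing the stateful objects explicitly, rather than collecting them automatically, keeps this ordering visible to the user.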

@vfdev-5 added the PyDataGlobal (PyData Global 2020 Sprint) label and removed the Hacktoberfest and PyDataGlobal labels on Oct 31, 2020
@vfdev-5 added the module: metrics (Metrics module) label on Jan 18, 2021
@H4dr1en
Contributor

H4dr1en commented Jun 23, 2021

Bringing here a specific use case that this FR could solve:

Being able to restore the state of the RunningAverage metric would keep it from "forgetting" the scores of previous iterations when resuming an experiment. Otherwise, the metric can show "peaks" when an experiment is resumed, as shown in the figure below:
[Figure: RunningAverage metric curve showing peaks where the experiment was resumed]
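A minimal sketch of why restoring the state removes those peaks, assuming an exponential-moving-average update of the form `avg = alpha * avg + (1 - alpha) * x` with the first value taken as-is. `ResumableRunningAverage` is a hypothetical stand-in, not Ignite's `RunningAverage`, and the numbers are illustrative only.

```python
from typing import Any, Dict, Optional


class ResumableRunningAverage:
    """Hypothetical exponential moving average with state_dict support."""

    def __init__(self, alpha: float = 0.98) -> None:
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must be in (0, 1)")
        self.alpha = alpha
        self._value: Optional[float] = None

    def update(self, x: float) -> float:
        if self._value is None:
            # No history yet: the first observation is used unsmoothed.
            self._value = x
        else:
            self._value = self.alpha * self._value + (1.0 - self.alpha) * x
        return self._value

    def state_dict(self) -> Dict[str, Any]:
        return {"alpha": self.alpha, "value": self._value}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.alpha = state["alpha"]
        self._value = state["value"]


avg = ResumableRunningAverage(alpha=0.5)
history = [avg.update(loss) for loss in [4.0, 2.0, 2.0]]  # [4.0, 3.0, 2.5]
saved = avg.state_dict()

# Resume with the restored state: a noisy batch is smoothed against the history.
restored = ResumableRunningAverage()
restored.load_state_dict(saved)
restored_value = restored.update(6.0)  # 0.5 * 2.5 + 0.5 * 6.0 = 4.25

# Resume without it: the first value after the restart is the raw batch value.
fresh = ResumableRunningAverage(alpha=0.5)
fresh_value = fresh.update(6.0)  # 6.0, i.e. a "peak"
```

The restored instance continues the curve from 2.5, while the fresh one jumps straight to the raw batch value, which matches the kind of spike seen at each resume point.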
