[Feat] Cleanup ModelCheckpoint / EarlyStopping by moving logic to LoggerConnector #5218
Conversation
* add test
* resolve bug
* update test
* wrongly copy / paste
* update test
* resolve a second bug

Co-authored-by: Ubuntu <[email protected]>
Hello @tchaton! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-01-07 14:02:34 UTC
Codecov Report

@@           Coverage Diff            @@
##    release/1.2-dev   #5218   +/-  ##
================================================
+ Coverage        89%     93%      +4%
================================================
  Files           146     147       +1
  Lines         10361   10423      +62
================================================
+ Hits           9220    9656     +436
+ Misses         1141     767     -374
This is based on the 1.2 branch, but unfortunately the Metrics code that needs cleanup is on master, so this model checkpoint code will now diverge.
Yes, I started a previous PR from master, but on reflection it is better not to introduce this kind of refactoring during the Christmas vacation. I will move this back to master in January, when the entire team is back. Best,
PR looking good.
@tchaton just to clarify, this is only a refactor and no new features, right?
Hey @SkafteNicki, it is mainly a refactor plus a small feature: automatic casting when the user accesses a metric dictionary. Reason: users had to add conversion checks within callbacks, which they shouldn't have to. This PR adds automatic conversion only when accessing properties from the trainer. For performance reasons, I moved all the metrics to private attributes, so they don't trigger the conversion. Best,
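To illustrate what "automatic conversion on access" means here, a minimal standalone sketch; the function name and values are illustrative, not this PR's actual code, and the sketch assumes the uniform target format is tensors:

```python
import numbers
import torch

def convert_metric(value):
    # cast plain Python numbers to tensors so every metric arrives in
    # one uniform format; tensors pass through unchanged
    if isinstance(value, numbers.Number) and not isinstance(value, bool):
        return torch.tensor(value)
    return value

# raw logged values can be a mix of numbers and tensors
raw = {"val_loss": 0.25, "val_acc": torch.tensor(0.9)}

# conversion happens only at the public access point; code reading the
# private store directly skips this overhead
converted = {k: convert_metric(v) for k, v in raw.items()}
print(converted)  # {'val_loss': tensor(0.2500), 'val_acc': tensor(0.9000)}
```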
Hey @SkafteNicki, it enables having all the metrics displayed in the same format. Before, we had a mix of tensors and numbers.
LGTM, just one minor question regarding device.
```diff
@@ -605,7 +602,7 @@ def _update_best_and_save(
             del_list.append(delpath)

         # do not save nan, replace with +/- inf
-        if torch.isnan(current):
+        if isinstance(current, torch.Tensor) and torch.isnan(current):
             current = torch.tensor(float('inf' if self.mode == "min" else '-inf'))
```
Should this always be on CPU, or should it be on current.device?
Great question! I am not sure. What do you think?
IMO it should live on current.device, since all the other tensors (especially current when it is not NaN) also live on that device.
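For reference, a minimal sketch of the device-preserving variant suggested here; the helper name and standalone form are illustrative, since the real change lives inside ModelCheckpoint._update_best_and_save:

```python
import torch

def replace_nan_with_inf(current, mode: str):
    # `current` and `mode` stand in for the attributes used inside
    # ModelCheckpoint._update_best_and_save; this helper is not part of the PR
    if isinstance(current, torch.Tensor) and torch.isnan(current):
        # build the +/- inf sentinel on the same device as `current`,
        # per the review suggestion, instead of defaulting to CPU
        current = torch.tensor(
            float('inf' if mode == "min" else '-inf'),
            device=current.device,
        )
    return current
```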
…holder.py

Co-authored-by: Roger Shieh <[email protected]>
What does this PR do?
This PR introduces a MetricsHolder class, which is responsible for holding metrics and their conversion functions.
Fixes # (issue)
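A minimal sketch of the shape such a class could take, assuming the conversion-on-access behavior described in the conversation above; this is illustrative only, and the real class's API and responsibilities may differ:

```python
import numbers
import torch

class MetricsHolder:
    """Holds raw logged metrics and converts them only on public access.

    Illustrative sketch: the class in this PR carries more
    responsibilities than shown here.
    """

    def __init__(self):
        self._metrics = {}  # private store: a mix of tensors and numbers

    def update(self, metrics: dict) -> None:
        self._metrics.update(metrics)

    def get_metrics(self) -> dict:
        # automatic casting happens here, so callbacks reading metrics
        # through the trainer no longer need their own conversion checks
        return {k: self._convert(v) for k, v in self._metrics.items()}

    @staticmethod
    def _convert(value):
        # assume the uniform target format is tensors
        if isinstance(value, numbers.Number) and not isinstance(value, bool):
            return torch.tensor(value)
        return value
```

Internal code reads the private `_metrics` store directly, which is why the conversion cost is only paid when a user-facing property is accessed.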
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the review guidelines.
Did you have fun?
Make sure you had fun coding 🙃