Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feat] Cleanup ModelCheckpoint / EarlyStopping by moving logic to LoggerConnector #5218

Merged
merged 19 commits into from
Jan 7, 2021

Conversation

tchaton
Copy link
Contributor

@tchaton tchaton commented Dec 21, 2020

What does this PR do?

This PR introduces MetricsHolder class which is responsible to hold metrics and their convert functions.

Fixes # (issue)

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified; Bugfixes should be including in bug-fix release milestones (m.f.X) and features should be included in (m.X.b) releases.

Did you have fun?

Make sure you had fun coding 🙃

tchaton and others added 3 commits December 21, 2020 14:12
* add test

* resolve bug

* udpate test

* wrongly copy / paste

* update test

* resolve a second bug

Co-authored-by: Ubuntu <[email protected]>
@pep8speaks
Copy link

pep8speaks commented Dec 21, 2020

Hello @tchaton! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-01-07 14:02:34 UTC

@tchaton tchaton changed the title CleanUp metrics [Feat] Cleanup ModelCheckpoint / EarlyStopping by moving logic to LoggerConnector Dec 21, 2020
@tchaton tchaton marked this pull request as ready for review December 21, 2020 15:00
@tchaton tchaton self-assigned this Dec 21, 2020
@tchaton tchaton added priority: 1 Medium priority task refactor labels Dec 21, 2020
@codecov
Copy link

codecov bot commented Dec 21, 2020

Codecov Report

Merging #5218 (b922c73) into release/1.2-dev (7464aca) will increase coverage by 4%.
The diff coverage is 92%.

@@               Coverage Diff                @@
##           release/1.2-dev   #5218    +/-   ##
================================================
+ Coverage               89%     93%    +4%     
================================================
  Files                  146     147     +1     
  Lines                10361   10423    +62     
================================================
+ Hits                  9220    9656   +436     
+ Misses                1141     767   -374     

@awaelchli
Copy link
Contributor

this is based on the 1.2 branch but unfortunately the Metrics code that needs cleanup is on master, and so this model checkpoint code will now diverge

@tchaton
Copy link
Contributor Author

tchaton commented Dec 22, 2020

1.2 branch but unfortunately the Metrics code that needs cleanup is on master, and so this model checkpoint code will now diverge

Yes, I started a previous PR from master. But after reflection, it is better we don't introduce this kind of refactoring during Christmas vacation. I will move this back to master in JAN, when the entire team is back.

Best,
T.C

Copy link
Member

@SkafteNicki SkafteNicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looking good.
@tchaton just to clarify, this is only refactor and no new features right?

@tchaton
Copy link
Contributor Author

tchaton commented Dec 28, 2020

just to clarify, this is only refactor and no new features right?

Hey @SkafteNicki, it is mainly a refactor + a small feature (automatic casting when user access a metric dictionary).

Reason: Users had to add conversion check within callbacks which they shouldn't have to.

This PR adds automatic conversion only on accessing properties from the trainer. For performance reasons, I moved all the metrics to private, so they don't trigger the conversion.

Best,
T.C

@tchaton
Copy link
Contributor Author

tchaton commented Dec 28, 2020

Hey @SkafteNicki, It enables having all the metrics displayed in the same format. Before we had a mix between tensors and numbers.

Screenshot 2020-12-28 at 11 50 06

Ubuntu and others added 2 commits December 28, 2020 16:33
@tchaton tchaton requested a review from awaelchli December 28, 2020 16:40
Copy link
Member

@justusschock justusschock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just one minor question regarding device.

@@ -605,7 +602,7 @@ def _update_best_and_save(
del_list.append(delpath)

# do not save nan, replace with +/- inf
if torch.isnan(current):
if isinstance(current, torch.Tensor) and torch.isnan(current):
current = torch.tensor(float('inf' if self.mode == "min" else '-inf'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this always be on cpu? or should it be on current.device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question ! I am not sure. What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should live on current.device, since all the other tensors (especially current if not nan) also live on this device.

@tchaton tchaton enabled auto-merge (squash) January 4, 2021 12:25
@tchaton tchaton added the ready PRs ready to be merged label Jan 5, 2021
@tchaton tchaton disabled auto-merge January 5, 2021 13:10
@tchaton tchaton enabled auto-merge (squash) January 5, 2021 13:10
@tchaton tchaton merged commit 5f94900 into release/1.2-dev Jan 7, 2021
@tchaton tchaton deleted the cleanup/metrics_2 branch January 7, 2021 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: 1 Medium priority task ready PRs ready to be merged refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants