
[Metrics] Allow for running accumulation computations #5193

Conversation

SkafteNicki (Member) commented Dec 19, 2020

What does this PR do?

Fixes #5166
Fixes #4641

Multiple users have requested that it should be possible to call a metric's compute() method without reset() being called afterwards. The argument is that this allows accumulated metrics to be computed in a running fashion. This PR adds a new flag to all metrics, auto_reset_on_compute, which defaults to True (the current behavior); when set to False it disables the automatic reset() call after compute(). It is then up to the user to call reset() manually when they want the metric states reset.

When this new flag is False and the metric is used in DDP mode, we still reset the metric states on all processes with rank != 0. This is to make sure that we do not over-accumulate the states.
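
A sketch of the intended usage, assuming the proposed auto_reset_on_compute flag and the Accuracy metric from pytorch_lightning.metrics (the flag is only proposed in this PR, so the snippet is illustrative rather than working code against any released version):

import torch
from pytorch_lightning.metrics import Accuracy

# auto_reset_on_compute is the flag proposed here; the default True keeps the
# current reset-after-compute behavior
acc = Accuracy(auto_reset_on_compute=False)

for step in range(3):
    preds = torch.randint(0, 2, (16,))
    target = torch.randint(0, 2, (16,))
    acc(preds, target)                    # update the accumulated state
    print(step, acc.compute().item())     # running accuracy over all batches so far

acc.reset()  # with the flag disabled, resetting is now the user's responsibility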

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified; bugfixes should be included in bug-fix release milestones (m.f.X) and features should be included in (m.X.b) releases.

Did you have fun?

Make sure you had fun coding 🙃

Commits:
  • update
  • fix
  • fix
  • fix
  • fix
  • fix
  • add flag to metrics
pep8speaks commented Dec 19, 2020

Hello @SkafteNicki! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-01-06 20:31:19 UTC

codecov bot commented Dec 19, 2020

Codecov Report

Merging #5193 (d02499a) into release/1.2-dev (9dbdffc) will decrease coverage by 9%.
The diff coverage is 100%.

@@                Coverage Diff                @@
##           release/1.2-dev   #5193     +/-   ##
=================================================
- Coverage               93%     84%     -9%     
=================================================
  Files                  146     146             
  Lines                10211   11865   +1654     
=================================================
+ Hits                  9486    9980    +494     
- Misses                 725    1885   +1160     

Borda (Member) left a comment

LGTM, just a comment unrelated to this PR - I would prefer to have the Args section in __init__ to be closer to where it is actually used...

Borda added the "ready" label (PRs ready to be merged) on Jan 6, 2021
if self.auto_reset_on_compute or not self._internal_reset:
reset_fn = self._reset_to_default
else:
reset_fn = all_except_rank_zero(self._reset_to_default)
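
For context, all_except_rank_zero is not shown in this snippet; presumably it wraps the reset so that it only runs on non-zero ranks, roughly along the lines of this sketch (an assumption, not the actual helper):

import functools
import torch.distributed as dist

def all_except_rank_zero(fn):
    # Run fn on every process except rank 0, so that rank 0 keeps its
    # accumulated state after a DDP sync while the other ranks drop theirs.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if dist.is_available() and dist.is_initialized() and dist.get_rank() == 0:
            return None
        return fn(*args, **kwargs)
    return wrapper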
A contributor commented on the diff:

Does this make it not reset on rank 0? Why would we not want to reset on rank 0?

teddykoker (Contributor) commented Jan 6, 2021

Another possible solution here is to simply never reset() on compute(). Initially when I wrote the code I thought this would make sense as most of the time you would want to call both, but perhaps the verbosity is better.

PyTorch has a similar pattern with optimizer.step(), optimizer.zero_grad(). Most of the time you want to just call both, but it makes sense to have the user call both manually.

When logging a metric we could still call both in the logger.

@SkafteNicki, @Borda, @justusschock what do you think? The API is still relatively new, and I think this would make more sense for users who want to use metrics with plain PyTorch.
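
Under that change (compute() no longer resetting), the usage would mirror the optimizer pattern; a sketch with Accuracy as an example metric and toy data:

import torch
from pytorch_lightning.metrics import Accuracy

metric = Accuracy()
for epoch in range(2):
    for _ in range(5):
        preds = torch.randint(0, 2, (16,))
        target = torch.randint(0, 2, (16,))
        metric.update(preds, target)       # accumulate state over the epoch
    print(epoch, metric.compute().item())  # aggregate, analogous to optimizer.step()
    metric.reset()                         # explicit reset, analogous to optimizer.zero_grad()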

teddykoker (Contributor) commented:

We might want to do something like this for _wrap_compute:

# method of the Metric base class; functools, torch and gather_all_tensors
# are already in scope at module level
def _wrap_compute(self, compute):
    @functools.wraps(compute)
    def wrapped_func(*args, **kwargs):
        # return cached value
        if self._computed is not None:
            return self._computed

        dist_sync_fn = self.dist_sync_fn
        if (dist_sync_fn is None
                and torch.distributed.is_available()
                and torch.distributed.is_initialized()):
            # user provided a bool, so we assume DDP if available
            dist_sync_fn = gather_all_tensors

        synced = False
        if self._to_sync and dist_sync_fn is not None:
            # ADDED: cache the local state prior to syncing
            self._cache = {attr: getattr(self, attr) for attr in self._defaults.keys()}

            # sync (this was already there)
            self._sync_dist(dist_sync_fn)
            synced = True

        self._computed = compute(*args, **kwargs)
        if synced:
            # ADDED: if we synced, restore the local state from the cache
            for attr, val in self._cache.items():
                setattr(self, attr, val)

        # NO LONGER RESET
        # self.reset()
        return self._computed

    return wrapped_func

Note how we cache the state before syncing and restore it after, so that the accumulation does not get messed up.
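
A toy illustration of that cache-and-restore idea, using a plain dict in place of the metric's state attributes (not the real Metric internals):

import torch

state = {"correct": torch.tensor(6.0), "total": torch.tensor(8.0)}
world_size = 4

cache = dict(state)                                     # cache the local, per-process state
state = {k: v * world_size for k, v in state.items()}   # stand-in for the all-gather/reduce sync
result = state["correct"] / state["total"]              # compute() runs on the synced state
state = cache                                           # restore local state so accumulation continues

print(result)  # value computed over the synced (global) state
print(state)   # local state is untouched and keeps accumulating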

SkafteNicki (Member, PR author) commented:

@teddykoker personally I would prefer that solution; I only proposed it this way so that it did not break the current setup.
Let me close this PR, and we can implement your suggestion in another PR, together with a note in the docs warning users that prior to v1.2 compute() also calls reset().

SkafteNicki closed this on Jan 6, 2021
SkafteNicki deleted the metrics/reset_option branch on March 2, 2021
Labels: feature (Is an improvement or enhancement)