Add log_grad_norm hook to LightningModule #7873

Conversation
Codecov Report

@@            Coverage Diff            @@
##           master   #7873     +/-   ##
=========================================
- Coverage      93%     92%       -0%
=========================================
  Files         202     202
  Lines       13123   13114        -9
=========================================
- Hits        12156   12111       -45
- Misses        967    1003       +36
Great, already integrated in the new loop!
607d61c to db7864a
@@ -440,6 +440,20 @@ def __sync(
    def __check_none(name: str, value: Any, _) -> Any:
        raise ValueError(f'`self.log({name}, {value})` was called, but `None` values cannot be logged')

    def log_grad_norm(self, grad_norm_dict: Dict[str, torch.Tensor]) -> None:
Is adding a new hook the only option here? This feels very strange: it only controls the logging behavior, not the calculation. What's the relationship between this and `on_after_backward`?
Hey @ananthsub. Users wanted to customise how they aggregate the grad_norm values and log them.
Here, they are free to:
- Change reduce_fx
- Compute extra values such as the standard deviation or the total mean norm.
- Perform custom aggregation.

`on_after_backward` is independent of this function.
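To illustrate, here is a minimal sketch of such an override, assuming the hook lands as shown in the diff above. The class name, the `torch.max` reduction, and the extra `grad_norm/mean` and `grad_norm/std` keys are illustrative, not part of this PR:

```python
import torch
from pytorch_lightning import LightningModule


class GradNormLoggingModel(LightningModule):
    def log_grad_norm(self, grad_norm_dict):
        # Custom reduce_fx: aggregate each norm with max over the epoch
        # instead of the default mean, and keep it off the progress bar.
        self.log_dict(
            grad_norm_dict,
            on_step=False,
            on_epoch=True,
            prog_bar=False,
            logger=True,
            reduce_fx=torch.max,
        )
        # Extra values computed from the per-parameter norms.
        norms = torch.stack([torch.as_tensor(v).float() for v in grad_norm_dict.values()])
        self.log("grad_norm/mean", norms.mean())
        self.log("grad_norm/std", norms.std())
```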
            def log_grad_norm(self, grad_norm_dict):
                self.log_dict(grad_norm_dict, on_step=False, on_epoch=True, prog_bar=False, logger=True)
        """
        self.log_dict(grad_norm_dict, on_step=True, on_epoch=True, prog_bar=True, logger=True)
While not BC, is `prog_bar=False` a better default?
@carmocca Any thoughts on this one? I see value in both.
Pros for true:
- Directly visible
- Quick experimentation

Cons for true:
- Usually too cluttered

I think I prefer to have it as `True`, which seems to me like the most convenient option when manually debugging your model.
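For users who do find the progress bar too cluttered, opting out is a small override in their own module, mirroring the docstring example in the diff above (the class name here is hypothetical):

```python
from pytorch_lightning import LightningModule


class QuietGradNormModel(LightningModule):
    def log_grad_norm(self, grad_norm_dict):
        # Aggregate per epoch and send to the logger only,
        # keeping the progress bar uncluttered.
        self.log_dict(grad_norm_dict, on_step=False, on_epoch=True, prog_bar=False, logger=True)
```

Note that grad norms are only computed, and hence this hook only called, when norm tracking is enabled on the Trainer, e.g. via `Trainer(track_grad_norm=2)`.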
LGTM!
What does this PR do?
Part of #7631