Mistake in parameters' grad norm tracking #2012
Conversation
If necessary, there is a patch on top of this fix that would enable support of infinity-norm computation:
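The patch itself is not preserved above; as a rough illustration only, here is a minimal sketch of how per-parameter and total gradient-norm tracking with infinity-norm support could look. The function signature and the dictionary keys are my assumptions, not the actual patch.

```python
import torch

def grad_norms(model, norm_type):
    """Sketch: per-parameter and total gradient norms, including the infinity norm."""
    norm_type = float(norm_type)  # accepts 1, 2, ... or float('inf')
    norms, all_norms = {}, []
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        # the tensor norm already flattens and handles p=inf as max(abs(x))
        param_norm = p.grad.data.norm(norm_type)
        norms[f'grad_{norm_type}_norm_{name}'] = param_norm.item()
        all_norms.append(param_norm)

    # the total p-norm over all gradients equals the p-norm of the vector of
    # per-parameter norms; for p=inf this reduces to their maximum
    total = torch.stack(all_norms).norm(norm_type) if all_norms else torch.tensor(0.0)
    norms[f'grad_{norm_type}_norm_total'] = total.item()
    return norms
```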
@ivannz excellent! mind adding a test to make sure we're matching what numpy expects?
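As a point of reference, here is a minimal sanity check of the kind being requested, assuming numpy.linalg.norm as the ground truth; this is only a sketch, not the test that was eventually added to the PR.

```python
import numpy as np
import torch

# compare torch's flattened tensor norm against numpy for several orders, including inf
x = torch.randn(10, 5)
for p in (1.0, 2.0, 3.0, float('inf')):
    expected = np.linalg.norm(x.numpy().ravel(), ord=p)
    actual = x.norm(p).item()
    assert np.isclose(actual, expected, rtol=1e-5), (p, actual, expected)
```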
(branch updated from c0332c6 to 944bbd6)
I can add a unit test, but I have a question. Since the latest code relies completely on …
Codecov Report
@@          Coverage Diff           @@
##          master    #2012   +/-   ##
======================================
  Coverage      88%      88%
======================================
  Files          74       74
  Lines        4666     4664    -2
======================================
- Hits         4084     4083    -1
+ Misses        582      581    -1
yes, but we need to make sure the overall results of this gradient clipping and the norm calculation are correct... we've addressed this issue multiple times already, and at this point we just need more rigorous testing on this so that we don't have to revisit it...
I have added a test, @williamFalcon. Due to the explicit rounding to 3 places in …
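For illustration only, a hedged sketch of how such a test might compare the tracked per-parameter norms against a numpy reference while allowing for the 3-decimal rounding mentioned above; the toy model, seed, and tolerance are my assumptions, not the test actually added in this PR.

```python
import numpy as np
import pytest
import torch

def test_grad_norm_matches_numpy(norm_type=2.0):
    # tiny model with a deterministic gradient
    torch.manual_seed(0)
    model = torch.nn.Linear(4, 2)
    model(torch.randn(8, 4)).sum().backward()

    # reference: per-parameter p-norms computed with numpy
    expected = {
        name: np.linalg.norm(p.grad.numpy().ravel(), ord=norm_type)
        for name, p in model.named_parameters()
    }

    # values as the tracker would report them, rounded to 3 decimal places
    reported = {
        name: round(p.grad.norm(norm_type).item(), 3)
        for name, p in model.named_parameters()
    }

    # rounding to 3 places limits precision, so compare with an absolute tolerance of 1e-3
    for name in expected:
        assert reported[name] == pytest.approx(expected[name], abs=1e-3)
```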
I like this, good work :)
The norm computation is correct, thanks for the fix.
I think the test can be simplified, see my comments.
To reflect similarity in spirit with clip_grad_norm, I added support for an explicit 'inf' string in the trainer's …
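For context, a hedged usage sketch, assuming the truncated reference above is to the Trainer's track_grad_norm argument (that argument name is my assumption and is not visible in the text above).

```python
from pytorch_lightning import Trainer

# track the infinity norm of the gradients during training;
# the string 'inf' is assumed to be accepted alongside numeric norm orders
trainer = Trainer(track_grad_norm='inf')

# numeric orders keep working as before, e.g. the usual 2-norm
trainer = Trainer(track_grad_norm=2)
```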
@awaelchli and @Borda, thank you for the code review! @Borda, thank you for your edits! As for your docstring style fix, I apologise for not having read …
that is fine, we thank you for your contribution :]
* fix grad norm formula
* grad-norm tracker test
* fixed seed and explicit rtol in grad norm tracking test
* a docstring for grad-norms and forced cast to float of norm_type
* support for inf-norm
* renamed the grad norm test
* docs
* fixed language in docstring
* Apply suggestions from code review

Co-authored-by: Jirka <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
There is a mistake in grad_norms in core.grads.GradInformation. The mistake affects the reported gradient norm for every individual parameter, but not the total norm. This PR fixes the erroneous computation.

According to the torch docs, torch.norm(tensor, p) computes the vector norm: either the 2-norm, or sum(abs(x)**p)**(1./p) if p != 2, flattening the tensor if required. Now p.grad.data.norm in grads.py#L17 is the same as torch.norm(p.grad.data, ...), and thus already computes the norm. However, on grads.py#L19 the grad tracker takes the p-th root again, essentially making params_norm equal to sum(abs(x)**p)**(1./(p**2)) = norm**(1./p), which is not correct.

The following snippet, which borrows code from grads.py#L17-L19, produces output that is clearly incorrect.
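The original snippet and its printed output are not preserved above; as a stand-in, here is a minimal sketch of the kind of reproduction being described, using a toy gradient tensor of my own choosing, contrasting the double root-taking with the norm that torch.norm reports directly.

```python
import torch

norm_type = 3.0
grad = torch.arange(1.0, 5.0)            # stand-in for p.grad.data

# what grads.py#L17-L19 effectively did: take the p-norm, then the p-th root again
param_norm = grad.norm(norm_type)        # sum(abs(x)**p)**(1/p)  -- already the norm
buggy = param_norm ** (1.0 / norm_type)  # == sum(abs(x)**p)**(1/p**2), i.e. norm**(1/p)

correct = torch.norm(grad, norm_type)

print(buggy.item(), correct.item())      # the two values clearly disagree
```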