Fix the `gradient_clip_algorithm` has no effect issue. #6928

Conversation
Also add a temporary workaround to Lightning-AI#6807
Hello @ceshine! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-04-10 15:10:55 UTC
Codecov Report

@@            Coverage Diff            @@
##           master   #6928      +/-  ##
=========================================
- Coverage      92%     86%       -5%
=========================================
  Files         194     194
  Lines       12347   12553     +206
=========================================
- Hits        11322   10856     -466
- Misses       1025    1697     +672
The main test that is failing now is … The #6807 issue is really interfering with the testing inside …
tests/trainer/test_trainer.py (Outdated)

assert abs(round(grad_max.item(), 6) - grad_clip_val) < 1e-6, \
    f"Gradient max value {grad_max} != grad_clip_val {grad_clip_val}."
In rare cases, this test will be a problem if the gradient values are all smaller than the gradient clipping value.
Yeah, I'm aware. I'd argue that the possibility of that happening is really low (since the threshold is 1e-5); at least I haven't encountered it in my local testing. If you want to make it even lower, we can set the threshold to 1e-10 or 1e-13 (within the fp16 range).
This is the approach I came up with to distinguish between clipping by norm and clipping by value. I'm open to better ideas, of course.
If you really want to prevent that false-positive case from happening, we can add an if statement before the assertion to make sure the minimum gradient is larger than the threshold (this might create some false negatives, though).
EDIT: this solution would need to get the gradients before clipping, which is not possible in the current test setup.
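To make the distinction being discussed concrete: clipping by value caps each gradient element at the threshold, so the post-clipping maximum lands exactly on the clip value, whereas clipping by norm rescales all gradients by a common factor, so the maximum generally ends up elsewhere. Below is an illustrative sketch with a hypothetical toy model, not the PR's actual test code:

```python
import torch
from torch.nn.utils import clip_grad_norm_, clip_grad_value_

# Hypothetical toy model, only used to produce some non-trivial gradients.
model = torch.nn.Linear(8, 1)

def fresh_grads():
    model.zero_grad()
    model(torch.randn(4, 8)).sum().backward()

clip_val = 1e-5  # small enough that the raw gradients exceed it

# Clipping by value caps every element at +/- clip_val, so the
# post-clipping maximum should land (almost) exactly on clip_val.
fresh_grads()
clip_grad_value_(model.parameters(), clip_val)
grad_max = max(p.grad.abs().max().item() for p in model.parameters())
assert abs(round(grad_max, 6) - clip_val) < 1e-6

# Clipping by norm rescales all gradients by one common factor, so the
# post-clipping maximum generally does NOT coincide with the threshold.
fresh_grads()
clip_grad_norm_(model.parameters(), clip_val)
grad_max = max(p.grad.abs().max().item() for p in model.parameters())
print(grad_max)  # typically != clip_val
```

The assertion in the test relies on the first property holding whenever at least some raw gradients exceed the threshold, which is the rare-failure case discussed above.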
tests/trainer/test_trainer.py (Outdated)

assert abs(round(grad_max.item(), 6) - grad_clip_val) < 1e-6, \
    f"Gradient max value {grad_max} != grad_clip_val {grad_clip_val}."
same
I've changed the clipping threshold in the test cases to 1e-10, and I don't think false positives (the test failing where it shouldn't) would happen in this setup. Even if one does, I'd argue that the problem is the …
LGTM!
Changed the title: Fix the `gradient_clip_algorithm` has no effect issue. (#6920) → Fix the `gradient_clip_algorithm` has no effect issue.
Hi @ceshine, quick question
Is there any reason why we can't just use …
I'm not really familiar with XLA, but I think it is the same reason behind the use of … The original #6123 implementation did not even have a …
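For context on the TPU side (see also the note in the description below), the PR's approach is to reject an unsupported clipping algorithm up front rather than silently doing nothing. Here is a minimal sketch of that kind of guard; the class, enum, and method signature are made up for illustration and are not Lightning's actual accelerator API:

```python
from enum import Enum

class GradClipAlgorithm(Enum):
    NORM = "norm"
    VALUE = "value"

# Hypothetical accelerator stub; Lightning's real TPUAccelerator differs.
class ToyTPUAccelerator:
    def clip_gradients(self, optimizer, clip_val, algorithm=GradClipAlgorithm.NORM):
        if algorithm != GradClipAlgorithm.NORM:
            # Only clipping by norm is implemented on this path, so asking for
            # "value" (or anything else) raises instead of having no effect.
            raise NotImplementedError(
                f"Gradient clipping by {algorithm.value!r} is not supported on TPU."
            )
        # ... clip by norm using the XLA-aware code path ...
```

Raising here matches the note in the PR description that passing `gradient_clip_algorithm="value"` should produce an exception when value clipping is not implemented.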
What does this PR do?

It contains some necessary changes to make `gradient_clip_algorithm` actually work (fixes #6920). Also, I added a temporary workaround to #6807 to make the test cases work. (I can remove it so that this PR does only one thing.) EDIT: removed this workaround for now, since four errors outside of this PR had spun out.

Notes:
- `TPUAccelerator.clip_gradients` does not implement clipping by value. Passing `gradient_clip_algorithm="value"` should raise an exception.
- `test_gradient_clipping_by_value` and `test_gradient_clipping_by_value_fp16`: they now clip the gradients to a maximum of 1e-5 and check that the maximum gradient value in the result is almost equal to 1e-5 (this threshold is small enough that there should always be some gradients before clipping that are larger than it).

Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃