-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generalize closure api in Lightning #8642
Conversation
Hello @awaelchli! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-08-13 12:30:53 UTC |
for more information, see https://pre-commit.ci
Codecov Report
@@ Coverage Diff @@
## master #8642 +/- ##
=======================================
- Coverage 92% 88% -4%
=======================================
Files 175 176 +1
Lines 14696 14741 +45
=======================================
- Hits 13508 12941 -567
- Misses 1188 1800 +612 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment on memory and two really minor ones on naming :)
Co-authored-by: Carlos Mocholí <[email protected]>
bc52030
to
592daa8
Compare
9ce755f
to
3137123
Compare
This reverts commit 296f4f9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great !
# check if loss or model weights are nan | ||
if self.trainer.terminate_on_nan: | ||
check_finite_loss(self.trainer.lightning_module, loss) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@awaelchli since this finite check was added here, can we remove _process_closure_result
? (just below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think the way it is right now, it is redundant and can be removed. However, looking at it right now, the NaN checks are not the way they are supposed to be.
The intention of terminate_on_nan is to check the loss BEFORE backward applies, and to check the weights AFTER backward. this must have been messed up in some of our loop refactors :(
I will address this
What does this PR do?
Proposal for a general API for handling closures in Lightning.
Motivation:
Depending on how involved a loop customization is, sooner or later a user has to deal with closures at some point. This is quite difficult in Lightning currently due to a high entanglement with gradient accumulation, manual backward, precision etc. By extracting out a high level api for building closures, we enable customization while providing a clean implementation for manual optimization etc.
Proposed Solution
A Lightning Closure with the following responsibilities:
A Lightning Closure gets created inside our training loop like so (pseudo code):
and then later on it gets passed to the optimizer after which we can access results:
With the help of the closure abstraction, it is now very easy to implement manual optimization by setting
Result
Final Thoughts
This PR adds an abstraction. Lightning in its philosophy doesn't want to put abstraction in the way of a user, and we don't want users to fight against our abstractions. The introduction of LightningClosure doesn't contradict that. It is mainly motivated by internal use and documentation. It's a self-contained component that I believe offers great insight for any code readers and explorers looking into how Lightning works with optimizers and closures.
Fixes #9129
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
I made sure I had fun coding 🙃