(9/n) Support 2D Parallelism - Remaining Checkpoint Logic #19888
Conversation
Commits 33f47ef to f558f07
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff            @@
##           master   #19888     +/-  ##
=========================================
- Coverage      84%      59%     -25%
=========================================
  Files         426      421       -5
  Lines       35233    35149      -84
=========================================
- Hits        29506    20745    -8761
- Misses       5727    14404    +8677
LGTM!
Co-authored-by: Luca Antiga <[email protected]>
for more information, see https://pre-commit.ci
What does this PR do?
Implements the remaining distributed checkpoint saving and loading logic in the
ModelParallelStrategy
for Trainer. The tests were adapted from the existing FSDP strategy tests.
📚 Documentation preview 📚: https://pytorch-lightning--19888.org.readthedocs.build/en/19888/
cc @Borda @awaelchli @carmocca @justusschock