Add support for async checkpointing #13658
After discussion with @carmocca, the design is:

```python
from pytorch_lightning.plugins.io import CheckpointIO


class AsyncCheckpointIO(CheckpointIO):
    def __init__(self, checkpoint_io=None):
        self._checkpoint_io = checkpoint_io

    @property
    def checkpoint_io(self):
        return self._checkpoint_io

    @checkpoint_io.setter
    def checkpoint_io(self, io):
        ...

    def save_checkpoint(self, checkpoint, filepath):
        # async_launch: placeholder for running the wrapped save in the background
        async_launch(self._checkpoint_io.save_checkpoint, checkpoint, filepath)
```

In the Trainer, the user can simply pass `Trainer(plugins=[AsyncCheckpointIO()])`, and we will attach the appropriate checkpoint IO plugin if one is not already attached. This enables async checkpointing for existing plugins: they get async behavior without any code modification.
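The wrapping design above forwards `save_checkpoint` to an existing plugin without blocking the caller. Below is a minimal, self-contained sketch of that idea, assuming a `ThreadPoolExecutor` with a single worker stands in for `async_launch`. `SimpleCheckpointIO` and `AsyncCheckpointIOSketch` are hypothetical names for illustration only; this is not Lightning's API or the PR's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Optional


class SimpleCheckpointIO:
    """Hypothetical stand-in for a synchronous CheckpointIO plugin."""

    def __init__(self) -> None:
        self.saved: Dict[str, Dict[str, Any]] = {}

    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        # A real plugin would serialize to storage here; we just record a copy.
        self.saved[path] = dict(checkpoint)


class AsyncCheckpointIOSketch:
    """Wraps a plugin and runs its save_checkpoint on a background thread."""

    def __init__(self, checkpoint_io: Optional[SimpleCheckpointIO] = None) -> None:
        self._checkpoint_io = checkpoint_io
        # A single worker keeps saves in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._error: Optional[BaseException] = None

    @property
    def checkpoint_io(self) -> Optional[SimpleCheckpointIO]:
        return self._checkpoint_io

    @checkpoint_io.setter
    def checkpoint_io(self, io: SimpleCheckpointIO) -> None:
        # Attach a plugin only if the user did not already provide one.
        if self._checkpoint_io is None:
            self._checkpoint_io = io

    def _save(self, checkpoint: Dict[str, Any], path: str) -> None:
        # Runs on the worker thread; store any failure so it can be re-raised later.
        try:
            assert self._checkpoint_io is not None
            self._checkpoint_io.save_checkpoint(checkpoint, path)
        except BaseException as e:
            self._error = e

    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        # Surface a failure from an earlier background save before queueing more work.
        if self._error is not None:
            raise self._error
        self._executor.submit(self._save, checkpoint, path)

    def teardown(self) -> None:
        # Block until all pending saves finish, then surface any stored error.
        self._executor.shutdown(wait=True)
        if self._error is not None:
            raise self._error


if __name__ == "__main__":
    base = SimpleCheckpointIO()
    io = AsyncCheckpointIOSketch()
    io.checkpoint_io = base  # attached because none was given at construction
    io.save_checkpoint({"epoch": 1}, "epoch=1.ckpt")
    io.save_checkpoint({"epoch": 2}, "epoch=2.ckpt")
    io.teardown()
    print(sorted(base.saved))  # ['epoch=1.ckpt', 'epoch=2.ckpt']
```

Errors raised on the worker thread are stored and re-raised on the next save or at teardown; otherwise they would stay inside the discarded `Future` and go unnoticed. Because training keeps running while the write happens, a real implementation also has to consider whether the checkpoint's tensors can change before the background save completes.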
Codecov Report

| | master | #13658 | +/- |
|---|---:|---:|---:|
| Coverage | 86% | 76% | -10% |
| Files | 330 | 332 | +2 |
| Lines | 25973 | 26048 | +75 |
| Hits | 22266 | 19808 | -2458 |
| Misses | 3707 | 6240 | +2533 |
Great work @rohitgr7!!
Thanks @awaelchli @carmocca @otaj for the great reviews and discussion.
What does this PR do?
Fixes #11561
cc @Borda @awaelchli @ananthsub @ninginthecloud @rohitgr7 @otaj @akihironitta