Sharded state dicts save correctly when save_weights_only=True
#19524
This works now: the new test fails on master and passes with these changes. It could be improved by verifying that the checkpoints are usable downstream in the Lightning ecosystem. If that is of interest, could the Lightning team help with it? I don't have much free time to work on this.
@dimitri-voytan Great fix, thanks a lot!
I integrated your test into an existing one we already had for `weights_only=True`; we just needed to add the parameterization for sharded checkpoints.
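To illustrate the kind of parameterization described above, here is a minimal stdlib sketch. It is not Lightning's actual test: `dump_checkpoint` and `save` are hypothetical stand-ins, and the `sharded` flag plays the role of the full vs. sharded state-dict setting. `unittest`'s `subTest` stands in for pytest parametrization so the sketch has no third-party dependencies.

```python
import unittest
from typing import Any, Dict


def dump_checkpoint(weights_only: bool) -> Dict[str, Any]:
    """Hypothetical stand-in: build a Lightning-style checkpoint dict."""
    checkpoint: Dict[str, Any] = {"state_dict": {"w": 1.0}, "epoch": 0}
    if not weights_only:
        # Optimizer states are only included for full (non weights-only) saves.
        checkpoint["optimizer_states"] = [{"lr": 0.1}]
    return checkpoint


def save(checkpoint: Dict[str, Any], sharded: bool) -> Dict[str, Any]:
    """Hypothetical stand-in for a strategy's save path (full vs. sharded)."""
    if not sharded:
        return dict(checkpoint)
    saved: Dict[str, Any] = {"model": checkpoint["state_dict"]}
    # Default to [] so a weights-only checkpoint does not raise KeyError.
    for idx, state in enumerate(checkpoint.get("optimizer_states", [])):
        saved[f"optimizer_{idx}"] = state
    return saved


class WeightsOnlyCheckpointTest(unittest.TestCase):
    def test_weights_only(self) -> None:
        # One test body, parameterized over full and sharded checkpoints.
        for sharded in (False, True):
            with self.subTest(sharded=sharded):
                saved = save(dump_checkpoint(weights_only=True), sharded=sharded)
                self.assertFalse(any(key.startswith("optimizer") for key in saved))
```

Saved as a module, this runs with `python -m unittest <module_name>`; each `subTest` is reported separately if it fails.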
Co-authored-by: Dimitri <[email protected]>
Co-authored-by: awaelchli <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
What does this PR do?
Fixes #19492 for FSDP sharded state dicts. Optimizer states now default to an empty list if they are not in the `state_dict`, which can happen when the `ModelCheckpoint` callback uses `save_weights_only=True`.
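The fix can be pictured with the following sketch. It assumes a Lightning-style checkpoint dict with `state_dict` and `optimizer_states` keys, where `optimizer_states` is absent for weights-only saves; `to_sharded_save_format` is a hypothetical helper, not the actual `FSDPStrategy` code. Popping `optimizer_states` with a default of `[]` means a weights-only checkpoint simply yields no `optimizer_{idx}` entries instead of raising `KeyError`.

```python
from typing import Any, Dict


def to_sharded_save_format(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: reshape a checkpoint dict for a sharded save.

    Model weights move under "model" and each optimizer state gets its own
    "optimizer_{idx}" key. Optimizer states default to an empty list, so a
    weights-only checkpoint (no "optimizer_states" key) does not raise KeyError.
    """
    checkpoint = dict(checkpoint)  # shallow copy: don't mutate the caller's dict
    converted: Dict[str, Any] = {"model": checkpoint.pop("state_dict")}
    optimizer_states = checkpoint.pop("optimizer_states", [])  # default to []
    converted.update(
        {f"optimizer_{idx}": state for idx, state in enumerate(optimizer_states)}
    )
    # Remaining metadata (e.g. "epoch") is kept alongside the weights.
    converted.update(checkpoint)
    return converted


full = {"state_dict": {"w": 1.0}, "optimizer_states": [{"lr": 0.1}], "epoch": 3}
weights_only = {"state_dict": {"w": 1.0}, "epoch": 3}

print(sorted(to_sharded_save_format(full)))          # ['epoch', 'model', 'optimizer_0']
print(sorted(to_sharded_save_format(weights_only)))  # ['epoch', 'model']
```

Without the `[]` default, the second call would fail on the missing `optimizer_states` key, which is the failure mode the PR addresses.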
📚 Documentation preview 📚: https://pytorch-lightning--19524.org.readthedocs.build/en/19524/