Add lazy checkpoint loading for FSDP full-state checkpoints #18150
Conversation
for more information, see https://pre-commit.ci
Awesome! Have you thought about other places in the codebase where we could lazy load?
We can now import our utility from lit-gpt
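For illustration, a hedged sketch of what importing the ported utility could look like; the module path and the symbol name here are assumptions, not confirmed by this thread:

# Hypothetical import; module path and name are assumptions.
from lightning.fabric.utilities.load import _lazy_load

checkpoint = _lazy_load("path/to/checkpoint.pt")
# Entries look like tensors but are only materialized on first use.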
It looks like it is difficult to integrate it into the rest of the codebase. Several things stand in the way and require us to brainstorm design decisions. Here are some raw notes from my notebook:
Happy to open an issue about this for further discussion!
There might be slight differences between the version here and the one in lit-gpt. I need to double-check; there was something about quantization.
# TODO: needed for us?
# materializing with contiguous is needed for quantization
if name in {"contiguous"}:
    return getattr(self._load_tensor(), name)
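For context, a hedged sketch of how this special case could sit inside the placeholder tensor's __getattr__; the class name and the body of _load_tensor are assumptions, and only the snippet above comes from the diff:

import torch


class _NotYetLoadedTensor:  # class name is an assumption
    def _load_tensor(self) -> torch.Tensor:
        # In the real utility this would read the tensor's storage from
        # the checkpoint file; left abstract in this sketch.
        raise NotImplementedError

    def __getattr__(self, name):
        # ``contiguous`` must return a real, materialized tensor (which
        # quantization code relies on), so load the data and forward.
        if name in {"contiguous"}:
            return getattr(self._load_tensor(), name)
        raise AttributeError(name)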
I should probably keep this here if we want to import and use the util in lit-gpt?
It might also be a generally applicable limitation. On the other hand, we could do this manually in the quantization code directly.
What do you suggest @t-vi?
What does this PR do?
Addresses (2) in #18008 (comment)
Closes #18138
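To make the idea concrete, here is a minimal, hedged sketch of the lazy-loading pattern; all names are illustrative and this is not the PR's actual implementation:

import torch


class LazyTensor:
    """Stand-in for a checkpoint tensor; loads the real data on demand."""

    def __init__(self, load_fn):
        self._load_fn = load_fn  # callable that reads the tensor from disk
        self._materialized = None

    def materialize(self) -> torch.Tensor:
        if self._materialized is None:
            self._materialized = self._load_fn()
        return self._materialized

    def __getattr__(self, name):
        # Any tensor attribute or method access forces materialization.
        return getattr(self.materialize(), name)

With this pattern, a process loading a full FSDP state dict can materialize one tensor at a time and release it after use, instead of holding the entire checkpoint in CPU memory at once.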
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
cc @Borda @justusschock @awaelchli @carmocca