Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA #29587
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
1. Update minimum accelerate version to 0.26.0
2. Clean the trainer wrt accelerate version checks
3. FSDP refactor and test for fsdp config
4. use `itemsize` instead of `dtype2bytes` dict
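For illustration, a minimal sketch of the `itemsize` change in point 4 (assumed, not the actual trainer code; recent PyTorch versions expose `itemsize` on `torch.dtype`):

```python
import torch

# Instead of a hand-maintained mapping such as
#   dtype2bytes = {torch.uint8: 1, torch.float16: 2, torch.float32: 4, ...}
# the per-element byte size can be read directly from the dtype object.
for dtype in (torch.uint8, torch.float16, torch.bfloat16, torch.float32):
    print(dtype, dtype.itemsize)  # 1, 2, 2, and 4 bytes per element
```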
Huge work @pacman100! 🚀
Overall it looks great on the quantization end. I left a few nits - I've also left an open question about the accelerate minimum version; perhaps just a warning for users should suffice, as upgrading accelerate to 0.26.0 might be too brutal for users.
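A hypothetical sketch of the softer alternative floated here (warn rather than hard-require accelerate >= 0.26.0); the message text and check placement are assumptions, not the code that was merged:

```python
import importlib.metadata
import warnings

from packaging import version

# Assumes accelerate is already installed; only illustrates "warn instead of raise".
accelerate_version = version.parse(importlib.metadata.version("accelerate"))
if accelerate_version < version.parse("0.26.0"):
    warnings.warn(
        "FSDP+QLoRA and DeepSpeed ZeRO-3+QLoRA need accelerate >= 0.26.0; "
        "please upgrade to use these features."
    )
```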
Co-Authored-By: Younes Belkada <[email protected]>
@younesbelkada, addressed the comments in the latest commit
Huge work @pacman100! Thanks very much for this! 🚀
Thanks for adding this! Looks great! Added a small suggestion to simplify things a bit
Awesome work - thanks for adding this!
Just two small comments / questions
        model_to_load, key, "cpu", torch.empty(*param.size(), dtype=dtype)
    )
else:
    hf_quantizer.create_quantized_param(model, param, key, "cpu", state_dict)
Just to make sure I've understood - we don't need this anymore because creating the quantized params has moved to when the state dict is loaded in?
Since the conditional logic `if is_fsdp_enabled() and not is_local_dist_rank_0() and not is_quantized` already guarantees the model is not quantized here (`hf_quantizer is None`), this inner conditional is no longer required: its quantized branch can never be taken. The quantized parameters are initialized in the `else` logic on line 3950.
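A tiny self-contained sketch of the branch logic described above, with the real helpers (`is_fsdp_enabled`, `is_local_dist_rank_0`) replaced by plain booleans; it only shows why the inner `hf_quantizer` check became dead code once `not is_quantized` joined the outer condition:

```python
def takes_empty_placeholder_path(fsdp_enabled: bool, local_rank_0: bool, quantized: bool) -> bool:
    # Mirrors `if is_fsdp_enabled() and not is_local_dist_rank_0() and not is_quantized:`
    return fsdp_enabled and not local_rank_0 and not quantized


# Non-zero local ranks of a non-quantized model get the empty CPU placeholders...
assert takes_empty_placeholder_path(True, False, False)
# ...while quantized models never enter this branch, so its quantized sub-case is unreachable.
assert not takes_empty_placeholder_path(True, False, True)
```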
@@ -1958,7 +1973,8 @@ def _get_resized_lm_head(
        if new_num_tokens is None:
            return old_lm_head

        if is_deepspeed_zero3_enabled():
        is_quantized = hasattr(self, "hf_quantizer") and self.hf_quantizer is not None
Is there any reason we couldn't have an `is_quantized` property on the model, which is `False` by default? Having to constantly define `is_quantized` within the methods isn't ideal, as it requires updating in many different places if the criteria change.
We already have the `hf_quantizer` property on the model, so storing `is_quantized` would duplicate the same information. Earlier, I was directly using `self.hf_quantizer is None` in checks, but there were suggestions above to improve readability by using `is_quantized`, and as such, I made the changes accordingly.
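For illustration only, a minimal sketch of the pattern discussed in this exchange (not the code that was merged): deriving `is_quantized` from the existing `hf_quantizer` attribute keeps the two in sync without storing duplicate state.

```python
class ModelSketch:
    """Hypothetical stand-in for PreTrainedModel, reduced to the attribute in question."""

    def __init__(self, hf_quantizer=None):
        # In the real library this is set by from_pretrained when a quantization config is used.
        self.hf_quantizer = hf_quantizer

    @property
    def is_quantized(self) -> bool:
        return self.hf_quantizer is not None


assert ModelSketch().is_quantized is False
assert ModelSketch(hf_quantizer=object()).is_quantized is True
```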
Co-Authored-By: Zach Mueller <[email protected]>
Really amazing work @pacman100! Thanks so much for this! ❤️
* fsdp+qlora related changes
* fixes
* Update quantization_config.py
* support fsdp+qlora and dsz3+qlora
* Update quantization_config.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* handle fsdp+qlora and dsz3+qlora correctly while model loading
* fix param count
* quality
* fsdp related changes
* fsdp changes only when using LoRA/QLoRA
* add accelerate version check
* refactor, update min accelerate version and add tests
  1. Update minimum accelerate version to 0.26.0
  2. Clean the trainer wrt accelerate version checks
  3. FSDP refactor and test for fsdp config
  4. use `itemsize` instead of `dtype2bytes` dict
* fix test
* Address comments
  Co-Authored-By: Younes Belkada <[email protected]>
* fix the conditional flag
* fix conditional flag
* address comments
  Co-Authored-By: Zach Mueller <[email protected]>

---------

Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Zach Mueller <[email protected]>
What does this PR do?

1. `bitsandbytes` has now added a `quant_storage` option for 4-bit parameters, which is required for FSDP+QLoRA support. This PR adds that option and fixes the corresponding parameter calculation.
2. Handles `zero.init` when using DeepSpeed with QLoRA.

This PR should be merged after Accelerate PR huggingface/accelerate#2544.
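For context, a hedged usage sketch of the new option (the model id and dtypes are illustrative placeholders; the knob is exposed as `bnb_4bit_quant_storage` on `BitsAndBytesConfig`). Roughly, storing the 4-bit weights in the same dtype used for training is what lets FSDP shard them alongside the other parameters:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # the new quant_storage option described above
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```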