Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA #29587

Merged: 27 commits, Mar 13, 2024

Conversation

@pacman100 (Contributor) commented Mar 11, 2024:

What does this PR do?

  1. bitsandbytes now provides a quant_storage option for 4-bit parameters, which is required for FSDP+QLoRA support. This PR exposes that option and fixes the corresponding parameter-count calculation (see the sketch below).
  2. Enables loading the model layer by layer, quantizing each layer on GPU and then moving it to CPU, so that FSDP or DeepSpeed can later shard it and place the shards on the GPUs. This reduces the GPU memory required: a 70B model loaded in 4-bit on each GPU without sharding needs about 35 GB per GPU, but if the quantized model is loaded on CPU and sharded across 2 GPUs, each GPU only needs 35/2 = 17.5 GB, which fits on 24 GB GPUs.
  3. Dispatch needs to be disabled, otherwise it would try to move the quantized weights, which are on CPU, to GPU before sharding.
  4. Disables zero.Init when using DeepSpeed ZeRO-3 with QLoRA.
  5. When using FSDP with PEFT LoRA, the auto-wrap policy needs to be updated to additionally wrap the trainable LoRA layers separately. When using FSDP with QLoRA, the mixed-precision policy needs to be updated to use the quantization storage data type.

This PR should be merged after Accelerate PR huggingface/accelerate#2544.
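
To make items 1, 2 and 5 above concrete, here is a minimal sketch assuming the `bnb_4bit_quant_storage` option name from this PR and a 2-GPU FSDP run; the checkpoint name is a placeholder, and the `MixedPrecision` policy at the end is plain PyTorch shown only to illustrate matching the storage dtype, not code added by this PR.

```python
import torch
from torch.distributed.fsdp import MixedPrecision

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Item 1: keep the packed 4-bit weights in a storage dtype that FSDP can
# flatten and shard uniformly alongside the other bf16 parameters.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # option surfaced by this PR
)

# Item 2, back-of-the-envelope memory: 70e9 params * 0.5 bytes ~= 35 GB.
# Unsharded, that is ~35 GB on every GPU; sharded across 2 GPUs, ~17.5 GB each.
gb_unsharded = 70e9 * 0.5 / 1e9       # ~35.0
gb_sharded_2gpus = gb_unsharded / 2   # ~17.5

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",   # placeholder checkpoint, any causal LM works
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,    # keep non-quantized weights in the same dtype
)

# Item 5: the FSDP mixed-precision policy should use the quantization storage
# dtype so the flattened 4-bit storage is not cast during sharding.
mp_policy = MixedPrecision(
    param_dtype=torch.bfloat16,    # matches bnb_4bit_quant_storage
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)
```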


Commit: refactor, update min accelerate version and add tests

1. Update minimum accelerate version to 0.26.0
2. Clean the trainer wrt accelerate version checks
3. FSDP refactor and test for fsdp config
4. use `itemsize` instead of `dtype2bytes` dict (see the sketch below)
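
As a small illustration of point 4, assuming a PyTorch version where `torch.dtype` exposes `itemsize`; the `dtype2bytes` dict named above is the kind of hand-maintained mapping being replaced.

```python
import torch

# Previously: a hand-maintained mapping such as
#   dtype2bytes = {torch.float32: 4, torch.bfloat16: 2, torch.int8: 1, ...}
# Now: ask the dtype for its own size in bytes.
assert torch.float32.itemsize == 4
assert torch.bfloat16.itemsize == 2
assert torch.int8.itemsize == 1
```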
@younesbelkada (Contributor) left a comment:

Huge work @pacman100! 🚀
Overall it looks great on the quantization end. I left a few nits - I've also left an open question about the accelerate min version; perhaps just a warning for users should suffice, as upgrading accelerate to 0.26.0 might be too brutal for users.

Inline review threads (resolved): setup.py, src/transformers/modeling_utils.py (several), src/transformers/utils/import_utils.py
Commit: Address comments (Co-Authored-By: Younes Belkada <[email protected]>)
@pacman100 (Contributor, Author) commented:

@younesbelkada, addressed the comments in the latest commit.

@younesbelkada (Contributor) left a comment:

Huge work @pacman100! Thanks very much for this! 🚀

@muellerzr (Contributor) left a comment:

Thanks for adding this! Looks great! I added a small suggestion to simplify things a bit.

Inline review threads (resolved): src/transformers/training_args.py, tests/fsdp/test_fsdp.py
@amyeroberts (Collaborator) left a comment:

Awesome work - thanks for adding this!

Just two small comments / questions below.

(inline thread on src/transformers/modeling_utils.py)

            model_to_load, key, "cpu", torch.empty(*param.size(), dtype=dtype)
        )
    else:
        hf_quantizer.create_quantized_param(model, param, key, "cpu", state_dict)
Collaborator comment:

Just to make sure I've understood - we don't need this anymore because creating the quantized params has moved to when the state dict is loaded?

@pacman100 (Contributor, Author) replied:

`hf_quantizer is None` (i.e., not quantized) will always be False here, because the conditional `if is_fsdp_enabled() and not is_local_dist_rank_0() and not is_quantized` already carries this check, so this inner conditional is no longer required. The quantized parameters are initialized in the `else` branch on line 3950.
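
For readability, a condensed paraphrase of the branch being described, wrapped as a standalone function. The helper names follow the snippets quoted in this thread (with `set_module_tensor_to_device` assumed to be the accelerate utility); the function signature is purely illustrative, not the actual source.

```python
import torch
from accelerate.utils import set_module_tensor_to_device


def _load_one_param_sketch(model, model_to_load, key, param, dtype, state_dict,
                           hf_quantizer, is_quantized,
                           is_fsdp_enabled, is_local_dist_rank_0):
    """Paraphrased sketch of the per-parameter branch discussed above."""
    if is_fsdp_enabled() and not is_local_dist_rank_0() and not is_quantized:
        # Non-quantized FSDP path: ranks other than local rank 0 only allocate
        # empty CPU tensors; FSDP syncs the real weights from rank 0 afterwards.
        set_module_tensor_to_device(
            model_to_load, key, "cpu", torch.empty(*param.size(), dtype=dtype)
        )
    else:
        # Quantized path: the quantized parameter is created on CPU here so
        # that FSDP or DeepSpeed can shard it afterwards.
        hf_quantizer.create_quantized_param(model, param, key, "cpu", state_dict)
```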

(inline thread on src/transformers/modeling_utils.py)

@@ -1958,7 +1973,8 @@ def _get_resized_lm_head(
         if new_num_tokens is None:
             return old_lm_head

-        if is_deepspeed_zero3_enabled():
+        is_quantized = hasattr(self, "hf_quantizer") and self.hf_quantizer is not None
Collaborator comment:

Is there any reason we couldn't have an is_quantized property on the model, which defaults to False? Having to constantly define is_quantized within the methods isn't ideal, as it requires updating many different places if the criteria change.

@pacman100 (Contributor, Author) commented Mar 13, 2024:

We already have the `hf_quantizer` attribute on the model; storing `is_quantized` as well would duplicate the same information. Earlier I was directly using `self.hf_quantizer is None` in the checks, but there were suggestions above to improve readability by using `is_quantized`, so I made the changes accordingly.
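
For context, a minimal sketch of the alternative the reviewer floats; this is hypothetical and not part of the PR, which keeps `hf_quantizer` as the single source of truth with local `is_quantized` checks.

```python
class QuantizationAwareModelSketch:
    """Hypothetical illustration of the suggested property, not code from the PR."""

    def __init__(self, hf_quantizer=None):
        # Set by from_pretrained when a quantization config is supplied.
        self.hf_quantizer = hf_quantizer

    @property
    def is_quantized(self) -> bool:
        # Derived from hf_quantizer so the two can never disagree.
        return self.hf_quantizer is not None
```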

Commit: address comments (Co-Authored-By: Zach Mueller <[email protected]>)
@pacman100 merged commit 350c5d1 into main on Mar 13, 2024. 21 checks passed.
@pacman100 deleted the smangrul/fsdp-qlora-support branch on March 13, 2024 at 16:33.
@Titus-von-Koeller (Contributor):

Really amazing work @pacman100! Thanks so much for this! ❤️

@itazap pushed a commit that referenced this pull request on May 14, 2024:
* fsdp+qlora related changes

* fixes

* Update quantization_config.py

* support fsdp+qlora and dsz3+qlora

* Update quantization_config.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* handle fsdp+qlora and dsz3+qlora correctly while model loading

* fix param count

* quality

* fsdp related changes

* fsdp changes only when using LoRA/QLoRA

* add accelerate version check

* refactor, update min accelerate version and add tests

1. Update minimum accelerate version to 0.26.0
2. Clean the trainer wrt accelerate version checks
3. FSDP refactor and test for fsdp config
4. use `itemsize` instead of `dtype2bytes` dict

* fix test

* Address comments

Co-Authored-By: Younes Belkada <[email protected]>

* fix the conditional flag

* fix conditional flag

* address comments

Co-Authored-By: Zach Mueller <[email protected]>

---------

Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Zach Mueller <[email protected]>