Cannot apply both PEFT QLoRA and DeepSpeed ZeRO3 #2016
Comments
@echo-yi Does it work for you with a smaller model, like the example from the PEFT docs? @matthewdouglas @Titus-von-Koeller Could you please take a look? Could this be an issue specific to Llama 405B? More context: huggingface/transformers#29587
I wonder if this is still needed?
@BenjaminBossan I tried with
Thanks for testing those. Since this error occurs already at the stage of loading the base model, it is not directly a PEFT error, though of course PEFT is affected, and I'd be ready to update the docs if it is confirmed that DS ZeRO3 doesn't work with bnb. I hope the bnb authors can shed some light on this.
Yeah, that was added in the PR I mentioned earlier. I can confirm that even for smaller models, partitioning does not appear to work. But when I remove quantization and use
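For anyone trying to verify this locally, here is a rough sanity check for whether ZeRO3 actually sharded the weights. It assumes DeepSpeed attaches its usual `ds_id` / `ds_numel` attributes to parameters it manages, as recent releases do; treat those attribute names as an assumption for your version.

```python
# Rough check of whether ZeRO3 actually partitioned the model (illustrative only).
# Assumes partitioned parameters carry ds_id / ds_numel, as in recent DeepSpeed releases.
import torch
import torch.distributed as dist

def report_partitioning(model):
    managed = 0          # parameters handled by ZeRO3
    local_elems = 0      # elements actually resident on this rank
    full_elems = 0       # logical (unpartitioned) element count
    for p in model.parameters():
        if hasattr(p, "ds_id"):
            managed += 1
            local_elems += p.numel()                         # local placeholder/partition
            full_elems += getattr(p, "ds_numel", p.numel())  # full parameter size
        else:
            local_elems += p.numel()
            full_elems += p.numel()
    rank = dist.get_rank() if dist.is_initialized() else 0
    print(f"[rank {rank}] ZeRO3-managed params: {managed}, "
          f"local elements: {local_elems:,} of {full_elems:,}, "
          f"CUDA allocated: {torch.cuda.memory_allocated() / 2**30:.1f} GiB")
```

If partitioning worked, each rank should hold only a fraction of the full element count; if `managed` is 0 and every rank reports the full count (and the full memory), the weights were replicated.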
@BenjaminBossan When I remove ZeRO3 and use quantization &
Also pinging @muellerzr in case he knows something about this. |
I tested in axolotl against the latest transformers, and this seems to work with this qlora+peft+zero3 yaml.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
System Info
Who can help?
@stevhliu
Information
Tasks
An officially supported task in the examples folder
Reproduction
This line
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-405B-Instruct", ...)
throws CUDA OOM, because the parameters are not partitioned but copied across the GPUs.
command
accelerate launch --config_file zero3_config.yaml pretrain.py --num_processes=8 --multi_gpu
pretrain.py
zero3_config.yaml
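For context, a minimal sketch of what the quantized loading in such a script typically looks like. This is not the attached pretrain.py, and apart from the model name every argument below is an assumption:

```python
# Illustrative QLoRA-style loading, not the actual pretrain.py from this issue.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# With the accelerate DeepSpeed ZeRO3 plugin active, the expectation is that
# from_pretrained shards the weights across ranks instead of replicating them.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B-Instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
```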
Expected behavior
PEFT QLoRA (with bitsandbytes) and DeepSpeed ZeRO3 are both applied, so that model parameters are quantized and partitioned.
I thought this should work according to this post, but microsoft/DeepSpeed#5819 says bitsandbytes quantization and ZeRO3 are not compatible. If this is the case, I find the above post quite misleading.
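To make the expectation concrete, rough numbers for this setup (approximate, ignoring optimizer states, activations, and quantization metadata; the per-GPU figures assume the 8 processes from the repro command):

```python
# Back-of-the-envelope weight-memory math for why partitioning matters here.
params = 405e9                 # Llama 3.1 405B parameter count
bytes_per_param_4bit = 0.5     # nf4 weights, excluding scales/metadata
gpus = 8                       # --num_processes=8 in the repro command

quantized_total_gb = params * bytes_per_param_4bit / 1e9    # ~202 GB of 4-bit weights
per_gpu_partitioned = quantized_total_gb / gpus             # ~25 GB per GPU if ZeRO3 shards them
per_gpu_replicated = quantized_total_gb                     # ~202 GB per GPU if copied -> OOM
print(per_gpu_partitioned, per_gpu_replicated)
```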