[BUG] Cannot free parameter with ZeRO3 + offload parameter in Pytorch1.9 #3646
Comments
Hi @Andy666G, can you provide a script that reproduces this error?
Sure, here is a script @jomayeri:
# export CUDA_VISIBLE_DEVICES=0
lora_rank=8
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
#modules_to_save="embed_tokens,lm_head"
# modules_to_save="lm_head"
modules_to_save=""
lora_dropout=0.1
pretrained_model="models/vicuna-13b-all-v1.1/"
chinese_tokenizer_path="models/vicuna-13b-all-v1.1/tokenizer.model"
dataset_dir="/doc"
data_cache="$PWD/cache"
per_device_batch_size=1 # 1024 ,from https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82
training_steps=7000 # 6000, from https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82
lr=2.34e-06 # from https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82
gradient_accumulation_steps=1
output_dir="output"
max_train_samples=${per_device_batch_size}
max_eval_samples=${per_device_batch_size}
#TODO: deepspeed
deepspeed --include localhost:0 --master_port 12688 scripts/run_clm_pt_with_peft.py \
--model_name_or_path ${pretrained_model} \
--tokenizer_name_or_path ${chinese_tokenizer_path} \
--dataset_dir ${dataset_dir} \
--data_cache_dir $data_cache \
--validation_split_percentage 0.001 \
--per_device_train_batch_size ${per_device_batch_size} \
--per_device_eval_batch_size ${per_device_batch_size} \
--do_train \
--debug_mode \
--torch_dtype float16 \
--seed $RANDOM \
--max_steps ${training_steps} \
--lr_scheduler_type cosine \
--learning_rate ${lr} \
--warmup_ratio 0.05 \
--weight_decay 0.01 \
--logging_strategy steps \
--logging_steps 10 \
--save_strategy steps \
--save_total_limit 3 \
--save_steps 1000 \
--gradient_accumulation_steps ${gradient_accumulation_steps} \
--preprocessing_num_workers 8 \
--block_size 512 \
--output_dir ${output_dir} \
--ddp_timeout 30000 \
--logging_first_step True \
--lora_rank ${lora_rank} \
--trainable ${lora_trainable} \
--lora_dropout ${lora_dropout} \
--deepspeed deepspeed_config.json \
--fp16 \
--overwrite_output_dir
I had the same problem and was very confused.
@Andy666G Sorry, I cannot repro this issue. In the description you say "when calling post_forward_hook"; is this a hook you added? Have you raised the issue with PyTorch?
Closing for now, please reopen if needed.
Describe the bug
When training LLaMA 13B (https://github.com/ymcui/Chinese-LLaMA-Alpaca), I observed that parameter memory cannot be freed with the ZeRO-3 + parameter offload strategy on PyTorch 1.9, but it can be freed on PyTorch 1.13 with the same DeepSpeed configuration. The fix discussed in issue #3002 does not resolve this bug.
To Reproduce
DeepSpeed 0.9.2 + PyTorch 1.9 + PEFT 0.3 + Transformers 4.28.1
ds_config
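The ds_config attachment is not reproduced in this text. Purely as a point of reference, a minimal ZeRO-3 + parameter-offload configuration of the kind described here might look like the sketch below; this is an illustrative assumption, not the author's actual deepspeed_config.json, and the batch-size and fp16 values are guessed to match the training script above.

import json

# Illustrative sketch of a ZeRO-3 + parameter-offload DeepSpeed config.
# NOT the author's actual deepspeed_config.json; all values are assumptions
# chosen to mirror the flags in the training script above.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # matches per_device_batch_size=1
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},             # the script passes --fp16
    "zero_optimization": {
        "stage": 3,                        # ZeRO-3 partitions the parameters
        "offload_param": {                 # offload partitioned parameters to CPU
            "device": "cpu",
            "pin_memory": True,
        },
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": True,
        },
    },
}

with open("deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

When such a config is consumed through the HuggingFace Trainer (as in the script above), the batch-size and fp16 fields are commonly set to "auto" so that the Trainer arguments take precedence.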
Expected behavior
Parameter memory is freed after the forward pass under ZeRO-3 + parameter offload in both environments:
PyTorch 1.13, DeepSpeed 0.9.2
PyTorch 1.9, DeepSpeed 0.9.2
ds_report output
Please run ds_report to give us details about your setup.
Screenshots
When calling post_forward_hook, parameters can be freed in PyTorch 1.13 but cannot be freed in PyTorch 1.9. (Screenshots were attached for both PyTorch 1.13 and PyTorch 1.9.)
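One way to observe the reported symptom is to compare the GPU memory allocated before and after a forward pass under ZeRO-3 with parameter offload: if the post-forward hook releases the gathered parameters, the post-forward number should drop back close to the baseline. The sketch below is a minimal illustration under assumptions (a tiny stand-in model, an illustrative config, and a hypothetical file name check_free.py), not the author's actual reproduction.

# check_free.py -- hypothetical diagnostic sketch, launched with e.g.:
#   deepspeed --num_gpus 1 check_free.py
import torch
import deepspeed

# Minimal ZeRO-3 + parameter-offload config (illustrative values only).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# Small stand-in model; the original report uses LLaMA 13B.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
)

engine, _, _, _ = deepspeed.initialize(
    model=model,
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-5),
    config=ds_config,
)

x = torch.randn(1, 4096, dtype=torch.half, device=engine.device)

torch.cuda.synchronize()
before = torch.cuda.memory_allocated()
with torch.no_grad():
    engine(x)
torch.cuda.synchronize()
after = torch.cuda.memory_allocated()

# With working ZeRO-3 parameter offload, the gathered parameters are released
# by the post-forward hook, so "after" should be close to "before"; a large
# gap would match the behavior reported here on PyTorch 1.9.
print(f"allocated before forward: {before / 2**20:.1f} MiB")
print(f"allocated after  forward: {after / 2**20:.1f} MiB")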
System info:
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
deepspeed launcher
Docker context
Are you using a specific docker image that you can share?
NGC 22.07 (PyTorch 1.13) and NGC 21.06 (PyTorch 1.9)