diff --git a/examples/language-modeling/README.md b/examples/language-modeling/README.md
index abf19c457b..776993aca1 100644
--- a/examples/language-modeling/README.md
+++ b/examples/language-modeling/README.md
@@ -562,41 +562,41 @@ python3 ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lora_clm.py \
 
 - Multi-card finetuning of Llama2-70B with FSDP and LoRA:
 ```bash
-PT_HPU_MAX_COMPOUND_OP_SIZE=10 DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 \
-python3 ../gaudi_spawn.py --use_mpi --world_size 8 run_lora_clm.py \
+LOWER_LIST=ops_bf16.txt PT_HPU_LAZY_MODE=0 \
+python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_lora_clm.py \
   --model_name_or_path meta-llama/Llama-2-70b-hf \
   --dataset_name tatsu-lab/alpaca \
   --bf16 True \
   --output_dir ./lora_out \
-  --num_train_epochs 2 \
   --max_seq_len 2048 \
-  --per_device_train_batch_size 10 \
-  --per_device_eval_batch_size 10 \
   --gradient_checkpointing \
-  --evaluation_strategy epoch \
-  --eval_delay 2 \
+  --per_device_train_batch_size 5 \
   --save_strategy no \
   --learning_rate 0.0004 \
   --warmup_ratio 0.03 \
   --lr_scheduler_type "constant" \
   --logging_steps 1 \
   --dataset_concatenation \
-  --attn_softmax_bf16 True \
   --do_train \
-  --do_eval \
   --use_habana \
-  --use_lazy_mode False \
-  --pipelining_fwd_bwd False \
   --throughput_warmup_steps 3 \
   --lora_rank 4 \
   --lora_target_modules "q_proj" "v_proj" "k_proj" "o_proj" \
+  --attn_softmax_bf16 True \
   --validation_split_percentage 4 \
-  --use_flash_attention True \
+  --use_lazy_mode False \
   --fsdp_config fsdp_config.json \
-  --fsdp "auto_wrap" \
-  --torch_compile_backend hpu_backend \.
+  --fsdp auto_wrap \
+  --num_train_epochs 2 \
+  --evaluation_strategy epoch \
+  --per_device_eval_batch_size 1 \
+  --eval_delay 2 \
+  --do_eval \
+  --pipelining_fwd_bwd False \
+  --use_fused_rope False \
+  --torch_compile_backend hpu_backend \
   --torch_compile \
-  --use_fused_rope False
+  --gradient_accumulation_steps 2
 ```
 
 - Multi-card finetuning of Falcon-180B:
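
Note on the batch-size change: the new command halves `--per_device_train_batch_size` from 10 to 5 and adds `--gradient_accumulation_steps 2`. Assuming the old command ran with the default accumulation of 1 (it passed no such flag) and that the effective batch is the usual product of per-device batch, accumulation steps, and world size, the number of samples per optimizer step should be unchanged. A minimal sketch of that arithmetic follows; `effective_batch_size` is a hypothetical helper for illustration, not something in the repository.

```python
def effective_batch_size(per_device: int, grad_accum: int, world_size: int) -> int:
    """Samples consumed per optimizer step under plain data parallelism."""
    return per_device * grad_accum * world_size


# Old command: batch 10 per device, no accumulation flag (assumed default of 1), 8 cards.
old = effective_batch_size(per_device=10, grad_accum=1, world_size=8)

# New command: batch 5 per device, --gradient_accumulation_steps 2, 8 cards.
new = effective_batch_size(per_device=5, grad_accum=2, world_size=8)

print(old, new)  # 80 80
```

If that assumption holds, the smaller per-device batch mainly lowers peak memory per card while keeping the optimizer-step batch at 80, so the learning rate and warmup settings in the command would not need retuning for this change alone.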