Full finetune worse than LoRA finetune #5945
RobinWitch asked this question in Q&A · Unanswered
Replies: 1 comment
I have had the same issue. I believe it's the number of training examples: LoRA performs much better when you don't have hundreds of thousands of data points for training, which intuitively makes sense.
System Info
llamafactory version: 0.9.1.dev0

Reproduction
llamafactory-cli train examples/train_lora/qwen2.5_lora_sft.yaml
llamafactory-cli train examples/train_full/qwen2.5_full_sft.yaml
Expected behavior
The LoRA config is shown below; the only change for the full finetune is that finetuning_type is set to full.
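The config itself does not appear to have survived extraction (it was likely attached as a screenshot). As a reference point, here is a minimal sketch of what such a qwen2.5_lora_sft.yaml could look like, modeled on LLaMA-Factory's shipped example configs. Where possible the values mirror the logs below (output path, ~3 epochs, eval batch size 1); everything else is an assumption, not the asker's actual setting:

```yaml
### model (assumed: a 0.5B Qwen2.5 model, matching the saves/qwen2.5-0.5b output path)
model_name_or_path: Qwen/Qwen2.5-0.5B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora   # per the asker, the full run changes only this to `full`
lora_target: all

### dataset (assumed demo datasets)
dataset: identity,alpaca_en_demo
template: qwen
cutoff_len: 1024

### output
output_dir: saves/qwen2.5-0.5b/lora/sft
plot_loss: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine

### eval
val_size: 0.1
per_device_eval_batch_size: 1
```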
The LoRA result is shown below:
***** train metrics *****
epoch = 2.9817
total_flos = 1051084GF
train_loss = 9.2591
train_runtime = 0:05:19.38
train_samples_per_second = 9.215
train_steps_per_second = 0.573
Figure saved at: saves/qwen2.5-0.5b/lora/sft/training_loss.png
[WARNING|2024-11-05 21:44:41] llamafactory.extras.ploting:162 >> No metric eval_loss to plot.
[WARNING|2024-11-05 21:44:41] llamafactory.extras.ploting:162 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4107] 2024-11-05 21:44:41,170 >>
***** Running Evaluation *****
[INFO|trainer.py:4109] 2024-11-05 21:44:41,170 >> Num examples = 110
[INFO|trainer.py:4112] 2024-11-05 21:44:41,171 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55/55 [00:03<00:00, 16.94it/s]
***** eval metrics *****
epoch = 2.9817
eval_loss = 1.2816
eval_runtime = 0:00:03.31
eval_samples_per_second = 33.231
eval_steps_per_second = 16.615
[INFO|modelcard.py:449] 2024-11-05 21:44:44,486 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
The full finetune result is shown below:
***** train metrics *****
epoch = 2.9817
total_flos = 1038322GF
train_loss = 7.6519
train_runtime = 0:04:15.82
train_samples_per_second = 11.504
train_steps_per_second = 0.715
Figure saved at: saves/qwen2.5-0.5b/full/sft/training_loss.png
[WARNING|2024-11-05 21:29:36] llamafactory.extras.ploting:162 >> No metric eval_loss to plot.
[WARNING|2024-11-05 21:29:36] llamafactory.extras.ploting:162 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4107] 2024-11-05 21:29:36,793 >>
***** Running Evaluation *****
[INFO|trainer.py:4109] 2024-11-05 21:29:36,793 >> Num examples = 110
[INFO|trainer.py:4112] 2024-11-05 21:29:36,793 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55/55 [00:01<00:00, 30.19it/s]
***** eval metrics *****
epoch = 2.9817
eval_loss = 2.0634
eval_runtime = 0:00:01.85
eval_samples_per_second = 59.216
eval_steps_per_second = 29.608
[INFO|modelcard.py:449] 2024-11-05 21:29:38,706 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
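For easier comparison, the key metrics from the two runs side by side:

| Run  | train_loss | eval_loss |
| ---- | ---------: | --------: |
| LoRA | 9.2591     | 1.2816    |
| Full | 7.6519     | 2.0634    |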
Conclusion
We can see that the eval_loss of the LoRA finetune (1.2816) is far lower than the eval_loss of the full finetune (2.0634). Why is this the case? How can we get the full finetune to outperform LoRA?
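One observation from the logs above: the full run reaches a lower train_loss (7.6519 vs. 9.2591) yet a higher eval_loss, which is consistent with the full model overfitting the small training set. A common first adjustment, offered here as an assumption based on general practice rather than a verified fix for this run, is that full finetuning usually needs a much smaller learning rate than LoRA; carrying a LoRA-scale learning rate (e.g. 1e-4) into a full finetune often hurts generalization. A hedged sketch of the change, reusing the assumed fields from the config sketch above:

```yaml
### method
finetuning_type: full

### train (all values are illustrative assumptions)
learning_rate: 1.0e-5    # roughly 10x lower than a typical LoRA learning rate
warmup_ratio: 0.1        # a gentler warmup can help stabilize full finetuning
num_train_epochs: 2.0    # fewer epochs as a guard against overfitting
```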
Others
No response