Full finetune worse than LoRA finetune #5945
RobinWitch asked this question in Q&A · Unanswered
Replies: 1 comment
I have had the same issue. I believe it's the number of training examples: LoRA performs much better when you don't have hundreds of thousands of data points for training, which intuitively makes sense.
System Info
llamafactory version: 0.9.1.dev0

Reproduction
llamafactory-cli train examples/train_lora/qwen2.5_lora_sft.yaml
llamafactory-cli train examples/train_full/qwen2.5_full_sft.yaml
Expected behavior
The LoRA config is shown below; the only change for the full finetune is that finetuning_type is set to full.
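The config itself does not appear to have survived extraction (it was likely attached as a screenshot). As a reference point, here is a minimal sketch of what such a qwen2.5_lora_sft.yaml could look like, modeled on LLaMA-Factory's shipped example configs. Where possible the values mirror the logs below (output path, ~3 epochs, eval batch size 1); everything else is an assumption, not the asker's actual setting:

```yaml
### model (assumed: a 0.5B Qwen2.5 model, matching the saves/qwen2.5-0.5b output path)
model_name_or_path: Qwen/Qwen2.5-0.5B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora   # per the asker, the full run changes only this to `full`
lora_target: all

### dataset (assumed demo datasets)
dataset: identity,alpaca_en_demo
template: qwen
cutoff_len: 1024

### output
output_dir: saves/qwen2.5-0.5b/lora/sft
plot_loss: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine

### eval
val_size: 0.1
per_device_eval_batch_size: 1
```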
The LoRA result is shown below:
***** train metrics *****
epoch = 2.9817
total_flos = 1051084GF
train_loss = 9.2591
train_runtime = 0:05:19.38
train_samples_per_second = 9.215
train_steps_per_second = 0.573
Figure saved at: saves/qwen2.5-0.5b/lora/sft/training_loss.png
[WARNING|2024-11-05 21:44:41] llamafactory.extras.ploting:162 >> No metric eval_loss to plot.
[WARNING|2024-11-05 21:44:41] llamafactory.extras.ploting:162 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4107] 2024-11-05 21:44:41,170 >>
***** Running Evaluation *****
[INFO|trainer.py:4109] 2024-11-05 21:44:41,170 >> Num examples = 110
[INFO|trainer.py:4112] 2024-11-05 21:44:41,171 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55/55 [00:03<00:00, 16.94it/s]
***** eval metrics *****
epoch = 2.9817
eval_loss = 1.2816
eval_runtime = 0:00:03.31
eval_samples_per_second = 33.231
eval_steps_per_second = 16.615
[INFO|modelcard.py:449] 2024-11-05 21:44:44,486 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
The full finetune result is shown below:
***** train metrics *****
epoch = 2.9817
total_flos = 1038322GF
train_loss = 7.6519
train_runtime = 0:04:15.82
train_samples_per_second = 11.504
train_steps_per_second = 0.715
Figure saved at: saves/qwen2.5-0.5b/full/sft/training_loss.png
[WARNING|2024-11-05 21:29:36] llamafactory.extras.ploting:162 >> No metric eval_loss to plot.
[WARNING|2024-11-05 21:29:36] llamafactory.extras.ploting:162 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4107] 2024-11-05 21:29:36,793 >>
***** Running Evaluation *****
[INFO|trainer.py:4109] 2024-11-05 21:29:36,793 >> Num examples = 110
[INFO|trainer.py:4112] 2024-11-05 21:29:36,793 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55/55 [00:01<00:00, 30.19it/s]
***** eval metrics *****
epoch = 2.9817
eval_loss = 2.0634
eval_runtime = 0:00:01.85
eval_samples_per_second = 59.216
eval_steps_per_second = 29.608
[INFO|modelcard.py:449] 2024-11-05 21:29:38,706 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
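For easier comparison, the key metrics from the two runs side by side:

| Run  | train_loss | eval_loss |
| ---- | ---------: | --------: |
| LoRA | 9.2591     | 1.2816    |
| Full | 7.6519     | 2.0634    |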
Conclusion
We can see that the eval_loss of the LoRA finetune (1.2816) is far lower than the eval_loss of the full finetune (2.0634). Why is this the case? How can we get the full finetune to outperform LoRA?
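One observation from the logs above: the full run reaches a lower train_loss (7.6519 vs. 9.2591) yet a higher eval_loss, which is consistent with the full model overfitting the small training set. A common first adjustment, offered here as an assumption based on general practice rather than a verified fix for this run, is that full finetuning usually needs a much smaller learning rate than LoRA; carrying a LoRA-scale learning rate (e.g. 1e-4) into a full finetune often hurts generalization. A hedged sketch of the change, reusing the assumed fields from the config sketch above:

```yaml
### method
finetuning_type: full

### train (all values are illustrative assumptions)
learning_rate: 1.0e-5    # roughly 10x lower than a typical LoRA learning rate
warmup_ratio: 0.1        # a gentler warmup can help stabilize full finetuning
num_train_epochs: 2.0    # fewer epochs as a guard against overfitting
```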
Others
No response