
Performance regression in version 3.1 (compared to 2.X) #3092

Open
wtl0207 opened this issue Feb 13, 2025 · 5 comments

wtl0207 commented Feb 13, 2025

With the same training data and the same pretrained model, 3.1 scores about 10 points lower than 2.X. Below are the training commands for 3.1 and 2.x, respectively.
```shell
# swift 3.1
SIZE_FACTOR=28 \
MAX_PIXELS=1003520 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
    --model Qwen/Qwen2-VL-2B-Instruct \
    --train_type lora \
    --lora_rank 64 \
    --lora_alpha 256 \
    --target_modules all-linear \
    --init_weights pissa \
    --use_rslora True \
    --freeze_llm False \
    --freeze_vit False \
    --freeze_aligner False \
    --dataset data/28.jsonl \
    --deepspeed zero2 \
    --output_dir output/qwen-2-vl-2b \
    --ddp_timeout 86400 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --num_train_epochs 1 \
    --learning_rate 1e-5 \
    --lr_scheduler_type cosine \
    --eval_steps 2000 \
    --save_steps 2000 \
    --dataloader_num_workers 4 \
    --save_total_limit -1 \
    --logging_steps 20 \
    --max_length 32768
```

```shell
# swift 2.x
SIZE_FACTOR=28 \
MAX_PIXELS=1003520 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
    --model_type qwen2-vl-2b-instruct \
    --model_id_or_path Qwen/Qwen2-VL-2B-Instruct \
    --sft_type lora \
    --lora_rank 64 \
    --lora_alpha 256 \
    --target_modules ALL \
    --init_lora_weights pissa \
    --use_rslora True \
    --freeze_vit False \
    --dataset data/28.jsonl \
    --deepspeed default-zero2 \
    --output_dir output/old-qwen-2-vl-2b \
    --add_output_dir_suffix False \
    --ddp_timeout 86400 \
    --batch_size 1 \
    --num_train_epochs 1 \
    --learning_rate 1e-5 \
    --lr_scheduler_type cosine \
    --eval_steps 2000 \
    --save_steps 2000 \
    --dataloader_num_workers 1 \
    --save_total_limit -1 \
    --logging_steps 20 \
    --max_length 32768
```

Jintao-Huang (Collaborator) commented:

Try setting gradient accumulation.

wtl0207 (Author) commented Feb 13, 2025:

> Try setting gradient accumulation.

Hi, we did not set gradient accumulation in 2.6 either, so why does it beat 3.1 by 10 points? Is it because 2.6 set it by default?

Jintao-Huang (Collaborator) commented:

Yes, 2.6 set it by default.
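To make the gap concrete, here is a minimal sketch of the effective-batch-size arithmetic. It assumes (unverified against the swift source) that 2.x derived `gradient_accumulation_steps` as roughly `ceil(16 / (batch_size * world_size))` to target an effective batch of 16, while 3.x defaults it to 1:

```python
import math

def effective_batch_size(per_device_batch: int, world_size: int,
                         grad_accum_steps: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * world_size * grad_accum_steps

def legacy_default_grad_accum(per_device_batch: int, world_size: int,
                              target: int = 16) -> int:
    """Hypothetical 2.x default: pick accumulation so the
    effective batch reaches roughly `target` samples."""
    return max(1, math.ceil(target / (per_device_batch * world_size)))

world_size = 4        # NPROC_PER_NODE=4 in both commands above
per_device_batch = 1  # --batch_size 1 / --per_device_train_batch_size 1

old_accum = legacy_default_grad_accum(per_device_batch, world_size)
print(effective_batch_size(per_device_batch, world_size, old_accum))  # 2.x: 16
print(effective_batch_size(per_device_batch, world_size, 1))          # 3.x: 4
```

Under these assumptions the two runs optimize with very different effective batch sizes (16 vs. 4) at the same learning rate, which is one plausible source of the reported quality drop.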

Jintao-Huang changed the title from "Performance regression in version 3.1" to "Performance regression in version 3.1 (compared to 2.X)" on Feb 13, 2025.
Jintao-Huang (Collaborator) commented:

--warmup_ratio

See the examples in examples/train/multimodal.
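Putting the two suggestions together, the 3.1 command could be amended roughly as follows. This is a trimmed sketch (most of the original flags are omitted for brevity), and the values 4 and 0.05 are illustrative, not recommendations from this thread:

```shell
# Same as the original 3.1 command, with two flags added;
# 4 and 0.05 are illustrative values, not from this thread.
swift sft \
    --model Qwen/Qwen2-VL-2B-Instruct \
    --train_type lora \
    --dataset data/28.jsonl \
    --gradient_accumulation_steps 4 \
    --warmup_ratio 0.05 \
    --num_train_epochs 1 \
    --learning_rate 1e-5 \
    --max_length 32768
```

With `--gradient_accumulation_steps 4`, four gradient steps are accumulated before each optimizer update, matching the larger effective batch the 2.x run would have used by default.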

wtl0207 (Author) commented Feb 13, 2025:

> --warmup_ratio
>
> See the examples in examples/train/multimodal.

Got it, thank you.
