Quality regression in version 3.1 (drop compared to 2.X) #3092
Comments
Try setting gradient accumulation.

Hello, 2.6 did not set gradient accumulation either, so why is it 10 points better than 3.1? Is it because 2.6 sets it by default?

Yes, 2.6 sets it by default.

Also set --warmup_ratio; refer to the examples in examples/train/multimodal.

OK, got it, thanks.
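Putting the two suggestions together, here is a minimal sketch of the flags that would be appended to the swift 3.1 command shown below. The concrete values (4 accumulation steps, 0.05 warmup ratio) are assumptions chosen to roughly mirror the 2.x behavior and the examples/train/multimodal scripts, not values confirmed by the maintainers in this thread:

```shell
# Hedged sketch: a minimal 3.1 command with the two suggested flags added.
# The values 4 and 0.05 are assumed, not maintainer-confirmed; tune them
# against your own 2.x baseline.
swift sft \
    --model Qwen/Qwen2-VL-2B-Instruct \
    --train_type lora \
    --dataset data/28.jsonl \
    --gradient_accumulation_steps 4 \
    --warmup_ratio 0.05
```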
With the same training data and the same pretrained model, 3.1 scores 10 points lower than 2.X. The two commands below are the 3.1 and 2.x training commands, respectively.
```shell
SIZE_FACTOR=28 \
MAX_PIXELS=1003520 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
    --model Qwen/Qwen2-VL-2B-Instruct \
    --train_type lora \
    --lora_rank 64 \
    --lora_alpha 256 \
    --target_modules all-linear \
    --init_weights pissa \
    --use_rslora True \
    --freeze_llm False \
    --freeze_vit False \
    --freeze_aligner False \
    --dataset data/28.jsonl \
    --deepspeed zero2 \
    --output_dir output/qwen-2-vl-2b \
    --ddp_timeout 86400 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --num_train_epochs 1 \
    --learning_rate 1e-5 \
    --lr_scheduler_type cosine \
    --eval_steps 2000 \
    --save_steps 2000 \
    --dataloader_num_workers 4 \
    --save_total_limit -1 \
    --logging_steps 20 \
    --max_length 32768
```
```shell
SIZE_FACTOR=28 \
MAX_PIXELS=1003520 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
    --model_type qwen2-vl-2b-instruct \
    --model_id_or_path Qwen/Qwen2-VL-2B-Instruct \
    --sft_type lora \
    --lora_rank 64 \
    --lora_alpha 256 \
    --target_modules ALL \
    --init_lora_weights pissa \
    --use_rslora True \
    --freeze_vit False \
    --dataset data/28.jsonl \
    --deepspeed default-zero2 \
    --output_dir output/old-qwen-2-vl-2b \
    --add_output_dir_suffix False \
    --ddp_timeout 86400 \
    --batch_size 1 \
    --num_train_epochs 1 \
    --learning_rate 1e-5 \
    --lr_scheduler_type cosine \
    --eval_steps 2000 \
    --save_steps 2000 \
    --dataloader_num_workers 1 \
    --save_total_limit -1 \
    --logging_steps 20 \
    --max_length 32768
```
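For context on the accepted answer: swift 2.x reportedly derives gradient_accumulation_steps automatically when the flag is omitted, roughly ceil(16 / (batch_size × world_size)); that formula is an assumption about the 2.x internals, not something confirmed in this thread. Under that assumption, the 2.x run here gets ceil(16 / (1 × 4)) = 4 accumulation steps, i.e. an effective batch of 1 sample × 4 GPUs × 4 steps = 16 per optimizer step, while the 3.1 run leaves accumulation at 1 and optimizes on only 1 × 4 × 1 = 4 samples. A 4× smaller effective batch at the same learning rate is a plausible cause of the reported 10-point gap.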