[model] fix: qwen2vl for transformers 4.52.*#3524
vermouth1992 merged 1 commit into verl-project:main
Code Review
This pull request refactors the monkey-patching logic for qwen2_vl models to reduce code duplication. While the intention is good, the implementation introduces a critical bug by attempting to import modules that may not exist in older versions of the transformers library, which will cause an ImportError. I've provided a detailed comment on how to fix this. The other change in qwen2_vl.py is a nice refactoring that improves code clarity.
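The failure mode described in this review (an unconditional import of a module that older transformers releases may not ship) is commonly avoided with a guarded import. The following is a minimal sketch, not the fix applied in this PR: `optional_attr` and `patch_if_present` are hypothetical helper names, and no real transformers module path is assumed.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Return getattr(module, attr), or None if the module or attribute is missing.

    An attribute whose value is None is treated the same as a missing one.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    return getattr(module, attr, None)


def patch_if_present(module_name: str, attr: str, replacement: Any) -> bool:
    """Monkey-patch module.attr only when it exists; return True if patched."""
    if optional_attr(module_name, attr) is None:
        return False
    setattr(importlib.import_module(module_name), attr, replacement)
    return True
```

With this shape, a patch aimed at a class that only exists in newer library versions is skipped on older versions instead of raising ImportError at import time.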
What does this PR do?
Follow-up to #3496.
Sorry, I pushed a wrong patch that broke qwen2vl RL training with transformers 4.52.*
Checklist Before Starting
PR title format: [{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
Multiple modules are comma-separated, like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
For breaking changes, add [BREAKING] to the beginning of the title. Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Test
Before
worker-3935d9cfad16aa45692b9c68abf4f4cf8db16309a17fde92755700ce-01000000-15785.out|TaskRunner|step:1 - global_seqlen/min:186109 - global_seqlen/max:208082 - global_seqlen/minmax_diff:21973 - global_seqlen/balanced_min:197591 - global_seqlen/balanced_max:197592 - global_seqlen/mean:197591.25 - actor/entropy:3.472848415374756 - actor/kl_loss:0.13446078318838772 - actor/kl_coef:0.01 - actor/pg_loss:0.0366882234957302 - actor/pg_clipfrac:0.14771737871706137 - actor/ppo_kl:-0.7627949655689008 - actor/pg_clipfrac_lower:0.1256448536823882 - actor/grad_norm:2.717234194278717 - perf/mfu/actor:0.16486244469331027 - perf/max_memory_allocated_gb:65.16531085968018 - perf/max_memory_reserved_gb:77.400390625 - perf/cpu_memory_used_gb:107.61865234375 - actor/lr:1e-06 - training/global_step:1 - training/epoch:0 - critic/score/mean:0.32820311188697815 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.32820311188697815 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04529883340001106 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888498306274414 - critic/returns/mean:-0.04529883340001106 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888498306274414 - response_length/mean:363.044921875 - response_length/max:1223.0 - response_length/min:48.0 - response_length/clip_ratio:0.0 - response_length_non_aborted/mean:363.044921875 - response_length_non_aborted/max:1223.0 - response_length_non_aborted/min:48.0 - response_length_non_aborted/clip_ratio:0.0 - response/aborted_ratio:0.0 - prompt_length/mean:254.427734375 - prompt_length/max:996.0 - prompt_length/min:102.0 - prompt_length/clip_ratio:0.0 - timing_s/start_profile:0.0005063731223344803 - timing_s/generate_sequences:38.09121322631836 - timing_s/generation_timing/max:39.53853225708008 - timing_s/generation_timing/min:36.43351364135742 - timing_s/generation_timing/topk_ratio:0.125 - timing_s/gen:54.0543412566185 - timing_s/reward:2.6713286973536015 
- timing_s/old_log_prob:23.379646060988307 - timing_s/ref:28.250701885670424 - timing_s/adv:0.05767686199396849 - timing_s/update_actor:65.79985951259732 - timing_s/step:174.34661124274135 - timing_s/stop_profile:8.783861994743347e-05 - timing_per_token_ms/gen:0.05816078336618822 - timing_per_token_ms/update_actor:0.04162624832362093 - timing_per_token_ms/adv:3.6487484892403185e-05 - timing_per_token_ms/ref:0.017871933781019166 - perf/total_num_tokens:1580730 - perf/time_per_step:174.34661124274135 - perf/throughput:1133.3242934380603
After
worker-451bef923f49d31f865d6a79abab0409c0c3f2f0791667f055422fc9-01000000-662223.out|TaskRunner|step:1 - global_seqlen/min:186109 - global_seqlen/max:208082 - global_seqlen/minmax_diff:21973 - global_seqlen/balanced_min:197591 - global_seqlen/balanced_max:197592 - global_seqlen/mean:197591.25 - actor/entropy:0.27104082703590393 - actor/kl_loss:5.835916465457558e-05 - actor/kl_coef:0.01 - actor/pg_loss:0.011557773974345764 - actor/pg_clipfrac:0.0017317058172920952 - actor/ppo_kl:0.00029370379894544385 - actor/pg_clipfrac_lower:0.0 - actor/grad_norm:0.4044433757662773 - perf/mfu/actor:0.17756381519466707 - perf/max_memory_allocated_gb:65.1653242111206 - perf/max_memory_reserved_gb:77.287109375 - perf/cpu_memory_used_gb:109.29206085205078 - actor/lr:1e-06 - training/global_step:1 - training/epoch:0 - critic/score/mean:0.32820311188697815 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.32820311188697815 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04529883340001106 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888498306274414 - critic/returns/mean:-0.04529883340001106 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888498306274414 - response_length/mean:363.044921875 - response_length/max:1223.0 - response_length/min:48.0 - response_length/clip_ratio:0.0 - response_length_non_aborted/mean:363.044921875 - response_length_non_aborted/max:1223.0 - response_length_non_aborted/min:48.0 - response_length_non_aborted/clip_ratio:0.0 - response/aborted_ratio:0.0 - prompt_length/mean:254.427734375 - prompt_length/max:996.0 - prompt_length/min:102.0 - prompt_length/clip_ratio:0.0 - timing_s/start_profile:0.0007536560297012329 - timing_s/generate_sequences:36.63264465332031 - timing_s/generation_timing/max:37.78916931152344 - timing_s/generation_timing/min:35.138607025146484 - timing_s/generation_timing/topk_ratio:0.125 - timing_s/gen:50.77131261955947 - 
timing_s/reward:3.0939974496141076 - timing_s/old_log_prob:20.572295984253287 - timing_s/ref:26.40171806514263 - timing_s/adv:0.06200561858713627 - timing_s/update_actor:61.85246595554054 - timing_s/step:162.88494281563908 - timing_s/stop_profile:5.281250923871994e-05 - timing_per_token_ms/update_actor:0.03912905173909557 - timing_per_token_ms/gen:0.05462834706401419 - timing_per_token_ms/ref:0.01670223128879861 - timing_per_token_ms/adv:3.922593902003269e-05 - perf/total_num_tokens:1580730 - perf/time_per_step:162.88494281563908 - perf/throughput:1213.0725319629032
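To compare the two runs without reading the full log lines, the `key:value` fields can be parsed and diffed. This is a minimal sketch: `parse_metrics` is a hypothetical helper, and the `BEFORE` and `AFTER` strings are shortened excerpts of the log lines above (values copied verbatim), not the complete logs.

```python
def parse_metrics(line: str) -> dict:
    """Parse 'prefix|TaskRunner|step:1 - key:value - ...' into {key: float}."""
    body = line.split("|")[-1]  # drop the worker file name and TaskRunner prefix
    metrics = {}
    for field in body.split(" - "):
        key, sep, value = field.strip().rpartition(":")
        if not sep:
            continue  # not a key:value field
        try:
            metrics[key] = float(value)
        except ValueError:
            continue  # skip non-numeric values
    return metrics


# Excerpts of the Before/After lines above, reduced to a few actor metrics.
BEFORE = (
    "TaskRunner|step:1 - actor/entropy:3.472848415374756"
    " - actor/kl_loss:0.13446078318838772"
    " - actor/ppo_kl:-0.7627949655689008"
    " - actor/grad_norm:2.717234194278717"
)
AFTER = (
    "TaskRunner|step:1 - actor/entropy:0.27104082703590393"
    " - actor/kl_loss:5.835916465457558e-05"
    " - actor/ppo_kl:0.00029370379894544385"
    " - actor/grad_norm:0.4044433757662773"
)

before, after = parse_metrics(BEFORE), parse_metrics(AFTER)
for key in sorted(before):
    if key.startswith("actor/"):
        print(f"{key}: {before[key]:.6g} -> {after[key]:.6g}")
```

On the excerpted metrics, actor/ppo_kl moves from about -0.76 to about 0.0003 and actor/kl_loss from about 0.13 to about 5.8e-05 at step 1, while the critic and length metrics in the full lines are identical between the two runs.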
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)