[model] fix: qwen2vl for transformers 4.52.* #3524

Merged
vermouth1992 merged 1 commit into verl-project:main from hiyouga:fix_patch_2
Sep 18, 2025

@hiyouga hiyouga commented Sep 18, 2025

What does this PR do?

Follow-up to #3496.

Sorry, I pushed a wrong patch that breaks qwen2vl RL training with transformers 4.52.*; this PR fixes it.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
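
The title rule above can be sketched as a small check. This is a hypothetical validator, not the project's actual CI script: the regular expression and the name `is_valid_title` are assumptions, and only the module and type lists come from the checklist.

```python
import re

# Module and type names copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Assumed pattern: optional [BREAKING], then [modules], then "type: description".
TITLE_RE = re.compile(r"^(\[BREAKING\])?\[([a-z_, ]+)\] (\w+): (.+)$")


def is_valid_title(title: str) -> bool:
    """Return True if the title matches [{modules}] {type}: {description}."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    # Multiple modules are separated by commas, e.g. "fsdp, megatron".
    modules = [m.strip() for m in match.group(2).split(",")]
    return all(m in MODULES for m in modules) and match.group(3) in TYPES


print(is_valid_title("[model] fix: qwen2vl for transformers 4.52.*"))
print(is_valid_title("[BREAKING][fsdp, megatron] feat: dynamic batching"))
```

Both titles taken from this page pass the check; a title with an unknown module or type, or without the bracketed prefix, is rejected.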

Test

Before

actor/kl_loss:0.13446078318838772

worker-3935d9cfad16aa45692b9c68abf4f4cf8db16309a17fde92755700ce-01000000-15785.out|TaskRunner|step:1 - global_seqlen/min:186109 - global_seqlen/max:208082 - global_seqlen/minmax_diff:21973 - global_seqlen/balanced_min:197591 - global_seqlen/balanced_max:197592 - global_seqlen/mean:197591.25 - actor/entropy:3.472848415374756 - actor/kl_loss:0.13446078318838772 - actor/kl_coef:0.01 - actor/pg_loss:0.0366882234957302 - actor/pg_clipfrac:0.14771737871706137 - actor/ppo_kl:-0.7627949655689008 - actor/pg_clipfrac_lower:0.1256448536823882 - actor/grad_norm:2.717234194278717 - perf/mfu/actor:0.16486244469331027 - perf/max_memory_allocated_gb:65.16531085968018 - perf/max_memory_reserved_gb:77.400390625 - perf/cpu_memory_used_gb:107.61865234375 - actor/lr:1e-06 - training/global_step:1 - training/epoch:0 - critic/score/mean:0.32820311188697815 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.32820311188697815 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04529883340001106 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888498306274414 - critic/returns/mean:-0.04529883340001106 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888498306274414 - response_length/mean:363.044921875 - response_length/max:1223.0 - response_length/min:48.0 - response_length/clip_ratio:0.0 - response_length_non_aborted/mean:363.044921875 - response_length_non_aborted/max:1223.0 - response_length_non_aborted/min:48.0 - response_length_non_aborted/clip_ratio:0.0 - response/aborted_ratio:0.0 - prompt_length/mean:254.427734375 - prompt_length/max:996.0 - prompt_length/min:102.0 - prompt_length/clip_ratio:0.0 - timing_s/start_profile:0.0005063731223344803 - timing_s/generate_sequences:38.09121322631836 - timing_s/generation_timing/max:39.53853225708008 - timing_s/generation_timing/min:36.43351364135742 - timing_s/generation_timing/topk_ratio:0.125 - timing_s/gen:54.0543412566185 - timing_s/reward:2.6713286973536015 
- timing_s/old_log_prob:23.379646060988307 - timing_s/ref:28.250701885670424 - timing_s/adv:0.05767686199396849 - timing_s/update_actor:65.79985951259732 - timing_s/step:174.34661124274135 - timing_s/stop_profile:8.783861994743347e-05 - timing_per_token_ms/gen:0.05816078336618822 - timing_per_token_ms/update_actor:0.04162624832362093 - timing_per_token_ms/adv:3.6487484892403185e-05 - timing_per_token_ms/ref:0.017871933781019166 - perf/total_num_tokens:1580730 - perf/time_per_step:174.34661124274135 - perf/throughput:1133.3242934380603

After

actor/kl_loss:5.835916465457558e-05

worker-451bef923f49d31f865d6a79abab0409c0c3f2f0791667f055422fc9-01000000-662223.out|TaskRunner|step:1 - global_seqlen/min:186109 - global_seqlen/max:208082 - global_seqlen/minmax_diff:21973 - global_seqlen/balanced_min:197591 - global_seqlen/balanced_max:197592 - global_seqlen/mean:197591.25 - actor/entropy:0.27104082703590393 - actor/kl_loss:5.835916465457558e-05 - actor/kl_coef:0.01 - actor/pg_loss:0.011557773974345764 - actor/pg_clipfrac:0.0017317058172920952 - actor/ppo_kl:0.00029370379894544385 - actor/pg_clipfrac_lower:0.0 - actor/grad_norm:0.4044433757662773 - perf/mfu/actor:0.17756381519466707 - perf/max_memory_allocated_gb:65.1653242111206 - perf/max_memory_reserved_gb:77.287109375 - perf/cpu_memory_used_gb:109.29206085205078 - actor/lr:1e-06 - training/global_step:1 - training/epoch:0 - critic/score/mean:0.32820311188697815 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.32820311188697815 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04529883340001106 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888498306274414 - critic/returns/mean:-0.04529883340001106 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888498306274414 - response_length/mean:363.044921875 - response_length/max:1223.0 - response_length/min:48.0 - response_length/clip_ratio:0.0 - response_length_non_aborted/mean:363.044921875 - response_length_non_aborted/max:1223.0 - response_length_non_aborted/min:48.0 - response_length_non_aborted/clip_ratio:0.0 - response/aborted_ratio:0.0 - prompt_length/mean:254.427734375 - prompt_length/max:996.0 - prompt_length/min:102.0 - prompt_length/clip_ratio:0.0 - timing_s/start_profile:0.0007536560297012329 - timing_s/generate_sequences:36.63264465332031 - timing_s/generation_timing/max:37.78916931152344 - timing_s/generation_timing/min:35.138607025146484 - timing_s/generation_timing/topk_ratio:0.125 - timing_s/gen:50.77131261955947 - 
timing_s/reward:3.0939974496141076 - timing_s/old_log_prob:20.572295984253287 - timing_s/ref:26.40171806514263 - timing_s/adv:0.06200561858713627 - timing_s/update_actor:61.85246595554054 - timing_s/step:162.88494281563908 - timing_s/stop_profile:5.281250923871994e-05 - timing_per_token_ms/update_actor:0.03912905173909557 - timing_per_token_ms/gen:0.05462834706401419 - timing_per_token_ms/ref:0.01670223128879861 - timing_per_token_ms/adv:3.922593902003269e-05 - perf/total_num_tokens:1580730 - perf/time_per_step:162.88494281563908 - perf/throughput:1213.0725319629032
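
To compare the two runs without reading the full log lines, the metrics can be pulled out with a few lines of Python. This is a minimal sketch, assuming the `key:value - key:value` layout seen in the logs above; `parse_metrics` is a hypothetical helper, and the two strings are shortened excerpts of the Before and After lines.

```python
def parse_metrics(line: str) -> dict[str, float]:
    """Parse 'key:value' pairs separated by ' - ' into a dict of floats."""
    metrics = {}
    for field in line.split(" - "):
        key, sep, value = field.strip().rpartition(":")
        if not sep:
            continue  # no colon in this field
        # Drop a "worker-...|TaskRunner|" prefix if the field carries one.
        key = key.rsplit("|", 1)[-1]
        try:
            metrics[key] = float(value)
        except ValueError:
            continue  # skip non-numeric values
    return metrics


# Excerpts of the step-1 log lines quoted above.
before = parse_metrics(
    "step:1 - actor/entropy:3.472848415374756 - "
    "actor/kl_loss:0.13446078318838772 - actor/grad_norm:2.717234194278717"
)
after = parse_metrics(
    "step:1 - actor/entropy:0.27104082703590393 - "
    "actor/kl_loss:5.835916465457558e-05 - actor/grad_norm:0.4044433757662773"
)

for name in ("actor/entropy", "actor/kl_loss", "actor/grad_norm"):
    print(f"{name}: {before[name]:.6g} -> {after[name]:.6g}")
```

On these excerpts, all three metrics are lower in the After run, with `actor/kl_loss` falling from about 0.134 to about 5.8e-05 at step 1.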

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the monkey-patching logic for qwen2_vl models to reduce code duplication. While the intention is good, the implementation introduces a critical bug by attempting to import modules that may not exist in older versions of the transformers library, which will cause an ImportError. I've provided a detailed comment on how to fix this. The other change in qwen2_vl.py is a nice refactoring that improves code clarity.
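
One common way to avoid the ImportError described in the review is to guard the import and patch only what is present. The following is a minimal sketch of that pattern, not the fix suggested in the review comment (which is not shown on this page); `optional_import` and `available_modules` are hypothetical helper names.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def available_modules(names: list[str]) -> dict[str, ModuleType]:
    """Keep only the modules that exist in the current environment."""
    found = {}
    for name in names:
        module = optional_import(name)
        if module is not None:
            found[name] = module
    return found


# Demonstration with standard-library names so the sketch runs anywhere.
print(list(available_modules(["json", "definitely_missing_module_xyz"])))
```

In the setting of this PR, the candidate names would be the model modules to patch, for example `transformers.models.qwen2_vl.modeling_qwen2_vl`, so that a module missing from an older transformers release is skipped instead of raising at import time. Which modules the patch actually touches is an assumption here.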

@vermouth1992 vermouth1992 merged commit c0e2b9d into verl-project:main Sep 18, 2025
57 of 61 checks passed
@drgabriel11

Data base funds

@hiyouga hiyouga deleted the fix_patch_2 branch October 6, 2025 13:05
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
wangboxiong320 pushed a commit to wangboxiong320/verl that referenced this pull request Nov 1, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 26, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026