[Model] Use mm_position to compute mrope positions for Qwen2-VL/2.5-VL#32126
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default; only a small and essential subset of CI tests runs to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request refactors the M-RoPE position computation for Qwen2-VL and Qwen2.5-VL models. The changes significantly simplify the logic by introducing a helper method iter_mm_grid_thw to iterate over multimodal features and by using NumPy for vectorized position calculations. This is a great improvement in terms of both readability and performance. I've found one minor issue with a type hint that should be corrected for static analysis correctness.
Force-pushed from 76c6550 to 23cc4d3 (Compare)

Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com>

Force-pushed from 23cc4d3 to 95645a5 (Compare)
Code Review
This pull request introduces a significant optimization for M-RoPE position computation in Qwen2-VL and Qwen2.5-VL models. By leveraging pre-calculated multimodal features and vectorized NumPy operations, the new implementation is cleaner, more readable, and more efficient than the previous token-scanning approach. The introduction of the iter_mm_grid_thw helper function is a good abstraction. I've identified a potential crash when handling empty inputs, which was also present in the previous implementation, and have provided suggestions to address this edge case.
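For context on the abstraction the review praises, here is a minimal sketch of what an offset-ordered iterator like `iter_mm_grid_thw` might look like. The `MMFeature`/`MMPosition` field names below are illustrative assumptions for the sketch, not vLLM's actual classes:

```python
from dataclasses import dataclass
from typing import Iterator

# Illustrative stand-ins for vLLM's multimodal feature records;
# these field names are assumptions, not the real vLLM classes.
@dataclass
class MMPosition:
    offset: int  # index of the feature's first placeholder token

@dataclass
class MMFeature:
    modality: str                   # "image" or "video"
    mm_position: MMPosition
    grid_thw: tuple[int, int, int]  # temporal, height, width grid sizes
    t_factor: float = 1.0           # temporal position scale (videos)

def iter_mm_grid_thw(
    mm_features: list[MMFeature],
) -> Iterator[tuple[int, tuple[int, int, int], float]]:
    """Yield (offset, (t, h, w), t_factor) sorted by placeholder offset."""
    for f in sorted(mm_features, key=lambda f: f.mm_position.offset):
        yield f.mm_position.offset, f.grid_thw, f.t_factor

feats = [
    MMFeature("video", MMPosition(40), (4, 6, 8), t_factor=2.0),
    MMFeature("image", MMPosition(5), (1, 6, 8)),
]
print(list(iter_mm_grid_thw(feats)))
# [(5, (1, 6, 8), 1.0), (40, (4, 6, 8), 2.0)]
```

Iterating in offset order lets the caller fill text positions between features and vision positions inside them in a single pass, without scanning `input_tokens` for placeholder token IDs.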
```diff
 )
-llm_positions = torch.cat(llm_pos_ids_list, dim=1).reshape(3, -1)
+llm_positions = np.concatenate(llm_pos_ids_list, axis=1).reshape(3, -1)
```
np.concatenate will raise a ValueError if llm_pos_ids_list is empty (e.g., for an empty prompt), which will cause a crash. This should be handled to avoid unexpected failures on valid edge cases.
```suggestion
llm_positions = np.concatenate(llm_pos_ids_list, axis=1).reshape(3, -1) if llm_pos_ids_list else np.empty((3, 0), dtype=np.int64)
```
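A minimal standalone repro of the edge case flagged above (illustrative values, not the PR's code): `np.concatenate` raises `ValueError` on an empty sequence, and the suggested guard substitutes an empty `(3, 0)` array instead.

```python
import numpy as np

# Illustrative repro: np.concatenate raises ValueError on an empty list.
llm_pos_ids_list = []  # e.g. a prompt that produced no position chunks
try:
    np.concatenate(llm_pos_ids_list, axis=1)
    crashed = False
except ValueError:
    crashed = True  # "need at least one array to concatenate"

# The suggested guard falls back to an empty (3, 0) array instead:
llm_positions = (
    np.concatenate(llm_pos_ids_list, axis=1).reshape(3, -1)
    if llm_pos_ids_list
    else np.empty((3, 0), dtype=np.int64)
)
print(crashed, llm_positions.shape)  # True (3, 0)
```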
```diff
-llm_positions = torch.cat(llm_pos_ids_list, dim=1).reshape(3, -1)
+llm_positions = np.concatenate(llm_pos_ids_list, axis=1).reshape(3, -1)
 mrope_position_delta = (llm_positions.max() + 1 - len(input_tokens)).item()
```
If llm_positions becomes an empty array after the change in the previous line, calling .max() on it will raise a ValueError. This also needs to be handled to prevent a crash.
```suggestion
mrope_position_delta = (llm_positions.max() + 1 - len(input_tokens)).item() if llm_positions.size > 0 else 0
```
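The follow-on failure can also be shown in isolation (illustrative values): `.max()` on a zero-size array raises `ValueError`, so the delta computation needs its own guard.

```python
import numpy as np

# Illustrative repro: .max() on an empty array raises ValueError.
llm_positions = np.empty((3, 0), dtype=np.int64)
input_tokens: list[int] = []
try:
    llm_positions.max()
    crashed = False
except ValueError:
    crashed = True  # "zero-size array to reduction operation"

# Guarded version: fall back to a delta of 0 for empty inputs.
mrope_position_delta = (
    (llm_positions.max() + 1 - len(input_tokens)).item()
    if llm_positions.size > 0
    else 0
)
print(crashed, mrope_position_delta)  # True 0
```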
```diff
 )
-llm_positions = torch.cat(llm_pos_ids_list, dim=1).reshape(3, -1)
+llm_positions = np.concatenate(llm_pos_ids_list, axis=1).reshape(3, -1)
```
np.concatenate will raise a ValueError if llm_pos_ids_list is empty (e.g., for an empty prompt), which will cause a crash. This should be handled to avoid unexpected failures on valid edge cases.
```suggestion
llm_positions = np.concatenate(llm_pos_ids_list, axis=1).reshape(3, -1) if llm_pos_ids_list else np.empty((3, 0), dtype=np.int64)
```
```diff
-llm_positions = torch.cat(llm_pos_ids_list, dim=1).reshape(3, -1)
+llm_positions = np.concatenate(llm_pos_ids_list, axis=1).reshape(3, -1)
 mrope_position_delta = (llm_positions.max() + 1 - len(input_tokens)).item()
```
If llm_positions becomes an empty array after the change in the previous line, calling .max() on it will raise a ValueError. This also needs to be handled to prevent a crash.
```suggestion
mrope_position_delta = (llm_positions.max() + 1 - len(input_tokens)).item() if llm_positions.size > 0 else 0
```
/CC @DarkLight1337, thanks for taking a look!

Hi @Isotr0py,
Out of curiosity - have you done profiling to quantify the perf gain from this PR (similar to #28730 (comment))? Thanks
vllm-project#32126) Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
@ywang96 Here are the trace results for Qwen2-VL:

**Image**

### Before
### After

**Video**

### Before
### After
Refactor `get_mrope_input_positions()` to use `mm_feature.mm_position.offset` directly instead of searching through `input_tokens` token-by-token.

Changes:
- Add `iter_mm_features()` iterator that yields `(offset, modality, data)` sorted by `mm_position.offset` for all 3 modalities (audio, image, video)
- Add `_get_audio_for_video_mapping()` for `use_audio_in_video` pairing
- Add `_compute_audio_token_count()` and `_compute_interleaved_positions()` helper methods
- Refactor `get_mrope_input_positions()` to iterate by offset using numpy
- Remove unused `get_llm_pos_ids_for_vision` import

Follows the pattern established in PR vllm-project#32126 for Qwen2-VL/2.5-VL.

Resolves: vllm-project#32656 (Qwen2.5-Omni item)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>




Purpose
Followup to #28399 and #28730. This PR optimizes the M-RoPE position computation for both Qwen2-VL and Qwen2.5-VL.
Test Plan
```shell
VLLM_WORKER_MULTIPROC_METHOD=spawn lm_eval --model vllm-vlm --model_args "pretrained=Qwen/Qwen2-VL-7B-Instruct,max_model_len=8192" --tasks chartqa --batch_size 1 --apply_chat_template --seed 42
```

Test Result
Qwen2-VL test results:
Before:
After:
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.

Note

Optimizes M-RoPE position generation by leveraging pre-parsed multimodal features and vectorized index math.

- Adds `iter_mm_grid_thw` to iterate `mm_features` by `mm_position.offset`, yielding `(t, h, w)` grids and temporal scale (`t_factor`) for images/videos
- Uses vectorized NumPy (`np.indices`, `np.broadcast_to`) to build 3D position IDs, then converts back to torch
- Applies spatial merging (`spatial_merge_size`) and temporal scaling using `second_per_grid_ts * tokens_per_second` for video
- Imports `Iterator`, adds NumPy; updates both `qwen2_vl.py` and `qwen2_5_vl.py` consistently

Written by Cursor Bugbot for commit c370ddc7807818b64d98d78873931f675ab66d60. This will update automatically on new commits. Configure here.
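As a rough illustration of the vectorized construction described above, the 3D (temporal, height, width) position IDs for one vision segment can be built with `np.indices` and flattened. The grid sizes, `t_factor`, and start offset below are made-up example values for the sketch, not the PR's exact code:

```python
import numpy as np

# Sketch: build M-RoPE position IDs for one vision feature with
# grid (t, h, w), applying spatial merging and temporal scaling.
t, h, w = 2, 4, 4
spatial_merge_size = 2
t_factor = 2.0   # e.g. second_per_grid_ts * tokens_per_second for video
start = 10       # position index right before this feature's tokens

llm_h, llm_w = h // spatial_merge_size, w // spatial_merge_size
# np.indices yields the three (t, llm_h, llm_w) index grids in one call.
t_idx, h_idx, w_idx = np.indices((t, llm_h, llm_w))
t_idx = (t_idx * t_factor).astype(np.int64)  # temporal scaling (video)
pos = np.stack([t_idx, h_idx, w_idx]).reshape(3, -1) + start
print(pos.shape)  # (3, 8) == (3, t * llm_h * llm_w)
```

Building all three axes in one `np.indices` call replaces the per-step torch index construction of the old token-scanning loop, which is where the speedup comes from.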