[Model] Use mm_position to compute mrope positions for Qwen3-Omni by Etelis · Pull Request #33010 · vllm-project/vllm

Etelis · 2026-01-24T19:39:55Z

Optimizes M-RoPE position calculation for Qwen3-Omni by using mm_position.offset directly from MultiModalFeatureSpec instead of token-by-token searching through input_tokens.

Part of #32656 (M-RoPE calculation optimization tracker). Follows the pattern from Qwen2.5-Omni optimization.

Preserves Qwen3-Omni specifics: audio token formula, BOS/EOS handling, token-by-token interleaving for use_audio_in_video.

Test Plan

Syntax check passes
pytest tests/models/multimodal/processing/test_qwen3_omni.py -v
pytest tests/model_executor/test_qwen3_omni.py -v
Verify single image/video/audio inputs
Verify mixed modalities
Verify use_audio_in_video=True

Optimizes M-RoPE position calculation by using mm_position.offset directly from MultiModalFeatureSpec instead of token-by-token searching. Changes: - Add helper methods for modular position computation - Refactor get_mrope_input_positions to use mm_position.offset - Replace torch operations with numpy for efficiency - Remove unused get_llm_pos_ids_for_vision import Part of vllm-project#32656 Signed-off-by: Itay Etelis <itay.etelis@ibm.com>

Etelis · 2026-01-24T19:40:32Z

@DarkLight1337 cc

gemini-code-assist

Code Review

This pull request refactors the M-RoPE position calculation for Qwen3-Omni to be more efficient by using mm_position.offset from MultiModalFeatureSpec instead of token-by-token searching. The changes look good and follow the pattern from a similar optimization for Qwen2.5-Omni. I've found one critical issue that could lead to a crash under specific circumstances. Please see my comment below.

gemini-code-assist · 2026-01-24T19:42:24Z

vllm/model_executor/models/qwen3_omni_moe_thinker.py

+                    llm_pos_ids_list.append(pos_ids)
+                    st_idx = int(pos_ids.max()) + 1
+
+                    eos_pos = np.broadcast_to(np.array([st_idx]), (3, 1))
+                    llm_pos_ids_list.append(eos_pos)
+                    llm_pos_ids_list.append(eos_pos)
+
+                    video_len = grid_t * grid_h * grid_w
+                    audio_len = self._compute_audio_token_count(
+                        data["audio_feature_length"]
                    )
-                video_len = video_grid_thw[video_idx].prod() // (spatial_merge_size**2)
-                st_idx = llm_pos_ids_list[-1].max() + 1 if llm_pos_ids_list else 0
-                eos_len = 1
-                eos_block = (
-                    torch.arange(eos_len, dtype=torch.long).view(1, -1).expand(3, -1)
-                    + st_idx
-                )
-                llm_pos_ids_list.append(eos_block)
-                llm_pos_ids_list.append(eos_block)
-                st += text_len + bos_len * 2 + audio_len + video_len + eos_len * 2  # noqa: E501
-                audio_idx += 1
-                video_idx += 1
-                remain_videos -= 1
-                remain_audios -= 1
+                    st = offset + 2 + video_len + audio_len + 2

-        if st < len(input_tokens):
-            st_idx = llm_pos_ids_list[-1].max() + 1 if llm_pos_ids_list else 0
-            text_len = len(input_tokens) - st
+        if st < seq_len:
+            st_idx = int(llm_pos_ids_list[-1].max()) + 1 if llm_pos_ids_list else 0
+            text_len = seq_len - st
            llm_pos_ids_list.append(
-                torch.arange(text_len, dtype=torch.long).view(1, -1).expand(3, -1)
-                + st_idx
+                np.broadcast_to(np.arange(text_len), (3, text_len)) + st_idx
            )

-        llm_positions = torch.cat(llm_pos_ids_list, dim=1).reshape(3, -1)
+        llm_positions = np.concatenate(llm_pos_ids_list, axis=1).reshape(3, -1)
        if llm_positions.shape[1] != seq_len:
            raise RuntimeError("Position ids length mismatch with input ids length")

-        mrope_position_delta = llm_positions.max() + 1 - seq_len
-        return llm_positions, mrope_position_delta
+        mrope_position_delta = int(llm_positions.max()) + 1 - seq_len
+        return torch.from_numpy(llm_positions), mrope_position_delta

    def get_mm_mapping(self) -> MultiModelKeys:
        """


There's a potential TypeError here. If data["use_audio_in_video"] is True but there is no corresponding audio feature, data["audio_feature_length"] will be None. This will cause _compute_interleaved_positions to be called with None, which then raises a TypeError when torch.tensor([None]) is attempted in _compute_audio_token_count.

To fix this, you should handle this case by falling back to video-only processing. You can adjust the condition to also check if audio_feature_length is available.

if not data["use_audio_in_video"] or data["audio_feature_length"] is None: grid_indices = np.indices((grid_t, grid_h, grid_w)) if t_factor != 1.0: grid_indices[0] = (grid_indices[0] * t_factor).astype(np.int64) llm_pos_ids_list.append(grid_indices.reshape(3, -1) + st_idx) video_len = grid_t * grid_h * grid_w st_idx = int(llm_pos_ids_list[-1].max()) + 1 eos_pos = np.broadcast_to(np.array([st_idx]), (3, 1)) llm_pos_ids_list.append(eos_pos) st = offset + 1 + video_len + 1 else: audio_bos_pos = np.broadcast_to(np.array([st_idx - 1]), (3, 1)) llm_pos_ids_list.append(audio_bos_pos) pos_ids, _ = self._compute_interleaved_positions(st_idx, data) llm_pos_ids_list.append(pos_ids) st_idx = int(pos_ids.max()) + 1 eos_pos = np.broadcast_to(np.array([st_idx]), (3, 1)) llm_pos_ids_list.append(eos_pos) llm_pos_ids_list.append(eos_pos) video_len = grid_t * grid_h * grid_w audio_len = self._compute_audio_token_count( data["audio_feature_length"] ) st = offset + 2 + video_len + audio_len + 2

Etelis · 2026-01-25T13:18:21Z

@DarkLight1337 Give me GL to run benchmarking :) and I'll

DarkLight1337 · 2026-01-25T13:21:40Z

Yeah feel free to do it

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>

mergify · 2026-01-25T14:21:49Z

Documentation preview: https://vllm--33010.org.readthedocs.build/en/33010/

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>

Etelis · 2026-01-25T14:43:30Z

Yeah feel free to do it

Test	Query Type	Result
`multi_images`	2 images	Correctly compared Tokyo Skytree + cherry blossoms vs Chinatown scene
`mixed_modalities`	audio + image + video	Correctly identified "Mary Had a Little Lamb", described cherry blossom image, analyzed baby video
`multi_audios`	2 audio clips	Correctly distinguished sports broadcast from nursery rhyme
`use_audio_in_video`	video + audio interleaved	Correctly described baby video and transcribed audio

test_qwen3_multi_audios.log
test_qwen3_multi_images.log
test_qwen3_mixed.log
test_qwen3_audio_in_video.log

Ran on

Sun Jan 25 14:46:59 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 6000 Blac...    On  |   00000000:01:00.0 Off |                    0 |
| N/A   27C    P8             30W /  600W |       0MiB /  97887MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

All read @DarkLight1337

DarkLight1337

@Isotr0py can you review this as well?

Isotr0py

LGTM

Etelis · 2026-01-26T13:47:13Z

@DarkLight1337 Re-run?

DarkLight1337 · 2026-01-26T13:48:15Z

Merged

…lm-project#33010) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com>

Etelis requested a review from sighingnow as a code owner January 24, 2026 19:39

Merge branch 'main' into etelis/qwen3-omni-mrope-optimization

b8d67be

mergify bot added the qwen Related to Qwen models label Jan 24, 2026

gemini-code-assist bot reviewed Jan 24, 2026

View reviewed changes

Merge branch 'main' into etelis/qwen3-omni-mrope-optimization

f8fa078

DarkLight1337 mentioned this pull request Jan 25, 2026

[Tracker]: Use mm_features for M-RoPE calculation for all models #32656

Open

10 tasks

[Examples] Add Qwen3-Omni offline inference examples

cf0269b

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>

mergify bot added the documentation Improvements or additions to documentation label Jan 25, 2026

Add tensor parallel and memory config args to qwen3_omni example

5a30837

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>

DarkLight1337 approved these changes Jan 26, 2026

View reviewed changes

Isotr0py approved these changes Jan 26, 2026

View reviewed changes

Isotr0py enabled auto-merge (squash) January 26, 2026 06:46

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 26, 2026

Merge branch 'main' into etelis/qwen3-omni-mrope-optimization

8432039

Isotr0py merged commit 6ca2c91 into vllm-project:main Jan 26, 2026
58 checks passed

apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026

[Model] Use mm_position to compute mrope positions for Qwen3-Omni (vl…

42958c6

…lm-project#33010) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com>

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[Model] Use mm_position to compute mrope positions for Qwen3-Omni (vl…

a3b7b0b

…lm-project#33010) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Model] Use mm_position to compute mrope positions for Qwen3-Omni#33010

[Model] Use mm_position to compute mrope positions for Qwen3-Omni#33010
Isotr0py merged 6 commits intovllm-project:mainfrom
Etelis:etelis/qwen3-omni-mrope-optimization

Etelis commented Jan 24, 2026 •

edited

Loading

Uh oh!

Etelis commented Jan 24, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jan 24, 2026

Uh oh!

Etelis commented Jan 25, 2026

Uh oh!

DarkLight1337 commented Jan 25, 2026

Uh oh!

mergify bot commented Jan 25, 2026

Uh oh!

Etelis commented Jan 25, 2026 •

edited

Loading

Uh oh!

DarkLight1337 left a comment

Uh oh!

Isotr0py left a comment

Uh oh!

Etelis commented Jan 26, 2026

Uh oh!

Uh oh!

DarkLight1337 commented Jan 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

Etelis commented Jan 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related

Test Plan

Uh oh!

Etelis commented Jan 24, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 24, 2026

Choose a reason for hiding this comment

Uh oh!

Etelis commented Jan 25, 2026

Uh oh!

DarkLight1337 commented Jan 25, 2026

Uh oh!

mergify bot commented Jan 25, 2026

Uh oh!

Etelis commented Jan 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DarkLight1337 left a comment

Choose a reason for hiding this comment

Uh oh!

Isotr0py left a comment

Choose a reason for hiding this comment

Uh oh!

Etelis commented Jan 26, 2026

Uh oh!

Uh oh!

DarkLight1337 commented Jan 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Etelis commented Jan 24, 2026 •

edited

Loading

Etelis commented Jan 25, 2026 •

edited

Loading