[Bugfix][Core] Fix use audio in video bug by xsank · Pull Request #32994 · vllm-project/vllm

xsank · 2026-01-24T04:00:08Z

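When the audio track of a video is used (audio-in-video), the audio mask is not contiguous.

The current implementation mixes the multimodal features together under a single combined mask, which causes the bug. Using a list of masks (one per multimodal input) would be better.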
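To make the failure mode concrete, here is a simplified, framework-free sketch. It is an illustration, not vLLM's actual code, and it assumes that video (V) and audio (A) placeholder tokens are interleaved in time chunks and that embeddings are written into placeholder positions in order. With a single OR-ed mask, the concatenated [video..., audio...] embeddings land on the wrong positions. Keeping one mask per modality places each embedding correctly. All names (TOKENS, modality_mask, scatter, merge_combined, merge_per_modality) are hypothetical.

```python
from typing import List, Sequence

# Hypothetical token layout for one audio-in-video item: video (V) and audio (A)
# placeholders interleaved in time chunks, surrounded by text (T).
TOKENS = ["T", "V", "V", "A", "A", "V", "V", "A", "A", "T"]


def modality_mask(tokens: Sequence[str], kind: str) -> List[bool]:
    """Boolean mask marking the placeholder positions of one modality."""
    return [t == kind for t in tokens]


def scatter(base: List[str], mask: Sequence[bool], embeds: Sequence[str]) -> List[str]:
    """Write embeds, in order, into the positions where mask is True."""
    if sum(mask) != len(embeds):
        raise ValueError("mask/embedding count mismatch")
    out = list(base)
    it = iter(embeds)
    for i, m in enumerate(mask):
        if m:
            out[i] = next(it)
    return out


def merge_combined(tokens, video_embeds, audio_embeds):
    """Buggy variant: OR the masks and scatter the concatenated embeddings."""
    video_mask = modality_mask(tokens, "V")
    audio_mask = modality_mask(tokens, "A")
    combined = [v or a for v, a in zip(video_mask, audio_mask)]
    return scatter(list(tokens), combined, list(video_embeds) + list(audio_embeds))


def merge_per_modality(tokens, video_embeds, audio_embeds):
    """Fixed variant: keep one mask per modality and scatter each list separately."""
    out = scatter(list(tokens), modality_mask(tokens, "V"), video_embeds)
    return scatter(out, modality_mask(tokens, "A"), audio_embeds)


video = ["v0", "v1", "v2", "v3"]
audio = ["a0", "a1", "a2", "a3"]
print(merge_combined(TOKENS, video, audio))
# → ['T', 'v0', 'v1', 'v2', 'v3', 'a0', 'a1', 'a2', 'a3', 'T']  (audio slots get video)
print(merge_per_modality(TOKENS, video, audio))
# → ['T', 'v0', 'v1', 'a0', 'a1', 'v2', 'v3', 'a2', 'a3', 'T']
```

In the combined variant, positions 3 and 4 (audio placeholders) receive v2 and v3, and positions 5 and 6 (video placeholders) receive a0 and a1. This kind of misplacement would be consistent with the garbled "before" output that follows.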

Before the fix:

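{
    "Comprehensive description": "In the video, a person in dark clothing in a dim indoor setting holds a white object with a blue pattern and seems to be performing some kind of ritual or performance. The background is dark, possibly a stage or a stage, with blue lighting. The person seems to be moving, possibly moving or moving, possibly moving or moving. Indistinct music or music can be heard in the background, along with some indistinct background noise. The person may be speaking, but the speech is unintelligible.",
    "Brief description": "A person on a dim stage holds a white object with a blue pattern; indistinct music plays in the background."
}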

After the fix:

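{
    "Comprehensive description": "On a theater-like stage, a woman in a pink dress and a man in a blue suit dance on an orange carpet. They move from the center of the stage toward the sides, the woman to the left and the man to the right. Meanwhile, another woman in a green dress and a man in a brown jacket run from the left side of the stage to the right, passing the two dancers. On the right side of the stage, a man in a gray suit stands still, holding a red folder. In the background, a golden throne sits at the back of the stage, and above it hangs a painting of a man and a woman. The audience sits in the seats in front of the stage, watching the performance. In the audio, a man can be heard saying in a dramatic tone: “The dale still has been the father of good news.” Another man replies: “Am I, my lord?” A third man says: “I assure my liege, I hold.”",
    "Brief description": "On stage, a man and a woman dance while another man and woman run past them, and a man stands holding a red folder. In the background are a throne and a portrait. In the audio, a man says: “The dale still has been the father of good news.”"
}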

However, this change requires modifying the _gather_mm_embeddings and embed_input_ids interfaces, which means a lot of model code would have to be updated. I'm not sure whether I should do that now.

Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>

gemini-code-assist

Code Review

This pull request aims to fix a bug in how audio-in-video features are handled in Qwen3-Omni models by using separate masks for different multimodal inputs instead of a single combined mask. The approach of separating the masks is sound and correctly addresses the issue of discontinuous audio masks. However, the implementation introduces a critical bug that can cause a crash and a significant breaking change to the multimodal interface that affects other models in the codebase. I've provided detailed comments on these two critical issues with suggestions for how to resolve them.

vllm/model_executor/models/qwen3_omni_moe_thinker.py

gemini-code-assist · 2026-01-24T04:03:00Z

vllm/v1/worker/gpu_model_runner.py

            inputs_embeds_scheduled = self.model.embed_input_ids(
                self.input_ids.gpu[:num_scheduled_tokens],
                multimodal_embeddings=mm_embeds,
-                is_multimodal=is_mm_embed,
+                is_multimodals=is_mm_embeds,
            )


This change, which passes is_multimodals to embed_input_ids, introduces a breaking change to the SupportsMultiModal interface. Currently, only Qwen3OmniMoeThinkerForConditionalGeneration is updated to handle this new parameter. Other multimodal models in the codebase that expect is_multimodal: torch.Tensor will fail at runtime.

To address this, you could either:

Update all other multimodal models to accept the is_multimodals parameter.

Implement a backward-compatibility mechanism. For instance, you could inspect the signature of self.model.embed_input_ids and, if it doesn't accept is_multimodals, compute a single combined mask using reduce(torch.logical_or, is_mm_embeds) and pass it as is_multimodal.
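The backward-compatibility option can be sketched as follows. The bot's suggestion uses reduce(torch.logical_or, is_mm_embeds). To keep this sketch dependency-free, masks are plain Python lists and logical_or is a hand-written stand-in for torch.logical_or. The helper call_embed_input_ids and the classes LegacyModel and MaskListModel are hypothetical; only the keyword names is_multimodal, is_multimodals, and multimodal_embeddings come from the diff above.

```python
import inspect
from functools import reduce
from typing import Any, List, Sequence

Mask = List[bool]


def logical_or(a: Sequence[bool], b: Sequence[bool]) -> Mask:
    """Element-wise OR of two equal-length masks (stand-in for torch.logical_or)."""
    return [x or y for x, y in zip(a, b)]


def call_embed_input_ids(model: Any, input_ids, mm_embeds, is_mm_embeds: List[Mask]):
    """Pass one mask per modality if the model supports it, else fall back.

    Assumes is_mm_embeds holds at least one mask (reduce fails on an empty list).
    """
    params = inspect.signature(model.embed_input_ids).parameters
    if "is_multimodals" in params:
        # Updated model: hand over the per-modality masks unchanged.
        return model.embed_input_ids(
            input_ids, multimodal_embeddings=mm_embeds, is_multimodals=is_mm_embeds
        )
    # Legacy model: collapse the list into the single combined mask it expects.
    combined = reduce(logical_or, is_mm_embeds)
    return model.embed_input_ids(
        input_ids, multimodal_embeddings=mm_embeds, is_multimodal=combined
    )


class LegacyModel:
    """Hypothetical model keeping the existing single-mask signature."""

    def embed_input_ids(self, input_ids, multimodal_embeddings=None, is_multimodal=None):
        return ("single", is_multimodal)


class MaskListModel:
    """Hypothetical model updated to accept a list of masks."""

    def embed_input_ids(self, input_ids, multimodal_embeddings=None, is_multimodals=None):
        return ("list", is_multimodals)


masks = [[False, True, True, False, False], [False, False, False, True, True]]
print(call_embed_input_ids(LegacyModel(), [0] * 5, [], masks))
# → ('single', [False, True, True, True, True])
print(call_embed_input_ids(MaskListModel(), [0] * 5, [], masks))
# → ('list', [[False, True, True, False, False], [False, False, False, True, True]])
```

The fallback reproduces the old combined-mask behavior, so legacy models would keep working but would not get the audio-in-video fix. Inspecting the signature on every step also adds overhead; checking once per model at load time would likely be preferable.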

xsank · Jan 24, 2026

As the critical review comment notes, this change requires modifying the _gather_mm_embeddings and embed_input_ids interfaces, so a lot of model code would have to be updated. I'm not sure whether I should do that now.

mergify · 2026-01-24T04:04:48Z

Hi @xsank, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

cursor · 2026-01-24T04:12:18Z

vllm/v1/worker/gpu_model_runner.py

                self.input_ids.gpu[:num_scheduled_tokens],
                multimodal_embeddings=mm_embeds,
-                is_multimodal=is_mm_embed,
+                is_multimodals=is_mm_embeds,


Interface change breaks other multimodal models

High Severity

The parameter name in gpu_model_runner.py changed from is_multimodal to is_multimodals, but only qwen3_omni_moe_thinker.py was updated to accept the new parameter name. Other multimodal models (e.g., clip.py, eagle2_5_vl.py, gemma3_mm.py, ernie45_vl.py, qwen2_5_omni_thinker.py) still expect is_multimodal (singular). When the runner calls embed_input_ids(is_multimodals=...) on these models, a TypeError is raised for the unexpected keyword argument.

Additional Locations (1)

vllm/model_executor/models/qwen3_omni_moe_thinker.py#L1771-L1772

vllm/model_executor/models/qwen3_omni_moe_thinker.py


mergify · 2026-01-24T08:51:15Z

Hi @xsank, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint


ywang96 · 2026-01-24T09:11:11Z

Very much appreciate the bugfix! We're aware of this bug and will prioritize reviewing this!

mergify · 2026-01-27T15:38:41Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @xsank.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

xsank · 2026-01-28T02:01:12Z

It seems that @Etelis's PR handles this case in a different way.

ywang96 · 2026-02-04T03:55:35Z

Sorry for the late reply - I think we still need this PR to fix the actual issue itself, but I personally prefer #33605 over this since that PR is more model-specific.

fix use audio in video bug

019c549

Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>

xsank requested a review from sighingnow as a code owner January 24, 2026 04:00

mergify bot added the qwen (Related to Qwen models), v1, and bug (Something isn't working) labels Jan 24, 2026

gemini-code-assist bot reviewed Jan 24, 2026


cursor bot reviewed Jan 24, 2026


DarkLight1337 requested review from Isotr0py and ywang96 January 24, 2026 04:54

fix pre commit

96e528f

Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>

fix pre commit

36126e8

Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>

ywang96 self-assigned this Jan 24, 2026

DarkLight1337 mentioned this pull request Jan 24, 2026

[Tracker]: Use mm_features for M-RoPE calculation for all models #32656

Open

10 tasks

Etelis mentioned this pull request Jan 24, 2026

[Model] Use mm_position to compute mrope positions for Qwen3-Omni #33010

Merged

6 tasks

This was referenced Jan 27, 2026

[Bug]: Qwen3 Omni thinking unstable output #29174

Open

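[Bug]: The new version of vllm has deprecated the v0 code, but support for the qwen-omni series models is limited to v0; apparently for this reason, we cannot run inference on qwen-omni models with the latest vllm #28388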

Open

mergify bot added the needs-rebase label Jan 27, 2026

xsank closed this Jan 28, 2026

linyueqian mentioned this pull request Feb 2, 2026

[Bugfix][Model] Fix audio-in-video support for Qwen2.5-Omni and Qwen3-Omni #33605

Merged

5 tasks

xsank wants to merge 3 commits into vllm-project:main from xsank:main


2 participants
