[Qwen3-Omni][Bugfix] Replace vLLM fused layers with HF-compatible numerics in code predictor #2291
Merged
hsliuustc0106 merged 3 commits on Mar 28, 2026
Conversation
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Contributor
|
Please add a nightly-test label. @Gaohan123 @david6666666 @gcanlin @hsliuustc0106 |
Contributor
|
We are working on an accuracy benchmark. Could you add a long-output use case to monitor this scenario? |
Contributor
|
@Sy0307 PTAL |
Contributor
Author
Sure, I can do it in the next PR. |
Contributor
|
LGTM. |
hsliuustc0106
approved these changes
Mar 28, 2026
vraiti
pushed a commit
to vraiti/vllm-omni
that referenced
this pull request
Apr 9, 2026
…rics in code predictor (vllm-project#2291) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
lengrongfu
pushed a commit
to lengrongfu/vllm-omni
that referenced
this pull request
May 1, 2026
…rics in code predictor (vllm-project#2291) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
clodaghwalsh17
pushed a commit
to clodaghwalsh17/nm-vllm-omni-ent
that referenced
this pull request
May 12, 2026
…rics in code predictor (vllm-project#2291) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Purpose
Resolve: #2286
Fixes audio quality degradation (noise after ~3000 tokens) in Qwen3-Omni's code predictor by replacing vLLM's fused kernels with plain PyTorch equivalents that match HuggingFace reference numerics.
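The variance-precision issue can be illustrated with a minimal sketch (not the PR's actual code): an RMSNorm that accumulates the variance in bf16 drifts from one that upcasts to float32 in the HuggingFace style, and the error compounds over long sequences.

```python
import torch

def rms_norm_bf16_variance(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Variance accumulated in the input dtype (bf16): rounding error builds up.
    variance = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight

def rms_norm_fp32_variance(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # HF-style: upcast to float32 for the variance, normalize, then cast back.
    x32 = x.to(torch.float32)
    variance = x32.pow(2).mean(-1, keepdim=True)
    return (x32 * torch.rsqrt(variance + eps)).to(x.dtype) * weight

torch.manual_seed(0)
x = torch.randn(4, 1024, dtype=torch.bfloat16)
w = torch.ones(1024, dtype=torch.bfloat16)
diff = (rms_norm_bf16_variance(x, w).float() - rms_norm_fp32_variance(x, w).float()).abs().max()
print(f"max abs difference: {diff.item():.6f}")
```

The per-token difference is tiny, but in an autoregressive code predictor it feeds back into the next step, which is consistent with audible noise appearing only after thousands of tokens.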
This is the Qwen3-Omni counterpart of PR #2277 (which fixed the same issue for Qwen3-TTS). The Qwen3-Omni code predictor was written following PR #1617's approach and inherits the same precision bug.
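As a hedged sketch of the rotary-embedding half of the fix (names and signature are illustrative, not the PR's implementation): the cos/sin tables are built in float32 and kept out of autocast, mirroring the HF `RotaryEmbedding` behaviour, instead of letting them be computed in bf16.

```python
import torch

def rope_cos_sin_fp32(positions: torch.Tensor, head_dim: int, base: float = 10000.0):
    # Illustrative helper: compute rotary cos/sin tables in float32 even when
    # the surrounding model runs under bf16 autocast.
    with torch.autocast(device_type="cpu", enabled=False):
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
        freqs = torch.outer(positions.float(), inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()

cos, sin = rope_cos_sin_fp32(torch.arange(8), head_dim=64)
print(cos.dtype, cos.shape)
```

The same principle motivates the other replacements in the table below: separate `nn.Linear` layers for q/k/v and gate/up/down avoid fused-kernel numerics that diverge from the HF reference.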
| Before (vLLM fused) | After (HF-compatible) |
| --- | --- |
| `RMSNorm` (bf16 variance) | `RMSNorm` (float32 variance) |
| `get_rope` (bf16 cos/sin) | `RotaryEmbedding` (float32 cos/sin, `torch.autocast(enabled=False)`) |
| `QKVParallelLinear` (fused) | `nn.Linear` for q/k/v |
| `MergedColumnParallelLinear` + `RowParallelLinear` | `nn.Linear` for gate/up/down |
| `mode="default"` | `options={"epilogue_fusion": False}` |
| `[bsz * seq_len]` 1D flat | `[bsz, seq_len]` 2D (HF format) |

Test Plan
curl -s http://localhost:46354/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "/workspace/models/Qwen3-Omni-30B-A3B-Instruct",
"messages": [{"role": "user", "content": "Please write a 5000-word novel."}],
"modalities": ["audio"]
}' | jq -r '.choices[0].message.audio.data' | base64 -d > output2.wav
Test Result
There is no noise in the audio. The file cannot be uploaded here because it is too large.
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.