
[Qwen3-Omni][Bugfix] Replace vLLM fused layers with HF-compatible numerics in code predictor #2291

Merged
hsliuustc0106 merged 3 commits into vllm-project:main from LJH-LBJ:fix-qwen3-omni-accurancy on Mar 28, 2026

LJH-LBJ (Contributor) commented Mar 28, 2026


Purpose

Resolves #2286.
Fixes audio quality degradation (noise after ~3000 tokens) in Qwen3-Omni's code predictor by replacing vLLM's fused kernels with plain PyTorch equivalents that match HuggingFace reference numerics.

This is the Qwen3-Omni counterpart of PR #2277 (which fixed the same issue for Qwen3-TTS). The Qwen3-Omni code predictor was written following PR #1617's approach and inherits the same precision bug.
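To give a feel for the size of the precision gap described above, here is a toy, pure-Python sketch. It emulates bfloat16 rounding with `struct` and compares an RMS normalization whose variance is kept at full precision against one whose variance is rounded to bf16. The helper names `to_bf16` and `rms_norm` are hypothetical, and the sketch is only an illustration of reduced-precision rounding; it is not a model of vLLM's kernels or of the code in this PR.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a float to the nearest bfloat16 value (ties to even).

    Toy emulation: goes through float32 bits and keeps the top 16.
    NaN and infinity are not handled specially.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def rms_norm(xs, eps=1e-6, variance_in_bf16=False):
    """RMS-normalize a list of floats, optionally rounding the variance to bf16."""
    variance = sum(v * v for v in xs) / len(xs)
    if variance_in_bf16:
        variance = to_bf16(variance)
    scale = 1.0 / math.sqrt(variance + eps)
    return [v * scale for v in xs]


xs = [0.1 * (i + 1) for i in range(16)]
reference = rms_norm(xs)
reduced = rms_norm(xs, variance_in_bf16=True)
max_rel_err = max(abs(a - b) / abs(a) for a, b in zip(reference, reduced))
print(f"max relative error from a bf16 variance: {max_rel_err:.2e}")
```

A single rounding like this is small, but an autoregressive predictor feeds each output back in as input, which is one plausible way such differences could compound over thousands of tokens.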

| Component | Before (precision issue) | After (HF-compatible) |
| --- | --- | --- |
| RMSNorm | vLLM RMSNorm (bf16 variance) | Custom _RMSNorm (float32 variance) |
| RoPE | vLLM get_rope (bf16 cos/sin) | Custom _RotaryEmbedding (float32 cos/sin, torch.autocast(enabled=False)) |
| QKV projection | QKVParallelLinear (fused) | Separate nn.Linear for q/k/v |
| MLP | MergedColumnParallelLinear + RowParallelLinear | Separate nn.Linear for gate/up/down |
| torch.compile | mode="default" | options={"epilogue_fusion": False} |
| position_ids | [bsz * seq_len] 1D flat | [bsz, seq_len] 2D (HF format) |
| load_weights | stacked_params_mapping remapping | Direct loading (weight names match HF) |
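The first two rows of the table can be sketched in plain PyTorch as follows. This is a minimal illustration in the style of the HuggingFace reference layers, not the PR's actual `_RMSNorm` or `_RotaryEmbedding` code, which is not shown here. The names `RMSNormFP32` and `rope_cos_sin`, the `base=10000.0` default, and the `out_dtype` parameter are assumptions made for the example.

```python
import torch
from torch import nn


class RMSNormFP32(nn.Module):
    """RMSNorm that upcasts to float32 before computing the variance."""

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        x = x.to(torch.float32)                      # upcast first
        variance = x.pow(2).mean(-1, keepdim=True)   # float32 variance
        x = x * torch.rsqrt(variance + self.eps)
        return self.weight * x.to(input_dtype)       # cast back, then scale


def rope_cos_sin(position_ids, head_dim, base=10000.0, out_dtype=torch.bfloat16):
    """Compute rotary cos/sin in float32 from 2D position_ids [bsz, seq_len].

    Returns two tensors of shape [bsz, seq_len, head_dim].
    """
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    inv_freq = 1.0 / (base ** exponents)
    # Disable autocast so the angles, cos and sin stay in float32.
    with torch.autocast(device_type=position_ids.device.type, enabled=False):
        freqs = position_ids[:, :, None].float() * inv_freq[None, None, :]
        emb = torch.cat((freqs, freqs), dim=-1)
        cos, sin = emb.cos(), emb.sin()
    return cos.to(out_dtype), sin.to(out_dtype)


# Usage: a bf16 module still normalizes with a float32 variance internally.
norm = RMSNormFP32(8).to(torch.bfloat16)
hidden = torch.arange(1, 49, dtype=torch.float32).reshape(2, 3, 8).to(torch.bfloat16)
normed = norm(hidden)

position_ids = torch.arange(4).unsqueeze(0).expand(2, -1)  # [bsz=2, seq_len=4]
cos, sin = rope_cos_sin(position_ids, head_dim=8, out_dtype=torch.float32)
```

The only cast to the lower-precision dtype happens at the very end of each function, so the reduction in the norm and the trigonometric functions in the rotary embedding both run in float32.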

Test Plan

curl -s http://localhost:46354/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "/workspace/models/Qwen3-Omni-30B-A3B-Instruct",
"messages": [{"role": "user", "content": "Please write a 5000-word novel."}],
"modalities": ["audio"]
}' | jq -r '.choices[0].message.audio.data' | base64 -d > output2.wav
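For scripted runs, the same request can be sketched with the Python standard library. The payload fields and the `.choices[0].message.audio.data` path are taken from the command above; the helper names, the `timeout` default, and the assumption that the response is a single non-streamed JSON body are illustrative only.

```python
import base64
import json
import urllib.request


def build_payload(model: str, prompt: str) -> dict:
    """Build the chat-completions body used in the curl command."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "modalities": ["audio"],
    }


def extract_audio_bytes(response: dict) -> bytes:
    """Decode the base64 audio found at .choices[0].message.audio.data."""
    data = response["choices"][0]["message"]["audio"]["data"]
    return base64.b64decode(data)


def request_audio(url: str, payload: dict, out_path: str, timeout: float = 3600.0) -> int:
    """POST the payload, write the decoded audio to out_path, return its size in bytes."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    audio = extract_audio_bytes(body)
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


# Example call (requires the server from the test plan to be running):
# payload = build_payload(
#     "/workspace/models/Qwen3-Omni-30B-A3B-Instruct",
#     "Please write a 5000-word novel.",
# )
# request_audio("http://localhost:46354/v1/chat/completions", payload, "output2.wav")
```

The long timeout is deliberate: a 5000-word prompt produces a long audio response, which is the scenario that triggers the issue.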

Test Result

The generated audio contains no noise. The output file could not be attached here because it is too large to upload.



LJH-LBJ and others added 2 commits March 28, 2026 12:14
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
LJH-LBJ requested a review from hsliuustc0106 as a code owner March 28, 2026 04:29
amy-why-3459 (Contributor) commented

Please add a nightly-test label. @Gaohan123 @david6666666 @gcanlin @hsliuustc0106

david6666666 added the nightly-test label (label to trigger buildkite nightly test CI) on Mar 28, 2026

amy-why-3459 (Contributor) commented

We are working on an accuracy benchmark. Could you add a long-output use case to monitor this scenario?

amy-why-3459 (Contributor) commented

@Sy0307 PTAL

LJH-LBJ (Contributor, Author) commented Mar 28, 2026

> We are working on an accuracy benchmark. Could you add a long-output use case to monitor this scenario?

Sure, I can do that in the next PR.

Sy0307 (Contributor) commented Mar 28, 2026

LGTM.

hsliuustc0106 added the ready label (label to trigger buildkite CI) on Mar 28, 2026
hsliuustc0106 merged commit a3e1322 into vllm-project:main on Mar 28, 2026
7 of 8 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug]: The Qwen3-Omni outputs extremely long audio frequencies, resulting in decreased accuracy.
