[Fix][Qwen3-TTS] Preserve ref_code decoder context for Base ICL by Sy0307 · Pull Request #1731 · vllm-project/vllm-omni

Sy0307 · 2026-03-08T17:56:33Z

Purpose

Fix noisy first-chunk audio for Qwen3-TTS Base ICL in the multi-stage pipeline.

The official offline Qwen3-TTS Base voice-cloning path decodes ref_code + generated_codes and trims the reference prefix from the final waveform. That gives Code2Wav the same acoustic prefix context used by ICL prompt construction.

The multi-stage pipeline was not preserving that behavior:

- Talker used ref_code when building the Base ICL prompt
- but Stage-1 only received generated audio_codes
- so Code2Wav decoded the first chunk without the reference codec prefix context

This showed up as noisy / unstable audio at the beginning of Base ICL outputs. x_vector_only_mode=True was unaffected because that mode only conditions on speaker embedding and does not rely on ref_code as decoder-side prefix context.

This PR restores the missing decoder context by:

- preserving ref_code in the talker runtime/intermediate output for Base ICL
- caching ref_code at request scope until the first Code2Wav chunk is emitted
- prepending ref_code to the first Code2Wav input window
- setting trim context so the prepended reference portion is removed from the final audio
- applying the same fix to both async-chunk and non-async paths
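The prepend-then-trim idea in the list above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the vllm-omni implementation: `fake_code2wav`, `decode_with_ref_context`, and `UPSAMPLE` are hypothetical names, each codec frame is reduced to a single integer, and the decoder is a stand-in that emits a fixed number of samples per frame.

```python
from typing import List

UPSAMPLE = 4  # hypothetical: number of waveform samples produced per codec frame


def fake_code2wav(frames: List[int]) -> List[float]:
    """Stand-in decoder: each codec frame becomes UPSAMPLE identical samples."""
    wav: List[float] = []
    for frame in frames:
        wav.extend([float(frame)] * UPSAMPLE)
    return wav


def decode_with_ref_context(ref_code: List[int], generated: List[int]) -> List[float]:
    """Decode ref_code + generated codes, then trim the reference prefix."""
    left_context_size = len(ref_code)          # frames to drop after decoding
    wav = fake_code2wav(ref_code + generated)  # decoder sees the acoustic prefix
    cut = left_context_size * UPSAMPLE         # convert frames to samples
    return wav[cut:]                           # only the generated audio remains


audio = decode_with_ref_context(ref_code=[7, 8, 9], generated=[1, 2])
```

With a real decoder the reference frames would influence the first generated samples, which is the point of prepending them; the stand-in here only demonstrates the bookkeeping of the trim.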

Implementation note:

- the async path now follows the same request-scoped state pattern used by qwen3_omni
- instead of relying on request-side CPU side channels, the processor stores ref_code in transfer_manager.request_payload[request_id] until the first chunk is actually emitted
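A rough sketch of that request-scoped buffering, assuming a dictionary keyed by request ID. `TransferManagerSketch` and `process_step` are hypothetical names invented for illustration, and the chunking rule (flush once `chunk_size` frames are pending) is an assumption; only the `request_payload[request_id]` idea comes from the description above.

```python
from typing import Dict, List, Optional


class TransferManagerSketch:
    """Hypothetical stand-in that holds per-request state."""

    def __init__(self) -> None:
        self.request_payload: Dict[str, dict] = {}


def process_step(
    tm: TransferManagerSketch,
    request_id: str,
    new_codes: List[int],
    ref_code: Optional[List[int]] = None,
    chunk_size: int = 2,
) -> Optional[dict]:
    """Buffer codes and emit a chunk once chunk_size frames are pending."""
    state = tm.request_payload.setdefault(
        request_id, {"pending": [], "ref_code": None, "first_emitted": False}
    )
    # ref_code may arrive on an earlier step than the first flush, so cache it.
    if ref_code and not state["first_emitted"]:
        state["ref_code"] = list(ref_code)
    state["pending"].extend(new_codes)
    if len(state["pending"]) < chunk_size:
        return None  # nothing emitted yet; ref_code stays cached
    chunk = state["pending"][:chunk_size]
    state["pending"] = state["pending"][chunk_size:]
    ref = state["ref_code"] or []
    state["ref_code"] = None       # consume once: only the first chunk gets it
    state["first_emitted"] = True
    return {"codes": ref + chunk, "left_context_size": len(ref)}


tm = TransferManagerSketch()
step1 = process_step(tm, "req-1", [1], ref_code=[7, 8])  # buffered, no emit
step2 = process_step(tm, "req-1", [2])                   # first chunk, with prefix
step3 = process_step(tm, "req-1", [3, 4])                # later chunk, no prefix
```

Keeping the cached value in per-request state means the step that first sees ref_code and the step that flushes the first chunk do not have to coincide.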

Root Cause Analysis

This PR follows up on the root-cause discussion in the comments on PR #1719. Thanks, @iancarrasco-b10 :)

The issue was not caused by WebSocket transport or async chunk scheduling itself. The underlying problem was that Base ICL lost the decoder-side reference codec prefix when going through the multi-stage pipeline:

- the talker still used ref_code to build the Base ICL prompt
- but the downstream Code2Wav stage only received generated audio_codes
- therefore the first decoded chunk no longer had the same acoustic prefix context as the official offline path

That mismatch explains why:

- Base ICL showed noisy / unstable audio at the beginning
- x_vector_only_mode=True did not show the same issue, because it only uses speaker embedding and does not require ref_code to be prepended before Code2Wav decoding
- async chunking still needs request-scoped buffering, because the step where ref_code first appears is not guaranteed to be the same step where the first chunk is flushed

Test Plan

Run targeted stage input processor tests:

python -m pytest tests/model_executor/stage_input_processors/test_qwen3_tts_async_chunk.py -q

Test Result

Passed locally on this branch:

21 passed

PTAL @linyueqian

Signed-off-by: Sy03 <1370724210@qq.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e4caa8692a

chatgpt-codex-connector · 2026-03-08T18:01:43Z

+        if any(ref_code.numel() > 0 for ref_code in ref_code_prompt_list):
+            mm["ref_code"] = ref_code_prompt_list


Keep ref_code per request instead of list broadcasting

make_omni_output now writes ref_code as a Python list, but the AR runner path that builds per-request payloads treats list-valued multimodal outputs as v[0] for every request (gpu_ar_model_runner.py), so concurrent Base ICL requests will all receive the first request’s ref_code and decode with the wrong prompt context. This silently corrupts speaker/context conditioning for batched requests and was introduced by adding mm["ref_code"] as a list here.
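The broadcasting concern can be illustrated with a small sketch. It models the behavior only as the review comment describes it; the actual code in gpu_ar_model_runner.py is not shown here, and `split_broadcast` and `split_per_request` are hypothetical helper names.

```python
from typing import Any, Dict, List


def split_broadcast(mm: Dict[str, Any], num_requests: int) -> List[Dict[str, Any]]:
    """Pitfall: every request receives v[0] of a list-valued output."""
    return [
        {k: (v[0] if isinstance(v, list) else v) for k, v in mm.items()}
        for _ in range(num_requests)
    ]


def split_per_request(mm: Dict[str, Any], num_requests: int) -> List[Dict[str, Any]]:
    """Alternative: index list-valued outputs by request position."""
    return [
        {k: (v[i] if isinstance(v, list) else v) for k, v in mm.items()}
        for i in range(num_requests)
    ]


# Two concurrent requests with different reference codes.
mm = {"ref_code": [[7, 8], [5, 6]]}
broadcast = split_broadcast(mm, 2)      # second request gets the wrong ref_code
per_request = split_per_request(mm, 2)  # each request keeps its own ref_code
```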

chatgpt-codex-connector · 2026-03-08T18:01:43Z

        codec_codes = audio_codes.transpose(0, 1).cpu().reshape(-1).tolist()
+        additional_information = None
+        if ref_code_len > 0:
+            additional_information = {"left_context_size": [ref_code_len]}


Pass scalar left_context_size for Code2Wav trimming

The non-async processor stores trim context as {"left_context_size": [ref_code_len]} (a list), but qwen3_tts_code2wav.py consumes left_context_size as an integer and later compares/multiplies it (if ctx_frames > 0, cut = ctx_frames * upsample). With this list payload, Base ICL non-async requests can hit a runtime type error during decode instead of producing audio.
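A minimal reproduction of the type problem, assuming the comparison and multiplication quoted in the comment. `trim_cut` and `UPSAMPLE` are hypothetical names; the real logic lives in qwen3_tts_code2wav.py and may differ in detail.

```python
UPSAMPLE = 4  # hypothetical samples-per-frame factor


def trim_cut(left_context_size, upsample: int = UPSAMPLE) -> int:
    """Mirror the comparison and multiplication described in the review."""
    ctx_frames = left_context_size
    if ctx_frames > 0:                # raises TypeError when ctx_frames is a list
        return ctx_frames * upsample  # number of samples to cut
    return 0


ok = trim_cut(3)  # scalar payload works

try:
    trim_cut([3])  # list payload, as stored by the non-async processor
    failed = False
except TypeError:
    failed = True
```

In Python 3 an ordering comparison between a list and an int is not defined, so the list payload fails at the `>` check before any audio is produced.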

hsliuustc0106

Review Summary

Well-structured bugfix with good test coverage (+115 lines of tests for +58 lines of production code).

What's good:

- Both async and non-async paths properly handled
- CPU detach pattern correctly used for state storage
- Tests cover edge cases (buffering before first emit, only-first-chunk behavior)
- Return type change is safe (single caller updated in same PR)

One observation (not blocking):

The request_payload dictionary pattern dynamically adds an attribute to the transfer manager. This works given the request-scoped lifecycle, but consider documenting this contract or adding request_payload as an explicit attribute on the transfer manager class for type safety.

🤖 Reviewed with vllm-omni-review skill

linyueqian · 2026-03-09T01:48:31Z

Local testing results (Base ICL, async_chunk mode)

Tested both the PR branch and upstream/main with Qwen/Qwen3-TTS-12Hz-1.7B-Base, using the default qwen3_tts.yaml stage config and the official reference audio (clone_2.wav).

Setup:

Server: vllm-omni serve Qwen/Qwen3-TTS-12Hz-1.7B-Base with default async_chunk config
Reference audio: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav
Synthesis text: "Good one. Okay, fine, I'm just gonna leave this sock monkey here. Goodbye."

Unit tests: All 21 tests in test_qwen3_tts_async_chunk.py pass.

Audio quality: Both the PR branch and the baseline (main) still produce noisy audio at the beginning of Base ICL output. The first-chunk noise issue does not appear to be resolved by this change.

Note: The first request on both servers generated ~318s of audio (hit max_tokens=4096 without EOS - likely a warmup/compilation issue). The second request produced normal-length (~5s) audio, which was used for comparison. Both had audible noise.
baseline_base_icl_2.wav
fix_base_icl_2.wav

linyueqian · 2026-03-09T01:53:21Z

Reproduction steps:

# Start server (GPU 1, async_chunk mode)
VLLM_WORKER_MULTIPROC_METHOD=spawn CUDA_VISIBLE_DEVICES=1 \
  vllm-omni serve Qwen/Qwen3-TTS-12Hz-1.7B-Base \
  --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
  --host 0.0.0.0 --port 8092 --trust-remote-code --omni

# Send a warmup request first (first request hits max_tokens without EOS)
# Then send the actual test request:
curl -s http://localhost:8092/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "input": "Good one. Okay, fine, I'\''m just gonna leave this sock monkey here. Goodbye.",
    "voice": "alloy",
    "response_format": "wav",
    "task_type": "Base",
    "ref_audio": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav",
    "ref_text": "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."
  }' -o test_base_icl.wav
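
The same request can also be built from Python, which sidesteps shell quoting of the apostrophe in the input text. This is a sketch using only the standard library; the endpoint, port, and JSON fields are copied from the curl command above, while `build_speech_request` and `synthesize` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8092"  # host/port taken from the serve command


def build_speech_request(text: str, ref_audio: str, ref_text: str) -> urllib.request.Request:
    """Build the POST request for /v1/audio/speech without sending it."""
    payload = {
        "model": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
        "input": text,
        "voice": "alloy",
        "response_format": "wav",
        "task_type": "Base",
        "ref_audio": ref_audio,
        "ref_text": ref_text,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer EMPTY"},
        method="POST",
    )


def synthesize(req: urllib.request.Request, out_path: str) -> None:
    """Send the request to a running server and save the returned audio."""
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Calling `synthesize(build_speech_request(...), "test_base_icl.wav")` requires the server from the first step to be running.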

Sy0307 · 2026-03-09T03:53:57Z

Confirmed, there is still an issue here. Some refactoring I did after testing was completed appears to have broken the ref_code transmission. I will fix it again. Thanks, @linyueqian.

Signed-off-by: Sy03 <1370724210@qq.com>

Sy0307 · 2026-03-09T12:34:15Z

Please re-check. I verified that the latest version works well on my desktop. Thanks, @linyueqian.

linyueqian · 2026-03-09T16:10:34Z

Retested on latest commit (c83c975). Unit tests all pass (21/21). Served Base ICL with async_chunk using the same setup as before, and the noisy first chunk issue is gone. There's a tiny glitch on the first real request after warmup, but that's likely just compilation; subsequent requests sound clean. LGTM.

fix_base_icl_pr1731.wav
fix_base_icl_pr1731_3rd.wav

…sync Base path

talker2code2wav() wraps ref_code_len in a list when setting additional_information["left_context_size"], but the consumer in Qwen3TTSCode2Wav.forward() expects a plain int (line 287: "if ctx_frames > 0"). This causes a TypeError when the non-async Base path is used with max_model_len large enough to accept the prompt.

The bug was introduced in PR vllm-project#1731 (761eff9, "Fix Base voice clone streaming quality and stop-token crash") which added ref_code support to the non-async path. The async chunk path in the same PR correctly passes left_context_size as a plain int. The bug was masked by the token overflow crash (max_model_len=32768 < prompt tokens) which prevented the code from reaching the comparison.

Fixes: vllm-project#2030

Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

…av path

Qwen3TTSCode2Wav.forward() compares ctx_frames against 0 (line 287: "if ctx_frames > 0"), but the non-async Base path passes left_context_size as a single-element list [ref_code_len] to survive serialize_additional_information(), which only supports tensor and list values (plain ints are dropped). The async chunk path bypasses serialization and passes a plain int directly.

The list wrapper in talker2code2wav() is intentional; without it the serializer drops the key and ctx_frames silently falls back to 0, causing ref_code context to never be trimmed from the output audio.

Fix the consumer (Qwen3TTSCode2Wav.forward) to unwrap the list when present, handling both the serialized list form (non-async) and the plain int form (async chunk path).

The bug was introduced in PR vllm-project#1731 (761eff9) which added ref_code support to the non-async path but did not account for the type mismatch between serialized list and the int comparison downstream. It was masked by the token overflow crash (max_model_len=32768 < prompt tokens) which prevented the code from reaching the comparison.

Fixes: vllm-project#2030

Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
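
The interaction described in this commit message can be sketched as follows. Both functions are hypothetical stand-ins written for illustration: `serialize_sketch` imitates only the "plain ints are dropped" behavior (it ignores the tensor case mentioned above), and `read_ctx_frames` shows one way a consumer could accept both forms. Neither is the actual vllm-omni code.

```python
from typing import Any, Dict


def serialize_sketch(info: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical serializer: keeps list values, drops plain ints."""
    return {k: v for k, v in info.items() if isinstance(v, list)}


def read_ctx_frames(info: Dict[str, Any]) -> int:
    """Accept the list form (non-async) and the int form (async chunk)."""
    value = info.get("left_context_size", 0)
    if isinstance(value, (list, tuple)):
        value = value[0] if value else 0  # unwrap the single-element list
    return int(value)


dropped = serialize_sketch({"left_context_size": 3})         # key is lost
kept = serialize_sketch({"left_context_size": [3]})          # key survives
from_list = read_ctx_frames(kept)                            # non-async form
from_int = read_ctx_frames({"left_context_size": 3})         # async chunk form
from_missing = read_ctx_frames(dropped)                      # falls back to 0
```

The fallback to 0 in the last case is the silent failure the message warns about: nothing raises, but the reference prefix is never trimmed.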

[Fix][Qwen3-TTS] Preserve ref_code decoder context for Base ICL

e4caa86

Signed-off-by: Sy03 <1370724210@qq.com>

Sy0307 requested a review from hsliuustc0106 as a code owner March 8, 2026 17:56

chatgpt-codex-connector Bot reviewed Mar 8, 2026

This was referenced Mar 8, 2026

[Feat][Qwen3-TTS] Support streaming audio output for websocket #1719

Merged

[Bug]: QWEN3-TTS: noise insertion at the very front of voice #1707

Closed

hsliuustc0106 approved these changes Mar 9, 2026

Fix Base ICL ref_code context handling

c83c975

Signed-off-by: Sy03 <1370724210@qq.com>

Sy0307 mentioned this pull request Mar 9, 2026

[Bug]: In release version 0.16.0, voice cloning is not generating proper voice for Qwen3-TTS model #1754

Closed

1 task

hsliuustc0106 added the ready label to trigger buildkite CI label Mar 9, 2026

Merge branch 'main' into fix/qwen3-tts-base-icl-refcode-context

9a84e47

linyueqian merged commit 761eff9 into vllm-project:main Mar 9, 2026
6 of 7 checks passed

Sy0307 mentioned this pull request Mar 10, 2026

[Bug]: vllm-omni-0.17.0rc1 cloned audio file is white noise in Qwen/Qwen3-TTS-12Hz-1.7B-Base #1774

Closed

1 task

linyueqian mentioned this pull request Mar 10, 2026

[RFC]: TTS Development Roadmap - March 2026 #1795

Open

lishunyang12 pushed a commit to lishunyang12/vllm-omni that referenced this pull request Mar 11, 2026

[Fix][Qwen3-TTS] Preserve ref_code decoder context for Base ICL (vllm…

0f90e8b

…-project#1731) Signed-off-by: lishunyang <lishunyang12@163.com>

linyueqian mentioned this pull request Mar 16, 2026

[Test] Add Qwen-tts test cases and unify the style of existing test cases #1911

Merged

5 tasks

NickCao mentioned this pull request Mar 20, 2026

[Bugfix] Fix left_context_size type mismatch in non-async Base Code2Wav path #2052

Closed

5 tasks

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Fix][Qwen3-TTS] Preserve ref_code decoder context for Base ICL (vllm…

3eabda3

…-project#1731)

linyueqian merged 3 commits into vllm-project:main from Sy0307:fix/qwen3-tts-base-icl-refcode-context
