[BugFix] qwen3_tts chunk boundary handling logic in initial chunk (IC) by Fattysand · Pull Request #2378 · vllm-project/vllm-omni

Fattysand · 2026-03-31T11:48:31Z

Purpose

Fix the initial chunk (IC) coverage logic in qwen3_tts.py to align with the correct behavior already implemented in fish_speech.py.

Currently, qwen3_tts.py uses < and chunk_size - 1, which keeps IC coverage strictly below chunk_size, while fish_speech.py uses <= and no -1, which lets IC cover up to and including chunk_size. Because of this mismatch, qwen3_tts.py misses the last IC emit when initial_chunk_size evenly divides chunk_size. For example, with cs=25 and ic=5, IC emits at 5, 10, 15, and 20, then the normal phase takes over and emits 21–45, skipping the emit at 25 that would complete frames 1–25.

Proposed fix (only two lines changed):

# line 215: < → <=
in_initial_phase = initial_chunk_size > 0 and initial_chunk_size < chunk_size and length <= chunk_size

# lines 227-229: remove -1
initial_coverage = (
    (chunk_size // initial_chunk_size) * initial_chunk_size if 0 < initial_chunk_size < chunk_size else 0
)

Reference — fish_speech.py (lines 118 & 131):

in_initial_phase = initial_chunk_size > 0 and length <= chunk_size
initial_coverage = (chunk_size // initial_chunk_size) * initial_chunk_size if initial_chunk_size > 0 else 0
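
The effect of the two changed lines can be sketched with a small simulation. This is a rough illustration, not the code in qwen3_tts.py: `should_emit` and `emit_points` are hypothetical helpers, the pre-fix expressions are reconstructed from the description above (`<` and `chunk_size - 1`), and the emit rule (`length % initial_chunk_size == 0` in the IC phase, `adjusted % chunk_size == 0` afterwards) is assumed from the test discussion in this PR.

```python
def should_emit(length: int, chunk_size: int, initial_chunk_size: int, fixed: bool = True) -> bool:
    """Hypothetical emit predicate; only the boundary expressions come from the PR."""
    ic_enabled = 0 < initial_chunk_size < chunk_size
    if fixed:
        # Post-fix: IC phase includes length == chunk_size, coverage uses chunk_size.
        in_initial_phase = ic_enabled and length <= chunk_size
        initial_coverage = (chunk_size // initial_chunk_size) * initial_chunk_size if ic_enabled else 0
    else:
        # Pre-fix (reconstructed): strict < and chunk_size - 1.
        in_initial_phase = ic_enabled and length < chunk_size
        initial_coverage = ((chunk_size - 1) // initial_chunk_size) * initial_chunk_size if ic_enabled else 0

    if in_initial_phase:
        return length % initial_chunk_size == 0

    # Normal phase: count frames past the IC coverage and emit on chunk boundaries.
    adjusted = length - initial_coverage
    return adjusted > 0 and adjusted % chunk_size == 0


def emit_points(chunk_size: int, initial_chunk_size: int, fixed: bool, max_length: int = 60) -> list[int]:
    """Lengths in 1..max_length at which the predicate would emit."""
    return [n for n in range(1, max_length + 1) if should_emit(n, chunk_size, initial_chunk_size, fixed)]


old = emit_points(25, 5, fixed=False)
new = emit_points(25, 5, fixed=True)
print(old)  # [5, 10, 15, 20, 45]
print(new)  # [5, 10, 15, 20, 25, 50]
```

Under these assumptions the pre-fix schedule drops the emit at 25, and the post-fix schedule starts the normal phase on a clean chunk boundary.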

Test Plan

This is a minimal two-line logic fix aligning qwen3_tts.py with the existing fish_speech.py implementation. No additional test scripts are needed — the change is self-contained and the edge cases have been manually verified (see below).

Test Result

Edge cases verified:

Non-divisible (cs=25, ic=8): (24//8)*8 == (25//8)*8 == 24, behavior unchanged.
ic == chunk_size: Guarded by initial_chunk_size < chunk_size, IC skipped entirely — unaffected.
finished=True during IC: Handled by existing context_length logic.
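
The non-divisible case can be checked by comparing the two coverage expressions directly. The sketch below uses hypothetical helper names (`coverage_old`, `coverage_new`) and only the arithmetic quoted in this PR. It suggests the two formulas differ exactly when initial_chunk_size divides chunk_size, which is why other configurations are unaffected.

```python
def coverage_old(chunk_size: int, initial_chunk_size: int) -> int:
    # Pre-fix expression, reconstructed from the description (chunk_size - 1).
    return ((chunk_size - 1) // initial_chunk_size) * initial_chunk_size


def coverage_new(chunk_size: int, initial_chunk_size: int) -> int:
    # Post-fix expression, as shown in the proposed fix.
    return (chunk_size // initial_chunk_size) * initial_chunk_size


print(coverage_old(25, 8), coverage_new(25, 8))  # 24 24  (non-divisible: unchanged)
print(coverage_old(25, 5), coverage_new(25, 5))  # 20 25  (divisible: changed)

# Collect every (cs, ic) pair with 0 < ic < cs where the two formulas disagree.
changed = [
    (cs, ic)
    for cs in range(2, 65)
    for ic in range(1, cs)
    if coverage_old(cs, ic) != coverage_new(cs, ic)
]
# Every disagreeing pair has ic dividing cs evenly.
assert all(cs % ic == 0 for cs, ic in changed)
```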

cc @Sy0307

Fattysand · 2026-03-31T13:23:11Z

Update `test_qwen3_tts_async_chunk.py` to match corrected IC boundary logic

The existing test cases for the "IC evenly divides chunk_size" edge case (ic=8, cs=16) were written under the assumption that IC phase uses strict < (i.e., length < chunk_size). This contradicts the <= boundary used in both fish_speech.py and the corrected qwen3_tts.py.

What changed in tests:

Fixed 2 incorrect expectations for ic=8, cs=16:

| case | before | after | reason |
| --- | --- | --- | --- |
| `n=16, finished=False` | `None` | `(8, 16)` | `16<=16` → still IC phase, `16%8==0` → emit |
| `n=24, finished=False` | `(8, 24)` | `None` | normal phase, `adjusted=8, 8%16!=0` → hold |

Added 1 normal-emit verification for ic=8, cs=16:
- n=32 → (16, 32): first normal emit at initial_coverage + chunk_size = 16+16 = 32
Added 5 new cases for ic=5, cs=25 (IC evenly divides chunk_size with higher multiplicity):
- Demonstrates IC filling the entire first chunk: emit at 5, 12→hold, 25→emit, 30→hold, 50→first normal emit
- Emit interval pattern: 5,5,5,5,5,25,25,... — smooth transition with no gap
- This is the key scenario that exposes the bug in the old < logic: with strict <, IC would only emit at 5,10,15,20 (skipping 25), then normal phase wouldn't emit until frame 45, creating a 25-frame gap (as long as a full normal chunk) where a 5-frame IC step was expected

Updated comments clarify the IC boundary rule:

# IC phase: length <= chunk_size  (uses <=, consistent with fish_speech)
# IC emits fill the entire first chunk_size worth of frames, so the
# normal phase always starts at a clean chunk boundary.
# initial_coverage = (chunk_size // initial_chunk_size) * initial_chunk_size
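
The cases listed above can be expressed as a table-driven check. This is a minimal sketch, assuming the corrected boundary rule from the comments: `emits` is a hypothetical stand-in for the real function under test, and it only distinguishes emit from hold rather than reproducing the tuples the actual test asserts on.

```python
def emits(length: int, chunk_size: int, initial_chunk_size: int) -> bool:
    """Emit/hold decision under the corrected rule (hypothetical helper)."""
    ic_enabled = 0 < initial_chunk_size < chunk_size
    if ic_enabled and length <= chunk_size:
        # IC phase: emit on multiples of initial_chunk_size.
        return length % initial_chunk_size == 0
    coverage = (chunk_size // initial_chunk_size) * initial_chunk_size if ic_enabled else 0
    # Normal phase: emit on chunk boundaries past the IC coverage.
    return (length - coverage) % chunk_size == 0


# (ic, cs, n, expect_emit), taken from the cases described in this comment.
CASES = [
    (8, 16, 16, True),
    (8, 16, 24, False),
    (8, 16, 32, True),
    (5, 25, 5, True),
    (5, 25, 12, False),
    (5, 25, 25, True),
    (5, 25, 30, False),
    (5, 25, 50, True),
]
for ic, cs, n, expected in CASES:
    assert emits(n, cs, ic) == expected, (ic, cs, n)

# Distance between consecutive emits for ic=5, cs=25.
points = [n for n in range(1, 101) if emits(n, 25, 5)]
intervals = [b - a for a, b in zip([0] + points, points)]
print(intervals)  # [5, 5, 5, 5, 5, 25, 25, 25]
```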

Signed-off-by: Fattysand <fattysand@users.noreply.github.com>

Sy0307 · 2026-03-31T18:13:20Z

LGTM. Nice catch.

linyueqian · 2026-03-31T22:03:34Z

@JuanPZuluaga please also take a look. thank you!

linyueqian

LGTM

Two bugs preventing Base (voice-clone) task from producing correct audio:

1. Speech tokenizer encoder ran in bfloat16, causing ~50% of encoded reference-audio codes to diverge from float32 baseline. The corrupted prompt prevents the talker from generating stop token 2150, producing ~318s of audio instead of ~8s. Fix: load encoder in float32.
2. Cherry-pick chunk boundary fix (vllm-project#2378): off-by-one in initial chunk phase boundary check caused the final codec chunk to be malformed (length 1, not divisible by 16 quantizers), resulting in 0-byte output even when stop token was correctly generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>

Fattysand requested a review from hsliuustc0106 as a code owner March 31, 2026 11:48

Fattysand force-pushed the fix/qwen3-tts-chunk-boundary branch from da3e643 to fd4ae7e Compare March 31, 2026 11:50

tzhouam self-requested a review March 31, 2026 11:52

tzhouam added the ready label to trigger buildkite CI label Mar 31, 2026

tzhouam requested a review from linyueqian March 31, 2026 11:55

Fattysand added 2 commits March 31, 2026 21:26

fix qwen3_tts chunk boundary handling

be37ecf

Signed-off-by: Fattysand <fattysand@users.noreply.github.com>

test: update qwen3_tts async chunk test for IC boundary fix

35c3a94

Signed-off-by: Fattysand <fattysand@users.noreply.github.com>

Fattysand force-pushed the fix/qwen3-tts-chunk-boundary branch from b7204cc to 35c3a94 Compare March 31, 2026 13:27

linyueqian approved these changes Mar 31, 2026

tzhouam merged commit 7274e15 into vllm-project:main Apr 1, 2026
7 of 8 checks passed

vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026

[BugFix] qwen3_tts chunk boundary handling logic in initial chunk (IC) (vllm-project#2378)

77522a2

Signed-off-by: Fattysand <fattysand@users.noreply.github.com>

tzhouam merged 2 commits into vllm-project:main from Fattysand:fix/qwen3-tts-chunk-boundary

Fattysand commented Mar 31, 2026 •

edited

Loading

Uh oh!

Fattysand commented Mar 31, 2026

Uh oh!

Sy0307 commented Mar 31, 2026

Uh oh!

linyueqian commented Mar 31, 2026

Uh oh!

linyueqian left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Fattysand commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

Fattysand commented Mar 31, 2026

Update test_qwen3_tts_async_chunk.py to match corrected IC boundary logic

Uh oh!

Sy0307 commented Mar 31, 2026

Uh oh!

linyueqian commented Mar 31, 2026

Uh oh!

linyueqian left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fattysand commented Mar 31, 2026 •

edited

Loading

Update `test_qwen3_tts_async_chunk.py` to match corrected IC boundary logic