[BugFix] add missing subtalker sampling config to Qwen3-TTS deploy YAML by xiaohajiayou · Pull Request #2940 · vllm-project/vllm-omni

xiaohajiayou · 2026-04-20T09:09:50Z

Purpose

Fix #2942
In the talker stage of some TTS models, sampling happens both in the main LLM decoding path and in a separate talker fast path that generates discrete audio codes for the downstream code2wav stage.

Previously, for Qwen3-TTS, the deploy-time YAML sampling config applied only to the main LLM sampling path. The residual code predictor path inside talker_mtp still used hard-coded sampling values.
This was misleading: Qwen3-TTS officially distinguishes the main talker sampling path from the subtalker/code predictor sampling path, yet the YAML sampling config appeared to cover the whole talker stage while in practice affecting only the main LLM path.

This PR adds stage-level subtalker_sampling_params support for Qwen3-TTS and wires those parameters into the talker MTP/code predictor path.

Changes

Add subtalker_sampling_params to stage deploy/config plumbing
Pass subtalker_sampling_params through StageDeployConfig -> OmniEngineArgs -> OmniModelConfig
Update Qwen3-TTS talker_mtp() to use configured subtalker sampling params instead of hard-coded values
Update GPU/NPU model runners to pass subtalker sampling params into talker_mtp()
Add subtalker_sampling_params defaults to deploy/qwen3_tts.yaml
Make Fish Speech and Qwen3-Omni talker_mtp() implementations accept extra kwargs for runner compatibility

This change makes the Qwen3-TTS subtalker sampling path configurable from deploy YAML and aligns the runtime behavior more closely with the model's official subtalker configuration semantics.
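The plumbing described in the Changes list can be pictured with a minimal sketch. The three dataclasses below are toy stand-ins named after the real StageDeployConfig, OmniEngineArgs, and OmniModelConfig; their fields and the build_model_config helper are assumptions for illustration, not the actual vllm-omni definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyStageDeployConfig:
    # Main LLM sampling config, as read from the deploy YAML.
    sampling_params: dict[str, Any] = field(default_factory=dict)
    # Separate block for the subtalker/code predictor path (hypothetical field layout).
    subtalker_sampling_params: Optional[dict[str, Any]] = None


@dataclass
class ToyOmniEngineArgs:
    subtalker_sampling_params: Optional[dict[str, Any]] = None


@dataclass
class ToyOmniModelConfig:
    subtalker_sampling_params: Optional[dict[str, Any]] = None


def build_model_config(stage: ToyStageDeployConfig) -> ToyOmniModelConfig:
    """Carry the subtalker block unchanged through each config layer."""
    engine_args = ToyOmniEngineArgs(
        subtalker_sampling_params=stage.subtalker_sampling_params
    )
    return ToyOmniModelConfig(
        subtalker_sampling_params=engine_args.subtalker_sampling_params
    )
```

The point of the sketch is that the subtalker block travels alongside, not inside, the main sampling config, so a stage that omits it simply carries None down to the model.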

The compatibility changes for Fish Speech and Qwen3-Omni are only to ensure the updated runner call path does not break those models.

Those models currently still use hard-coded sampling values in their talker fast-path implementations as well.
This PR does not change that behavior; it only keeps them compatible with the updated runner call signature.
Whether they should also gain model-specific configurable fast-path sampling is left for follow-up discussion.
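A rough sketch of the compatibility shim described above, under stated assumptions: the class names, the run_talker helper, the simplified talker_mtp signature, and the placeholder hard-coded values are all hypothetical and do not reflect the real Fish Speech or Qwen3-Omni code.

```python
from typing import Any, Mapping, Optional


class ConfigurableTalker:
    """Stands in for a model that honors the configured subtalker params."""

    def talker_mtp(
        self,
        hidden: Any,
        *,
        subtalker_sampling_params: Optional[Mapping[str, Any]] = None,
    ) -> dict[str, Any]:
        return {"hidden": hidden, "sampling": dict(subtalker_sampling_params or {})}


class FixedTalker:
    """Stands in for a model that keeps its own hard-coded sampling values."""

    HARD_CODED = {"top_k": 1}  # placeholder value, not taken from any real model

    def talker_mtp(self, hidden: Any, **kwargs: Any) -> dict[str, Any]:
        # Extra keyword arguments from the runner are accepted and ignored.
        return {"hidden": hidden, "sampling": dict(self.HARD_CODED)}


def run_talker(
    model: Any, hidden: Any, params: Optional[Mapping[str, Any]]
) -> dict[str, Any]:
    # The runner uses one call shape for every model.
    return model.talker_mtp(hidden, subtalker_sampling_params=params)
```

With this shape the runner does not need to know which models consume the parameter, at the cost that an ignored override fails silently.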


hsliuustc0106

BLOCKING:

Test Coverage — Missing regression test for this bugfix. Please add a test that verifies subtalker_sampling_params are correctly passed from YAML through to talker_mtp() and used instead of hardcoded values.

xiaohajiayou · 2026-04-20T13:51:04Z

Added two tests for this.

One covers the config deep-merge path for subtalker_sampling_params, and the other covers passing those params from OmniGPUModelRunner into talker_mtp().
Both tests pass locally.
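For readers without the diff at hand, the two kinds of test described here might look roughly like the sketch below. The deep_merge and forward_to_talker helpers, the key names, and the values are hypothetical stand-ins; the tests actually added in this PR are not reproduced.

```python
from collections.abc import Mapping
from typing import Any
from unittest.mock import MagicMock


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base (hypothetical helper)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def forward_to_talker(model: Any, hidden: Any, params: Mapping[str, Any]) -> Any:
    """Hypothetical stand-in for the runner's call into talker_mtp()."""
    return model.talker_mtp(hidden, subtalker_sampling_params=params)


def test_deep_merge_keeps_unset_subtalker_keys() -> None:
    base = {"subtalker_sampling_params": {"temperature": 0.9, "top_k": 50}}
    override = {"subtalker_sampling_params": {"top_k": 20}}
    merged = deep_merge(base, override)
    assert merged["subtalker_sampling_params"] == {"temperature": 0.9, "top_k": 20}


def test_runner_forwards_subtalker_params() -> None:
    model = MagicMock()
    params = {"top_k": 20}
    forward_to_talker(model, "hidden", params)
    model.talker_mtp.assert_called_once_with(
        "hidden", subtalker_sampling_params=params
    )
```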

Gaohan123

LGTM. Thanks


xiaohajiayou · 2026-04-21T01:30:14Z

The previous test component was missing the corresponding subtalker sampling parameters, which caused CI to fail. This is fixed in 6f5a157. Could you please take another look so we can merge this PR?
@Gaohan123 @hsliuustc0106

lishunyang12 · 2026-04-21T06:49:52Z

cc @linyueqian

linyueqian · 2026-04-21T21:12:11Z

Late pass after merge, fix looks fine. A couple of things for a follow-up if you have time:

The defaults 0.9 / 50 / 1.0 / True now live in three places: qwen3_tts.yaml, the cached dict in __init__, and the get(..., 0.9) fallbacks inside talker_mtp. Worth collapsing to one source so they don't drift.
gpu_model_runner.py and the NPU runner check isinstance(..., dict) while the model checks isinstance(..., Mapping). Should align, prefer Mapping.
No test asserts the YAML value actually reaches code_predictor. A small loader-level round-trip check would lock the contract.
Fish Speech and Qwen3-Omni still hardcode sampling under the new **kwargs shim. Worth a tracking issue so it doesn't get forgotten.
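The first two follow-up points could be addressed together along these lines. This is only a sketch: the constant name, the normalize_subtalker_params helper, and the key names are assumptions. Only the values 0.9 / 50 / 1.0 / True come from the comment above, and which key each value belongs to is a guess.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any

# Single read-only source of defaults (key names are assumed, not confirmed).
SUBTALKER_DEFAULTS: Mapping[str, Any] = MappingProxyType(
    {"temperature": 0.9, "top_k": 50, "top_p": 1.0, "do_sample": True}
)


def normalize_subtalker_params(raw: Any) -> dict[str, Any]:
    """Overlay user-supplied params on the shared defaults.

    Checks against Mapping rather than dict so that read-only or
    config-library mapping types are accepted as well.
    """
    if raw is None:
        return dict(SUBTALKER_DEFAULTS)
    if not isinstance(raw, Mapping):
        raise TypeError(
            f"subtalker_sampling_params must be a mapping, got {type(raw).__name__}"
        )
    return {**SUBTALKER_DEFAULTS, **raw}
```

If the YAML, the cached dict in __init__, and the fallbacks in talker_mtp all read from one constant like this, the defaults cannot drift, and a mapping that is not a dict subclass (such as MappingProxyType) still passes the type check.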

@xiaohajiayou any chance you'd have time to pick up a follow-up PR for these? Happy to help review.

xiaohajiayou requested a review from hsliuustc0106 as a code owner April 20, 2026 09:09

[BugFix] wire Qwen3-TTS subtalker sampling params

8e64c7e

Signed-off-by: xiaohajiayou <923390377@qq.com>

xiaohajiayou force-pushed the bugfix/qwen3-tts-subtalker-sampling branch from ad86a93 to 8e64c7e Compare April 20, 2026 09:10

hsliuustc0106 reviewed Apr 20, 2026

Gaohan123 added this to the v0.20.0 milestone Apr 20, 2026

Add Qwen3-TTS subtalker sampling tests

3a95d2c

Signed-off-by: xiaohajiayou <923390377@qq.com>

xiaohajiayou force-pushed the bugfix/qwen3-tts-subtalker-sampling branch from 84e667a to 3a95d2c Compare April 20, 2026 13:48

Gaohan123 added the ready label (label to trigger buildkite CI) Apr 20, 2026

Gaohan123 previously approved these changes Apr 20, 2026

Gaohan123 approved these changes Apr 20, 2026

Gaohan123 enabled auto-merge (squash) April 20, 2026 15:15

auto-merge was automatically disabled April 20, 2026 16:29
Head branch was pushed to by a user without write access

xiaohajiayou force-pushed the bugfix/qwen3-tts-subtalker-sampling branch from 9fbcded to 39dfe42 Compare April 20, 2026 16:29

Merge branch 'main' into bugfix/qwen3-tts-subtalker-sampling

3cebb96

xiaohajiayou force-pushed the bugfix/qwen3-tts-subtalker-sampling branch from 39dfe42 to 3cebb96 Compare April 20, 2026 16:32

[tests] fix talker mtp cpu stub signature

6f5a157

Signed-off-by: xiaohajiayou <923390377@qq.com>

lishunyang12 merged commit ad7c966 into vllm-project:main Apr 21, 2026
8 checks passed


linyueqian mentioned this pull request May 1, 2026

[Bugfix] Map Qwen3-TTS max_new_tokens to max_tokens #3217

Merged


lishunyang12 merged 4 commits into vllm-project:main from xiaohajiayou:bugfix/qwen3-tts-subtalker-sampling
