Skip to content

[BugFix] add missing subtalker sampling config to Qwen3-TTS deploy YAML#2940

Merged
lishunyang12 merged 4 commits into
vllm-project:mainfrom
xiaohajiayou:bugfix/qwen3-tts-subtalker-sampling
Apr 21, 2026
Merged

[BugFix] add missing subtalker sampling config to Qwen3-TTS deploy YAML#2940
lishunyang12 merged 4 commits into
vllm-project:mainfrom
xiaohajiayou:bugfix/qwen3-tts-subtalker-sampling

Conversation

@xiaohajiayou

@xiaohajiayou xiaohajiayou commented Apr 20, 2026

Copy link
Copy Markdown
Contributor

Purpose

Fix #2942
In some TTS models' talker stage, sampling happens both in the main LLM decoding path and a separate talker fast-path that generates discrete audio codes for the downstream code2wav stage.

  • Previously, for Qwen3-TTS, the deploy-time YAML sampling config only applied to the main LLM sampling path. The residual code predictor path inside talker_mtp still used hard-coded sampling values.
  • This could be misleading, because Qwen3-TTS officially distinguishes the main talker sampling path and the subtalker/code predictor sampling path, while the YAML sampling config appeared to apply to the whole talker stage but in practice only affected the main LLM path and not the subtalker/code predictor path.

This PR adds stage-level subtalker_sampling_params support for Qwen3-TTS and wires those parameters into the talker MTP/code predictor path.

Changes

  • Add subtalker_sampling_params to stage deploy/config plumbing
  • Pass subtalker_sampling_params through StageDeployConfig -> OmniEngineArgs -> OmniModelConfig
  • Update Qwen3-TTS talker_mtp() to use configured subtalker sampling params instead of hard-coded values
  • Update GPU/NPU model runners to pass subtalker sampling params into talker_mtp()
  • Add subtalker_sampling_params defaults to deploy/qwen3_tts.yaml
  • Make Fish Speech and Qwen3-Omni talker_mtp() implementations accept extra kwargs for runner compatibility

This change makes the Qwen3-TTS subtalker sampling path configurable from deploy YAML and aligns the runtime behavior more closely with the model's official subtalker configuration semantics.

The compatibility changes for Fish Speech and Qwen3-Omni are only to ensure the updated runner call path does not break those models.

  • Those models currently still use hard-coded sampling values in their talker fast-path implementations as well.
  • This PR does not change that behavior; it only keeps them compatible with the updated runner call signature.
  • Whether they should also gain model-specific configurable fast-path sampling is left for follow-up discussion.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Signed-off-by: xiaohajiayou <923390377@qq.com>
@xiaohajiayou xiaohajiayou force-pushed the bugfix/qwen3-tts-subtalker-sampling branch from ad86a93 to 8e64c7e Compare April 20, 2026 09:10

@hsliuustc0106 hsliuustc0106 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLOCKING:

  • Test Coverage — Missing regression test for this bugfix. Please add a test that verifies subtalker_sampling_params are correctly passed from YAML through to talker_mtp() and used instead of hardcoded values.

@Gaohan123 Gaohan123 added this to the v0.20.0 milestone Apr 20, 2026
Signed-off-by: xiaohajiayou <923390377@qq.com>
@xiaohajiayou xiaohajiayou force-pushed the bugfix/qwen3-tts-subtalker-sampling branch from 84e667a to 3a95d2c Compare April 20, 2026 13:48
@xiaohajiayou

xiaohajiayou commented Apr 20, 2026

Copy link
Copy Markdown
Contributor Author

Added two tests for this.

  • One covers the config deep-merge path for subtalker_sampling_params
  • and the other covers passing those params from OmniGPUModelRunner into talker_mtp().
    Both tests pass locally.

@Gaohan123 Gaohan123 added the ready label to trigger buildkite CI label Apr 20, 2026
Gaohan123
Gaohan123 previously approved these changes Apr 20, 2026

@Gaohan123 Gaohan123 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@Gaohan123 Gaohan123 dismissed their stale review April 20, 2026 15:13

unresolved

@Gaohan123 Gaohan123 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@Gaohan123 Gaohan123 enabled auto-merge (squash) April 20, 2026 15:15
auto-merge was automatically disabled April 20, 2026 16:29

Head branch was pushed to by a user without write access

@xiaohajiayou xiaohajiayou force-pushed the bugfix/qwen3-tts-subtalker-sampling branch from 9fbcded to 39dfe42 Compare April 20, 2026 16:29
@xiaohajiayou xiaohajiayou force-pushed the bugfix/qwen3-tts-subtalker-sampling branch from 39dfe42 to 3cebb96 Compare April 20, 2026 16:32
Signed-off-by: xiaohajiayou <923390377@qq.com>
@xiaohajiayou

xiaohajiayou commented Apr 21, 2026

Copy link
Copy Markdown
Contributor Author

The previous test component was missing the corresponding subtalker sampling parameters, which caused the CI to fail. This has been fixed in 6f5a157. Could you please take another look to see if we can merge this PR?
@Gaohan123 @hsliuustc0106

@lishunyang12 lishunyang12 merged commit ad7c966 into vllm-project:main Apr 21, 2026
8 checks passed
@lishunyang12

Copy link
Copy Markdown
Collaborator

cc @linyueqian

@linyueqian

Copy link
Copy Markdown
Collaborator

Late pass after merge, fix looks fine. A couple of things for a follow-up if you have time:

  1. The defaults 0.9 / 50 / 1.0 / True now live in three places: qwen3_tts.yaml, the cached dict in __init__, and the get(..., 0.9) fallbacks inside talker_mtp. Worth collapsing to one source so they don't drift.

  2. gpu_model_runner.py and the NPU runner check isinstance(..., dict) while the model checks isinstance(..., Mapping). Should align, prefer Mapping.

  3. No test asserts the YAML value actually reaches code_predictor. A small loader-level round-trip check would lock the contract.

  4. Fish Speech and Qwen3-Omni still hardcode sampling under the new **kwargs shim. Worth a tracking issue so it doesn't get forgotten.

@xiaohajiayou any chance you'd have time to pick up a follow-up PR for these? Happy to help review.

qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request May 28, 2026
quyifei23 pushed a commit to quyifei23/vllm-omni that referenced this pull request Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Qwen3-TTS deploy YAML does not have expose subtalker sampling config

5 participants