[Feat] Add benchmarks for Qwen3-TTS Base/VoiceDesign Model#2411
linyueqian merged 3 commits into vllm-project:main from
Conversation
Pull request overview
Adds task-type awareness to the Qwen3-TTS benchmarking scripts so payloads match the selected model variant (CustomVoice / Base / VoiceDesign), fixing incorrect requests for non-CustomVoice models.
Changes:
- Added a `--task-type` CLI flag to both serving and HF benchmark clients and propagated it through `run_benchmark.sh`.
- Implemented task-type-specific request construction for `/v1/audio/speech` (serving) and task-type-specific generation method selection (HF).
- Updated benchmark README/run script examples for benchmarking Base (voice cloning) models.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| benchmarks/qwen3-tts/vllm_omni/bench_tts_serve.py | Adds --task-type and builds task-specific serving payloads (Base/VoiceDesign/CustomVoice). |
| benchmarks/qwen3-tts/transformers/bench_tts_hf.py | Adds --task-type and routes to the correct HF generation method per task type. |
| benchmarks/qwen3-tts/run_benchmark.sh | Wires TASK_TYPE through to both benchmark entrypoints and documents it. |
| benchmarks/qwen3-tts/README.md | Adds an example command for benchmarking the Base model task type. |
```python
parser.add_argument(  # noqa: E501
    "--max-concurrency", type=int, nargs="+", default=[1, 4, 10], help="Concurrency levels to test"
)
parser.add_argument("--num-warmups", type=int, default=3)
parser.add_argument("--task-type", type=str, default="CustomVoice", choices=["CustomVoice", "VoiceDesign", "Base"])
parser.add_argument("--voice", type=str, default="vivian")
```
The new --task-type option is not captured anywhere in the saved JSON results (BenchmarkResult / per-request entries) or the output filename, so runs for different task types can’t be distinguished when comparing results. Consider recording task_type in the result payload (and optionally include it in the result filename).
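One way to do this, sketched under the assumption of a dataclass-style `BenchmarkResult` and an illustrative filename scheme (the actual field and file names in the script may differ):

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BenchmarkResult:
    # ...existing latency/throughput fields would live here...
    task_type: str = "CustomVoice"  # new field: record the benchmarked task type


result = BenchmarkResult(task_type="Base")  # e.g. populated from args.task_type
# Embedding the task type in the filename keeps runs distinguishable on disk.
out_path = f"bench_tts_{result.task_type}.json"
with open(out_path, "w") as f:
    json.dump(asdict(result), f, indent=2)
```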
| parser.add_argument("--num-warmups", type=int, default=3) | ||
| parser.add_argument("--gpu-device", type=int, default=0) | ||
| parser.add_argument("--voice", type=str, default="Vivian") | ||
| parser.add_argument("--language", type=str, default="English") | ||
| parser.add_argument("--task-type", type=str, default="CustomVoice", choices=["CustomVoice", "VoiceDesign", "Base"]) | ||
| parser.add_argument( |
The new --task-type option is not included in the saved benchmark JSON (BenchmarkResult) or filename, which makes it hard to tell which task type a result corresponds to when collecting multiple runs. Consider adding task_type to the serialized result (and optionally the filename).
| REF_AUDIO = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav" | ||
| REF_TEXT = "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you." | ||
| INSTRUCT = "Speak in an incredulous tone, but with a hint of panic beginning to creep into your voice." | ||
|
|
||
|
|
REF_AUDIO/REF_TEXT/INSTRUCT are duplicated here and in the HF benchmark script; if either sample input needs to change later, the two benchmarks can drift. Consider centralizing these shared constants (or allowing them to be provided via CLI) to keep the benchmarking paths consistent.
| REF_AUDIO = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav" | |
| REF_TEXT = "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you." | |
| INSTRUCT = "Speak in an incredulous tone, but with a hint of panic beginning to creep into your voice." | |
| @dataclass(frozen=True) | |
| class ReferenceSample: | |
| """Shared reference sample used for TTS benchmarking.""" | |
| audio_url: str | |
| text: str | |
| instruct: str | |
| # Centralized reference sample; other benchmark scripts should import this | |
| # instead of duplicating the literals. | |
| REFERENCE_SAMPLE = ReferenceSample( | |
| audio_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav", | |
| text=( | |
| "Okay. Yeah. I resent you. I love you. I respect you. " | |
| "But you know what? You blew it! And thanks to you." | |
| ), | |
| instruct=( | |
| "Speak in an incredulous tone, but with a hint of panic " | |
| "beginning to creep into your voice." | |
| ), | |
| ) | |
| # Backwards-compatible aliases used throughout this module. | |
| REF_AUDIO = REFERENCE_SAMPLE.audio_url | |
| REF_TEXT = REFERENCE_SAMPLE.text | |
| INSTRUCT = REFERENCE_SAMPLE.instruct |
| REF_AUDIO = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav" | ||
| REF_TEXT = "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you." | ||
| INSTRUCT = "Speak in an incredulous tone, but with a hint of panic beginning to creep into your voice." | ||
|
|
REF_AUDIO/REF_TEXT/INSTRUCT are duplicated here and in the serving benchmark script; this duplication can lead to drift between the HF and serving benchmark paths. Consider centralizing these shared constants (or allowing them to be provided via CLI) to keep both scripts aligned.
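For the CLI alternative, a minimal sketch with hypothetical flag names (these flags are not part of the PR); the defaults fall back to the shared sample literals so existing invocations keep working:

```python
import argparse

# Hypothetical flags, shown only to illustrate the CLI alternative; the
# string defaults below are abbreviated copies of the shared sample literals.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--ref-audio",
    type=str,
    default="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav",
)
parser.add_argument("--ref-text", type=str, default="Okay. Yeah. I resent you. ...")
parser.add_argument("--instruct", type=str, default="Speak in an incredulous tone, ...")
args = parser.parse_args()
```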
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
fix dco pls
fixed
…ect#2411) Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
Purpose
This PR fixes #2348
Fix the Qwen3-TTS benchmark scripts so `Base` and `VoiceDesign` models can be benchmarked correctly.

Previously, `bench_tts_serve.py` always sent requests as if the task type were `CustomVoice`, which caused benchmarking errors for `Base` and `VoiceDesign` models. This PR adds a `--task-type` argument to the benchmark scripts and propagates it through the benchmarking pipeline so the request payload matches the actual model type.

This PR includes (see the payload sketch after this list):
- Add `--task-type` to `bench_tts_serve.py`
- Add `--task-type` to `bench_tts_hf.py`
- Wire `TASK_TYPE` through `run_benchmark.sh`
- Supported task types: `CustomVoice`, `Base`, and `VoiceDesign`; the default remains `CustomVoice`, so existing benchmarks for `CustomVoice` models are unchanged
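For illustration, a minimal sketch of task-type-specific payload construction (the field names are assumptions made for this sketch, not the exact `/v1/audio/speech` schema used by the scripts):

```python
REF_AUDIO = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav"
REF_TEXT = "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."
INSTRUCT = "Speak in an incredulous tone, but with a hint of panic beginning to creep into your voice."


def build_payload(task_type: str, text: str) -> dict:
    """Build a task-type-specific request body (field names are illustrative)."""
    payload = {"model": "qwen3-tts", "input": text}
    if task_type == "CustomVoice":
        payload["voice"] = "vivian"  # built-in speaker preset
    elif task_type == "Base":
        # Voice cloning: send a reference clip plus its transcript.
        payload["ref_audio"] = REF_AUDIO
        payload["ref_text"] = REF_TEXT
    elif task_type == "VoiceDesign":
        # Voice design: describe the target voice in natural language.
        payload["instruct"] = INSTRUCT
    else:
        raise ValueError(f"unknown task type: {task_type}")
    return payload
```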
Test Plan

Validated both the vLLM-Omni serve benchmarking and HF benchmarking paths with the `Base` task type.

Commands used:
Test Result
The above benchmarks run as expected.