
[Test] Add Qwen3-TTS nightly performance benchmark #1700

Merged
hsliuustc0106 merged 6 commits into vllm-project:main from linyueqian:worktree-tts-perf-benchmark on Mar 11, 2026

Conversation

@linyueqian
Collaborator

Summary

  • Add Qwen3-TTS performance benchmark to the nightly CI pipeline
  • Make backend/endpoint configurable in run_benchmark.py (was hardcoded to openai-chat-omni)
  • Add TTS stage config and test cases with concurrency [1, 4] matching Qwen3-Omni (sketched below)
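For orientation, a minimal sketch (written as a Python dict) of what such a test-case entry could look like once backend/endpoint sit under benchmark_params, as the review below requests. Only the model name, stage config name, backend value, and concurrency levels are quoted in this thread; the remaining key names and the endpoint path are assumptions.

```python
# Illustrative sketch only -- not the actual tests/perf/tests/test.json entry.
# Keys other than "server_params", "benchmark_params", "model",
# "stage_config_name", "backend", and the [1, 4] concurrency levels
# are assumptions.
tts_test_case = {
    "test_name": "qwen3_tts",  # assumed name
    "server_params": {
        "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
        "stage_config_name": "qwen3_tts.yaml",
    },
    "benchmark_params": {
        "backend": "openai-audio-speech",
        "endpoint": "/v1/audio/speech",  # assumed endpoint path
        "max_concurrency": [1, 4],       # assumed key for the concurrency sweep
    },
}
```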

Test Plan

  • Validated locally on H200: vllm bench serve --omni --backend openai-audio-speech with 3 prompts, concurrency 1
  • All 3 requests succeeded with RTF ~0.94 (real-time factor: generation time / audio duration, so just under real time); audio generation worked end-to-end

Collaborator

@hsliuustc0106 left a comment

Review

Rating: 8.5/10 | Verdict: ✅ Approved

Summary

Good addition of Qwen3-TTS performance benchmark to nightly CI pipeline. Follows established patterns from Qwen3-Omni benchmark.

Highlights

  • ✅ Makes backend/endpoint configurable (was hardcoded)
  • ✅ Adds TTS stage config with concurrency [1, 4]
  • ✅ Validated locally on H200
  • ✅ RTF ~0.94 demonstrates good performance

Minor Suggestions

  • Consider adding more concurrency levels (e.g., [1, 4, 8]) for scalability testing
  • Document the expected RTF range for regression detection (a minimal check is sketched below)
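A minimal sketch of what such a documented-RTF regression gate could look like in CI, assuming a hypothetical results JSON with a mean_rtf field; the file layout, field name, and threshold are illustrative, not the benchmark's actual output schema.

```python
import json
import sys

# Hypothetical RTF budget: RTF = generation time / audio duration, so
# values below 1.0 are faster than real time. The 0.6 threshold and the
# "mean_rtf" field name are illustrative assumptions.
MAX_MEAN_RTF = 0.6

def check_rtf(results_path: str) -> None:
    with open(results_path) as f:
        mean_rtf = json.load(f)["mean_rtf"]
    if mean_rtf > MAX_MEAN_RTF:
        sys.exit(f"RTF regression: mean RTF {mean_rtf:.2f} exceeds budget {MAX_MEAN_RTF}")
    print(f"RTF OK: {mean_rtf:.2f} <= {MAX_MEAN_RTF}")

if __name__ == "__main__":
    check_rtf(sys.argv[1])
```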

Recommendation

Ready to merge. Solid CI enhancement.


Reviewed by OpenClaw with vllm-omni-skills 🦐

# Stage 1: Code2Wav (codec codes -> audio waveform)
#
# The following config has been verified on 1x H100-80G GPU.
async_chunk: true
Collaborator

Do we use async_chunk by default? Do we need to change the yaml name?

@hsliuustc0106
Collaborator

@yenuo26 @congw729 PTAL

Comment thread tests/perf/tests/test.json Outdated
"server_params": {
"model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
"stage_config_name": "qwen3_tts.yaml",
"backend": "openai-audio-speech",
Collaborator

Why are backend and endpoint added in server_params? I think these belong in benchmark_params.

Comment thread tests/perf/scripts/run_benchmark.py Outdated
mapping[test_name] = {
"test_name": test_name,
"benchmark_params": [],
"backend": config["server_params"].get("backend", "openai-chat-omni"),
Collaborator

If you move backend and endpoint to benchmark_params, I think these need to be removed here at the same time.

Comment thread tests/perf/scripts/run_benchmark.py Outdated
flow,
dataset_name: str,
num_prompt,
backend: str = "openai-chat-omni",
Collaborator

If you move backend and endpoint to benchmark_params, I think these can be included in the args.

@congw729
Collaborator

congw729 commented Mar 6, 2026

Can you provide example benchmark results from your testing, and how long the run takes?

@linyueqian
Collaborator Author

Thanks for the reviews!

For the async_chunk question, async_chunk: true is the only supported mode for Qwen3-TTS. All TTS configs use it, unlike Qwen3-Omni which has both async and non-async variants. So qwen3_tts.yaml is the correct name, no rename needed.

For the backend/endpoint placement, good catch. Moved them into benchmark_params and removed the special-casing in run_benchmark.py. They now flow through the generic params-to-args loop like everything else.
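A minimal sketch of the kind of generic params-to-args loop described here; the dict layout and flag spelling are illustrative assumptions, not the actual run_benchmark.py code.

```python
# Illustrative only: expand a benchmark_params mapping into CLI flags so
# backend/endpoint ride through the same path as every other parameter,
# with no special-casing. The flag spelling is an assumption.
def params_to_args(benchmark_params: dict) -> list[str]:
    args = []
    for key, value in benchmark_params.items():
        args.append(f"--{key.replace('_', '-')}")
        args.append(str(value))
    return args

print(params_to_args({
    "backend": "openai-audio-speech",
    "endpoint": "/v1/audio/speech",
}))
# ['--backend', 'openai-audio-speech', '--endpoint', '/v1/audio/speech']
```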

Will post example benchmark results shortly.

@linyueqian force-pushed the worktree-tts-perf-benchmark branch from 387e5fa to 2ab1833 on March 6, 2026 21:45
@linyueqian
Collaborator Author

Ran benchmarks on H200 with latest main (includes #1583 initial_codec_chunk_frames and #1617 CUDA graph decoder). Config: initial_codec_chunk_frames=2, streaming enabled.

Concurrency 1 (10 prompts, random input_len=100)

| Metric | Value |
| --- | --- |
| Mean TTFP | 131 ms |
| Median TTFP | 126 ms |
| P99 TTFP | 179 ms |
| Mean RTF | 0.34 |
| Audio throughput | 2.95x real-time |

Concurrency 4 (10 prompts, random input_len=100)

| Metric | Value |
| --- | --- |
| Mean TTFP | 3386 ms |
| Median TTFP | 200 ms |
| Mean RTF | 0.49 |
| Audio throughput | 7.38x real-time |

Updated the benchmark client to use streaming (stream=true, response_format=pcm) so TTFP measures time to first audio chunk rather than full response latency. Also set initial_codec_chunk_frames=2 in the TTS yaml for faster first-chunk delivery.
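A minimal sketch of how a streaming TTFP measurement like this can be taken by hand, assuming a locally served /v1/audio/speech endpoint. The stream=true and response_format=pcm fields mirror the request described above; the URL, the input text, and treating the first non-empty body chunk as the first audio chunk are assumptions.

```python
import time
import requests  # third-party: pip install requests

# Illustrative TTFP probe, not the benchmark client's actual code.
url = "http://localhost:8000/v1/audio/speech"  # assumed local server
payload = {
    "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "input": "Hello from the nightly TTS benchmark.",  # assumed field/text
    "stream": True,            # stream audio chunks as they are generated
    "response_format": "pcm",  # raw PCM, so the first chunk is audio
}

start = time.perf_counter()
with requests.post(url, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None):
        if chunk:  # first non-empty chunk closes the TTFP window
            print(f"TTFP: {(time.perf_counter() - start) * 1000:.0f} ms")
            break
```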

@linyueqian added the ready label (to trigger buildkite CI) on Mar 6, 2026
@linyueqian
Collaborator Author

@Sy0307 please help check if this config is correct. I think the RTF looks good for now.

"--backend",
"openai-chat-omni",
"--endpoint",
"/v1/chat/completions",
Contributor

Check possible risks for the omni-series tests.

@Sy0307
Contributor

Sy0307 commented Mar 8, 2026

@Sy0307 please help check if this config is correct. I think the RTF looks good for now.

The config looks correct to me. Note that changing the default backend may introduce errors. Please check it.

@JuanPZuluaga
Contributor

Hi @linyueqian, I'm working on a simple dynamic initial-chunk-size computation for Qwen3-TTS based on the load of code2wav, to get low TTFC/TTFA. My question is:

  • initial_codec_chunk_frames is going to be computed dynamically, and only when we make a stream=True request to the server: if we don't make a streaming request, there's no point in lowering the TTFA. It would also be removed from the yaml file.

The PR is: #1714

Should we merge that PR first and adapt the config here? For the current benchmarking, the request might need to explicitly pass stream=true and/or set initial_codec_chunk_frames to some lower value. (I see async_request_openai_audio_speech already sets stream=true.)
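For intuition only, one possible shape of the load-based computation being described; this is an illustrative sketch, not the actual code from #1714, and every name and constant in it is an assumption.

```python
# Purely illustrative: pick a small initial codec chunk when code2wav is
# idle (low TTFA) and a larger one under load (better batching). All
# names and constants here are assumptions, not #1714's actual code.
def initial_codec_chunk_frames(active_code2wav_requests: int,
                               min_frames: int = 2,
                               max_frames: int = 8) -> int:
    # Grow the first chunk by one frame per concurrent request,
    # clamped to [min_frames, max_frames].
    return max(min_frames, min(max_frames, min_frames + active_code2wav_requests))

assert initial_codec_chunk_frames(0) == 2   # idle server -> fast first audio
assert initial_codec_chunk_frames(10) == 8  # busy server -> larger first chunk
```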

Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Adapt to PR vllm-project#1714 which computes initial_codec_chunk_frames
dynamically based on code2wav load. The static config entry
is no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
@linyueqian force-pushed the worktree-tts-perf-benchmark branch from ddfa93d to bc13271 on March 11, 2026 03:01
Collaborator

@hsliuustc0106 left a comment

lgtm

@hsliuustc0106 merged commit f144f5e into vllm-project:main on Mar 11, 2026
6 of 7 checks passed
meghaagr13 pushed a commit to meghaagr13/vllm-omni that referenced this pull request Mar 12, 2026
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: Megha Agarwal <agarwalmegha1308@gmail.com>
meghaagr13 pushed a commit to meghaagr13/vllm-omni that referenced this pull request Mar 12, 2026
Signed-off-by: linyueqian <linyueqian@outlook.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: linyueqian <linyueqian@outlook.com>