[Test] Add Qwen3-TTS nightly performance benchmark #1700
Conversation
hsliuustc0106 left a comment
Review
Rating: 8.5/10 | Verdict: ✅ Approved
Summary
Good addition of Qwen3-TTS performance benchmark to nightly CI pipeline. Follows established patterns from Qwen3-Omni benchmark.
Highlights
- ✅ Makes backend/endpoint configurable (was hardcoded)
- ✅ Adds TTS stage config with concurrency [1, 4]
- ✅ Validated locally on H200
- ✅ RTF ~0.94 demonstrates good performance
Minor Suggestions
- Consider adding more concurrency levels (e.g., [1, 4, 8]) for scalability testing
- Document expected RTF range for regression detection
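
As a rough illustration of the second suggestion, a nightly regression gate could compare the measured RTF against a documented ceiling. The sketch below is hypothetical; the PR does not define a threshold, and 1.0 is chosen here only because the locally measured RTF was ~0.94 on H200.

```python
# Hypothetical RTF regression gate (threshold is illustrative, not from the PR).
EXPECTED_MAX_RTF = 1.0

def check_rtf_regression(measured_rtf: float) -> None:
    # Fail the nightly run if the real-time factor degrades past the ceiling.
    if measured_rtf > EXPECTED_MAX_RTF:
        raise AssertionError(
            f"RTF regression: {measured_rtf:.2f} > {EXPECTED_MAX_RTF:.2f}"
        )
```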
Recommendation
Ready to merge. Solid CI enhancement.
Reviewed by OpenClaw with vllm-omni-skills 🦐
# Stage 1: Code2Wav (codec codes -> audio waveform)
#
# The following config has been verified on 1x H100-80G GPU.
async_chunk: true
Do we use async_chunk by default? Do we need to change the YAML name?
| "server_params": { | ||
| "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice", | ||
| "stage_config_name": "qwen3_tts.yaml", | ||
| "backend": "openai-audio-speech", |
Why are backend and endpoint added in server_params? I think these belong in benchmark_params.
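
If they were moved, a test entry might look roughly like the sketch below (shown as a Python literal; the endpoint value and per-entry keys such as max_concurrency are assumptions, not taken from the diff):

```python
# Hypothetical layout with backend/endpoint under benchmark_params.
test_entry = {
    "server_params": {
        "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
        "stage_config_name": "qwen3_tts.yaml",
    },
    "benchmark_params": [
        # One entry per concurrency level in the [1, 4] sweep.
        {"backend": "openai-audio-speech", "endpoint": "/v1/audio/speech", "max_concurrency": 1},
        {"backend": "openai-audio-speech", "endpoint": "/v1/audio/speech", "max_concurrency": 4},
    ],
}
```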
mapping[test_name] = {
    "test_name": test_name,
    "benchmark_params": [],
    "backend": config["server_params"].get("backend", "openai-chat-omni"),
If you move backend and endpoint to benchmark_params, I think these need to be removed here in the same change.
flow,
dataset_name: str,
num_prompt,
backend: str = "openai-chat-omni",
If you move backend and endpoint to benchmark_params, I think these can be included in the args.
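
For example, the runner could take the whole benchmark_params entry and resolve both values from it. This is a sketch only; the function and parameter names are illustrative, not the PR's actual code:

```python
# Sketch: backend/endpoint flow in from the benchmark_params entry
# instead of being hard-coded defaults on the function signature.
def run_one_benchmark(flow, dataset_name: str, num_prompt, bench_params: dict):
    backend = bench_params.get("backend", "openai-chat-omni")
    endpoint = bench_params.get("endpoint", "/v1/chat/completions")
    # ... launch the serving benchmark with the resolved backend/endpoint ...
```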
Can you provide example benchmark results from your testing, and the time spent?
Thanks for the reviews!

- For the async_chunk question,
- For the backend/endpoint placement, good catch. Moved them into benchmark_params.

Will post example benchmark results shortly.
Force-pushed from 387e5fa to 2ab1833.
Ran benchmarks on H200 with latest main (includes #1583 initial_codec_chunk_frames and #1617 CUDA graph decoder). Config:

- Concurrency 1 (10 prompts, random input_len=100)
- Concurrency 4 (10 prompts, random input_len=100)

Updated the benchmark client to use streaming.
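
For reference, RTF here is read as the usual real-time factor for TTS. The exact formula used by the benchmark client is not shown in this thread, but a common definition is wall-clock synthesis time divided by the duration of the generated audio:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    # RTF < 1.0 means audio is generated faster than it plays back,
    # e.g. ~0.94 on the H200 runs above.
    return synthesis_seconds / audio_seconds
```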
@Sy0307, please help check whether this config is correct. I think the RTF looks good for now.
| "--backend", | ||
| "openai-chat-omni", | ||
| "--endpoint", | ||
| "/v1/chat/completions", |
Check possible risks for the omni series tests.
The config looks correct to me. Note that modifying the default backend may introduce errors; please check that.
Hi @linyueqian, I'm working on a simple dynamic initial chunk size computation for Qwen3-TTS based on the code2wav load, for low TTFC/TTFA. My question is:

The PR is #1714. Should we merge that PR first and adapt the config here? For the current benchmarking, the request might need to explicitly pass stream=true and/or ...
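
For instance, a benchmark client request might need to look something like the sketch below. The endpoint path and the streaming field name are assumptions based on the discussion above, not confirmed API details:

```python
import requests

# Hypothetical streaming TTS request against the OpenAI-compatible server.
resp = requests.post(
    "http://localhost:8000/v1/audio/speech",
    json={
        "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
        "input": "Hello from the nightly benchmark.",
        "stream": True,
    },
    stream=True,
)
for chunk in resp.iter_content(chunk_size=4096):
    pass  # consume audio chunks as they arrive
```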
Adapt to PR vllm-project#1714, which computes initial_codec_chunk_frames dynamically based on code2wav load. The static config entry is no longer needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Force-pushed from ddfa93d to bc13271.
Summary
Test Plan
vllm bench serve --omni --backend openai-audio-speech with 3 prompts, concurrency 1