[CI][Perf] Add high-load stress phase for Qwen3-TTS daily perf #3238

Merged
linyueqian merged 1 commit into vllm-project:main from linyueqian:feat/bench-dfx-tts-c10-n100
May 2, 2026

Conversation

@linyueqian
Collaborator

Summary

Daily DFX TTS perf currently caps at max_concurrency=8 in the throughput regime, so high-load TTFA tail regressions are invisible to nightly CI. One example is the cross-request Code2Wav batching gap that #3163 proposes to fix and that #3221's Triton stack already demonstrates a fix for.

This PR adds a stress phase to test_qwen3_tts_customvoice for both default_voice and voice_design, mirroring the open-loop pattern already used by test_qwen_omni.json:

  • num_prompts: [100], request_rate: [2.0] (open-loop, ~2 req/s offered for ~50 s of wall time)
  • Same prompt source as the existing throughput phase, so no new dataset deps
  • Baselines intentionally loose (median TTFA 3.0–3.5 s, median RTF 0.25–0.30, audio_throughput floor 4.0 audio-s/wall-s) so it alarms only on real regressions and can be tightened once we have a few nightly runs
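
As a rough illustration of how loose baselines like these might be enforced, the sketch below compares one run's metrics against a ceiling/floor triple. It is not the repository's actual checker: the class, the function, and the metric key names (`median_ttfa_s`, `median_rtf`, `audio_throughput`) are hypothetical, the sample metric values are made up, and it assumes the quoted ranges are ceilings, using the looser end of each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StressBaseline:
    """Hypothetical container for the loose stress-phase thresholds."""
    max_median_ttfa_s: float      # ceiling on median time-to-first-audio, seconds
    max_median_rtf: float         # ceiling on median real-time factor
    min_audio_throughput: float   # floor, audio seconds per wall-clock second


def find_violations(metrics: dict[str, float], baseline: StressBaseline) -> list[str]:
    """Return one message per threshold the run fails; empty list means pass."""
    problems = []
    if metrics["median_ttfa_s"] > baseline.max_median_ttfa_s:
        problems.append(
            f"median_ttfa_s {metrics['median_ttfa_s']} > {baseline.max_median_ttfa_s}"
        )
    if metrics["median_rtf"] > baseline.max_median_rtf:
        problems.append(
            f"median_rtf {metrics['median_rtf']} > {baseline.max_median_rtf}"
        )
    if metrics["audio_throughput"] < baseline.min_audio_throughput:
        problems.append(
            f"audio_throughput {metrics['audio_throughput']} < {baseline.min_audio_throughput}"
        )
    return problems


# Looser end of each range quoted in this PR (assumption: ranges are ceilings).
LOOSE = StressBaseline(max_median_ttfa_s=3.5, max_median_rtf=0.30, min_audio_throughput=4.0)

# Illustrative numbers only, not measured results.
healthy = {"median_ttfa_s": 1.2, "median_rtf": 0.18, "audio_throughput": 6.5}
regressed = {"median_ttfa_s": 4.4, "median_rtf": 0.18, "audio_throughput": 6.5}

print(find_violations(healthy, LOOSE))
print(find_violations(regressed, LOOSE))
```

Tightening the baselines after a few nightly runs would then only mean changing the three numbers in `LOOSE`.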

eval_phase: "stress" is metadata only; run_benchmark.py already lists eval_phase in exclude_keys, so no script change is needed.
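
To make the "metadata only" point concrete, here is a minimal sketch of how an exclude list can keep a key like eval_phase out of the benchmark invocation while list-valued parameters are expanded into runs. This is an assumption about the general pattern, not a copy of run_benchmark.py: `expand_runs`, `to_cli_args`, the one-element `EXCLUDE_KEYS`, and the flag spelling are all hypothetical.

```python
from itertools import product

# Hypothetical stress entry using only the keys quoted in this PR.
stress_entry = {
    "eval_phase": "stress",   # metadata, should never reach the benchmark CLI
    "num_prompts": [100],
    "request_rate": [2.0],
}

# Assumption: the real exclude_keys list contains more than this one key.
EXCLUDE_KEYS = {"eval_phase"}


def expand_runs(entry: dict, exclude_keys: set[str]) -> list[dict]:
    """Drop excluded keys, then take the cross product of list-valued params."""
    swept = {
        key: (value if isinstance(value, list) else [value])
        for key, value in entry.items()
        if key not in exclude_keys
    }
    keys = list(swept)
    return [dict(zip(keys, combo)) for combo in product(*swept.values())]


def to_cli_args(run: dict) -> list[str]:
    """Turn one run's parameters into '--flag value' pairs (flag style assumed)."""
    args: list[str] = []
    for key, value in run.items():
        args += [f"--{key.replace('_', '-')}", str(value)]
    return args


runs = expand_runs(stress_entry, EXCLUDE_KEYS)
print(runs)
print(to_cli_args(runs[0]))
```

Because eval_phase is filtered before expansion, adding it to a config entry changes neither the number of runs nor the arguments passed to the benchmark.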

Why this matters

On H20 (single H20-3e), the gap between current main and a stack with cross-request codec batching shows up exactly in this load region:

| Concurrency | main req/s | main TTFA p95 | PR #3221 req/s | PR #3221 TTFA p95 |
|---|---|---|---|---|
| 8 | 2.29 | 386 ms | 4.41 | 397 ms |
| 16 | 2.44 | 4437 ms | 4.52 | 258 ms |
| 32 | 2.47 | 8626 ms | 6.73 | 463 ms |

main is already close to its ceiling at c=8 (2.29 req/s) and tops out at ~2.47 req/s by c=32, so any offered load above that queues hard. The new entry sits at an offered rate of 2.0 req/s (~80 % of main's sustainable rate, ~30 % of #3221's), which is the regime where regressions in scheduler / codec batching are loudest.
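
The percentages and the ~50 s send window follow directly from the table and the new entry's parameters. A small sketch of the arithmetic, assuming "sustainable rate" means the highest req/s observed for each stack in the table (variable names are illustrative):

```python
# Throughput (req/s) by concurrency, copied from the table in this PR.
main_rps = {8: 2.29, 16: 2.44, 32: 2.47}
pr3221_rps = {8: 4.41, 16: 4.52, 32: 6.73}

offered_rate = 2.0   # request_rate of the new stress entry, req/s
num_prompts = 100    # num_prompts of the new stress entry

# Open-loop: requests are issued on a schedule, so the send window is fixed
# by the offered rate rather than by how fast the server responds.
send_window_s = num_prompts / offered_rate

# Offered load as a fraction of each stack's best observed throughput.
load_vs_main = offered_rate / max(main_rps.values())
load_vs_pr3221 = offered_rate / max(pr3221_rps.values())

print(f"send window: {send_window_s:.0f} s")
print(f"offered load vs main:  {load_vs_main:.0%}")
print(f"offered load vs #3221: {load_vs_pr3221:.0%}")
```

At roughly 81 % of main's ceiling the queue is stable but has little headroom, so a modest throughput regression should push the offered rate past saturation and show up as TTFA growth.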

Test plan

cc @ischencheng (re #3163), @vklimkov-nvidia (re #3221), @hsliuustc0106

Daily TTS perf CI currently caps at max_concurrency=8 in the
throughput regime, so high-load TTFA tail regressions (e.g. the
Code2Wav cross-request batching gap discussed in vllm-project#3163 / shown by
vllm-project#3221) are invisible to nightly. This adds a stress phase mirroring
the open-loop pattern already used by test_qwen_omni.json: 100
requests at request_rate=2.0 for both default_voice and voice_design.

Baselines are intentionally loose (median TTFA 3.0-3.5 s,
median RTF 0.25-0.30, audio_throughput floor 4.0 audio-s/wall-s) so
the entry alarms only on real regressions and can be tightened in a
follow-up once we have a few nightly runs.

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Review summary

Validated:

  • All gates pass (DCO, pre-commit, build 3.11/3.12, docs)
  • eval_phase is in exclude_keys at run_benchmark.py:336 — no script change needed
  • New entries mirror the existing throughput/latency structure exactly (same percentile-metrics, dataset_path, backend, consistent per-task config)
  • Baselines are intentionally loose, documented as such, with a plan to tighten after nightly data
  • Benchmark evidence in the PR body is thorough: concrete concurrency/throughput/TTFA table showing exactly the gap this stress phase will catch

No blocking issues. The change is well-scoped (config-only, +36 lines), well-justified, and well-evidenced. LGTM.

@linyueqian linyueqian enabled auto-merge (squash) May 1, 2026 18:14
@linyueqian linyueqian added the ready and merge-test labels May 2, 2026
@linyueqian linyueqian merged commit 5e82d7f into vllm-project:main May 2, 2026
7 of 8 checks passed
sphinxkkkbc pushed a commit to sphinxkkkbc/vllm-omni that referenced this pull request May 4, 2026
…project#3238)

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026