Merged
10 changes: 10 additions & 0 deletions docs/user_guide/examples/online_serving/text_to_speech.md
@@ -201,6 +201,16 @@ Stage configs ship with the chunked-streaming default. To use the uniproc execut

To opt out of chunked streaming, pass `--no-async-chunk` instead — the pipeline auto-dispatches to the end-to-end codec processor.

### Tuning stage 1 `max_num_seqs` per task type
The bundled `qwen3_tts.yaml` ships stage 1 (Code2Wav) at `max_num_seqs: 10`, tuned for Base voice cloning. Stage-1 lifetimes for that task are long (~3 s/req), so admitting up to 10 concurrent codec sequences lets requests progress in parallel in the scheduler. This improves TTFA p95 by roughly 2× at c=4 / c=8 (1× H100, 1.7B-Base, seed-tts) at an 8–12% audio-throughput cost.
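
The queueing effect described above can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions that are not from the PR: every request holds its stage-1 slot for a fixed lifetime, slots are granted in arrival order, and batching does not slow individual sequences. The function name is hypothetical.

```python
def worst_case_admission_wait(concurrency: int, max_num_seqs: int, lifetime_s: float) -> float:
    """Seconds the last of `concurrency` simultaneous requests waits for a stage-1 slot.

    Toy model (assumption): each request occupies one slot for exactly
    `lifetime_s`, and slots are handed out in arrival order.
    """
    if concurrency < 1 or max_num_seqs < 1:
        raise ValueError("concurrency and max_num_seqs must be >= 1")
    # Number of full "waves" of requests that must finish before the last one starts.
    waves_ahead = (concurrency - 1) // max_num_seqs
    return waves_ahead * lifetime_s


if __name__ == "__main__":
    # Base voice cloning: ~3 s stage-1 lifetime, 8 concurrent requests.
    for slots in (1, 10):
        wait = worst_case_admission_wait(concurrency=8, max_num_seqs=slots, lifetime_s=3.0)
        print(f"max_num_seqs={slots}: last request waits about {wait:.1f} s for a slot")
```

The model captures only the admission-wait term. It says nothing about the per-sequence slowdown from batching, which is presumably where the audio-throughput cost comes from.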

CustomVoice / VoiceDesign have much shorter stage-1 lifetimes (~50–200 ms) and are TTFA-optimal at `max_num_seqs: 1`. Override the default when serving those task types:

```bash
vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-Base --omni \
@hsliuustc0106 (Collaborator), May 19, 2026

max_num_seqs: 10 Tuned for Base voice clone; CustomVoice / VoiceDesign are TTFA-optimal at 1.

should the model be custom_voice?

    --stage-overrides '{"1": {"max_num_seqs": 1}}'
```
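
When the override is generated by a launcher script rather than typed by hand, building the JSON programmatically avoids shell-quoting mistakes. The sketch below is one way to do that. The `--stage-overrides` flag, its JSON shape, and the per-task values are taken from the docs text above; the helper names and the task-type labels used as dictionary keys are hypothetical.

```python
import json
import shlex

# Stage-1 (Code2Wav) values quoted in the guide; the keys are illustrative labels only.
STAGE1_MAX_NUM_SEQS = {"Base": 10, "CustomVoice": 1, "VoiceDesign": 1}


def stage_overrides_arg(task_type: str, stage_id: int = 1) -> str:
    """Return the JSON value for --stage-overrides for the given task type."""
    overrides = {str(stage_id): {"max_num_seqs": STAGE1_MAX_NUM_SEQS[task_type]}}
    return json.dumps(overrides)


def serve_command(model: str, task_type: str) -> str:
    """Assemble a shell-safe `vllm serve` command line (hypothetical helper)."""
    args = [
        "vllm", "serve", model, "--omni",
        "--stage-overrides", stage_overrides_arg(task_type),
    ]
    # shlex.join quotes the JSON argument so the shell passes it through intact.
    return shlex.join(args)


if __name__ == "__main__":
    # The model ID is whatever checkpoint is being served; this one is copied from the example above.
    print(serve_command("Qwen/Qwen3-TTS-12Hz-1.7B-Base", "CustomVoice"))
```

Keeping the per-task values in a single mapping makes it harder to launch a CustomVoice or VoiceDesign deployment with the Base-tuned default by accident.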

### Sending requests
```bash
# CustomVoice with a predefined speaker
1 change: 1 addition & 0 deletions vllm_omni/deploy/qwen3_tts.yaml
@@ -52,6 +52,7 @@ stages:
top_p: 1.0

- stage_id: 1
  # Tuned for Base voice clone; CustomVoice / VoiceDesign are TTFA-optimal at 1.
  max_num_seqs: 10
  gpu_memory_utilization: 0.3
  enforce_eager: true