[Feat][Qwen3-TTS] Better Qwen3-TTS online serving demo by linyueqian · Pull Request #1857 · vllm-project/vllm-omni

linyueqian · 2026-03-12T17:05:11Z

Summary

Add WebSocket streaming transport option to the FastRTC Gradio demo, using the /v1/audio/speech/stream endpoint added in #1719 ([Feat][Qwen3-TTS] Support streaming audio output for websocket)
Add live streaming stats display (TTFP, chunks, audio duration, RTF, throughput) with polling timer
Add build_ws_config() and stream_pcm_chunks_ws() helpers to tts_common.py for reuse across demos
Fix fetch_voices() returning empty list when server returns {"voices": []}
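
The streaming stats named in the summary can be derived from chunk arrival times and byte counts. The following is a minimal sketch, not the demo's code: it assumes 16-bit mono PCM at 24 kHz, reads TTFP as time to first audio packet, and uses hypothetical names (StreamStats, on_chunk).

```python
from dataclasses import dataclass
from typing import Optional

SAMPLE_RATE = 24_000   # assumed output sample rate in Hz
BYTES_PER_SAMPLE = 2   # assumed 16-bit mono PCM


@dataclass
class StreamStats:
    """Hypothetical accumulator for per-request streaming stats."""
    start: float                          # time the request was sent (seconds)
    first_chunk: Optional[float] = None   # arrival time of the first chunk
    last_chunk: Optional[float] = None    # arrival time of the latest chunk
    chunks: int = 0
    pcm_bytes: int = 0

    def on_chunk(self, now: float, n_bytes: int) -> None:
        """Record one received PCM chunk."""
        if self.first_chunk is None:
            self.first_chunk = now
        self.last_chunk = now
        self.chunks += 1
        self.pcm_bytes += n_bytes

    @property
    def ttfp(self) -> Optional[float]:
        """Seconds from request start to the first chunk."""
        return None if self.first_chunk is None else self.first_chunk - self.start

    @property
    def audio_seconds(self) -> float:
        """Duration of the audio received so far."""
        return self.pcm_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE)

    @property
    def rtf(self) -> Optional[float]:
        """Real-time factor: generation wall time / audio duration.

        Measured up to the last received chunk, so waiting for playback
        to finish does not pull the value toward 1.0.
        """
        if self.last_chunk is None or self.pcm_bytes == 0:
            return None
        return (self.last_chunk - self.start) / self.audio_seconds

    @property
    def speed(self) -> Optional[float]:
        """Throughput as a multiple of real time (inverse of RTF)."""
        rtf = self.rtf
        return None if not rtf else 1.0 / rtf
```

A caller would pass a monotonic timestamp (for example from time.monotonic()) and the chunk length to on_chunk() for every chunk, and a polling timer could read the properties to refresh the display.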

Test plan

Start vLLM server with Qwen3-TTS model
Run python gradio_fastrtc_demo.py --api-base http://localhost:8000
Test HTTP transport: enter text, click Generate, verify audio plays and stats update
Test WebSocket transport: switch to WebSocket, generate, verify audio plays
Test all three task types (CustomVoice, VoiceDesign, Base)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29be07e26b


chatgpt-codex-connector · 2026-03-12T17:09:32Z

+                usable = len(message) - (len(message) % 2)
+                if usable > 0:
+                    yield np.frombuffer(message[:usable], dtype=np.int16).copy()


Preserve PCM byte alignment across WebSocket frames

This drops trailing odd bytes on each WebSocket frame instead of carrying them into the next frame, so PCM samples are lost whenever the server splits audio on odd byte boundaries. The streaming endpoint explicitly allows chunk boundaries that are not sample-aligned (the WebSocket tests include 3-byte and 1-byte frames), so this path can deterministically corrupt/shorten audio in WebSocket transport; the function should buffer leftover bytes between frames like stream_pcm_chunks() does for HTTP chunks.
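
The buffering the review asks for can be sketched as follows. This is a minimal illustration, not the code that landed in the PR: align_pcm_frames is a hypothetical helper, and it assumes 16-bit PCM, so a sample spans two bytes.

```python
from typing import Iterable, Iterator


def align_pcm_frames(frames: Iterable[bytes], sample_width: int = 2) -> Iterator[bytes]:
    """Yield sample-aligned byte blocks from arbitrarily split frames.

    A trailing partial sample is carried into the next frame instead of
    being dropped, so no bytes are lost at odd frame boundaries.
    """
    leftover = b""
    for frame in frames:
        data = leftover + frame
        usable = len(data) - (len(data) % sample_width)
        leftover = data[usable:]          # 0..sample_width-1 bytes kept for later
        if usable:
            yield data[:usable]
    # Any bytes still in `leftover` here are an incomplete final sample
    # and cannot be decoded, so they are discarded.
```

Each yielded block has an even length and can be passed to np.frombuffer(block, dtype=np.int16).copy() as in the diff above. With frames of 3, 1, and 4 bytes, the per-frame truncation in the diff would decode only 3 of the 4 samples, while this version decodes all 4.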


Signed-off-by: linyueqian <linyueqian@outlook.com>

…TS playback

Gradio's built-in gr.Audio(streaming=True) plays each yielded chunk as a separate audio blob, causing audible gaps between chunks. This replaces it with a custom Web Audio API AudioWorklet player (inspired by KoljaB/RealtimeVoiceChat) that maintains a FIFO buffer queue and plays samples at the audio clock rate, eliminating inter-chunk gaps entirely.

Key changes:
- AudioWorklet-based player with FIFO queue for gap-free streaming
- Same-origin FastAPI proxy endpoint (/proxy/v1/audio/speech) to avoid CORS
- Browser-side fetch() with ReadableStream feeds PCM directly to worklet
- Live streaming stats dashboard: TTFP, RTF, speed, audio duration
- RTF bar with color-coded speed indicator (green/amber/red)
- RTF frozen at stream-end time (fixes 1.0x bug from playback wait)
- vLLM blue theme (primary #4A90D9) across all Gradio components
- Non-streaming mode falls back to standard gr.Audio component
- HTTP streaming default (full text synthesis, no sentence splitting gaps)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
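
The FIFO idea behind the gap-free player can be modeled in a few lines. The actual player is JavaScript running in an AudioWorklet; the Python below is only an illustrative model under that caveat, with hypothetical names (SampleFifo, push, pull). Network chunks of any size go in, and the audio clock pulls fixed-size blocks out, padding with silence only when the queue runs dry.

```python
from collections import deque
from typing import Deque, List


class SampleFifo:
    """Illustrative model of a FIFO feeding fixed-size render blocks."""

    def __init__(self) -> None:
        self._chunks: Deque[List[int]] = deque()
        self._offset = 0  # read position inside the head chunk

    def push(self, samples: List[int]) -> None:
        """Enqueue one received chunk of samples (any length)."""
        if samples:
            self._chunks.append(samples)

    def pull(self, n: int) -> List[int]:
        """Return exactly n samples, crossing chunk boundaries as needed."""
        out: List[int] = []
        while len(out) < n and self._chunks:
            head = self._chunks[0]
            take = min(n - len(out), len(head) - self._offset)
            out.extend(head[self._offset:self._offset + take])
            self._offset += take
            if self._offset == len(head):   # head chunk fully consumed
                self._chunks.popleft()
                self._offset = 0
        out.extend([0] * (n - len(out)))    # silence on underrun
        return out
```

Because pull() reads across chunk boundaries, the boundaries between received chunks never reach the output, which is what separates this approach from playing each chunk as its own audio blob.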

- Delete gradio_fastrtc_demo.py (superseded by AudioWorklet player in gradio_demo.py which provides gapless streaming without WebRTC overhead)
- Remove build_ws_config() and stream_pcm_chunks_ws() from tts_common.py
- Drop --enforce-eager from run_server.sh and run_gradio_demo.sh so the stage config controls CUDA graph usage (Stage 0 gets graphs, Stage 1 eager)
- Fix --ip → --host in run_gradio_demo.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>

- Add per-task-type examples (CustomVoice/VoiceDesign/Base) that toggle with task selection and pre-fill relevant fields including ref audio URL
- Add Reset button that stops playback and clears inputs
- Remove player Ready state empty margin
- Use Gradio's native theme for consistent styling across all components
- Header with vLLM-Omni logo and "Served by" branding
- Lowercase "speaker" label

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>

Sy0307 · 2026-03-18T03:58:31Z

Looks good! A few suggestions:

1. The "--share" parameter is no longer used with uvicorn, so we can consider removing it.
2. The current playback strategy works fine under stable network conditions. However, if the demo is used as a service endpoint, it will need something like a jitter buffer to keep audio output stable. We can open a new PR to discuss and standardize this.
3. Custom voice upload hasn't been implemented yet.
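
For the jitter buffer raised in point 2, one common shape is to hold back playback until a prebuffer threshold is reached and to rebuffer after an underrun. The sketch below is only an illustration of that idea for the proposed follow-up discussion; it is not part of this PR, and JitterBuffer, prebuffer, and finish are hypothetical names.

```python
from collections import deque
from typing import Deque, List


class JitterBuffer:
    """Illustrative prebuffering queue: plays only once enough audio is queued."""

    def __init__(self, prebuffer: int) -> None:
        self.prebuffer = prebuffer           # samples required before playback starts
        self._samples: Deque[int] = deque()
        self._playing = False
        self._ended = False

    def push(self, samples: List[int]) -> None:
        """Queue received samples; start playback once the threshold is met."""
        self._samples.extend(samples)
        if len(self._samples) >= self.prebuffer:
            self._playing = True

    def finish(self) -> None:
        """Mark end of stream so the remaining tail is played without waiting."""
        self._ended = True

    def pull(self, n: int) -> List[int]:
        """Return n samples for the audio clock, or silence while buffering."""
        if not (self._playing or self._ended):
            return [0] * n
        out = [self._samples.popleft() for _ in range(min(n, len(self._samples)))]
        if len(out) < n:
            self._playing = False            # underrun: rebuffer before resuming
            out.extend([0] * (n - len(out)))
        return out
```

The threshold trades latency for robustness: a larger prebuffer delays the first audible sample but absorbs longer network stalls.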

- Pre-download ref_audio URL in proxy before forwarding to vLLM, so TTFP only measures synthesis time, not audio download latency
- Store large payloads (uploaded ref audio) server-side with request ID to avoid Gradio textbox truncating 1MB+ base64 strings
- Base examples use ref_audio_url (not file upload) for simplicity
- Abort previous fetch and double-clear worklet buffer on new generation
- Fix .then() JS event chain (removes Blocks.svelte map error)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
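
The server-side payload storage described in this commit can be pictured as a small keyed store: the large payload stays on the server, and only a short request ID passes through the Gradio textbox. This is a minimal in-memory sketch under that reading; PayloadStore and its methods are hypothetical names, and the real demo may differ (for example in expiry or locking).

```python
import uuid
from typing import Dict, Optional


class PayloadStore:
    """Illustrative in-memory store mapping request IDs to large payloads."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def put(self, payload: dict) -> str:
        """Store a payload and return a short ID the browser can send back."""
        request_id = uuid.uuid4().hex
        self._items[request_id] = payload
        return request_id

    def pop(self, request_id: str) -> Optional[dict]:
        """Fetch and remove a payload; returns None for unknown or used IDs."""
        return self._items.pop(request_id, None)
```

Removing the entry on first read keeps uploaded reference audio from accumulating in memory across generations.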

Gaohan123

LGTM. Thanks

…1857) Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian requested a review from hsliuustc0106 as a code owner March 12, 2026 17:05

chatgpt-codex-connector Bot reviewed Mar 12, 2026


linyueqian changed the title ~~[Feat][Qwen3-TTS] Add WebSocket transport and streaming stats to FastRTC demo~~ [Feat][Qwen3-TTS] Better Qwen3-TTS online serving demo Mar 18, 2026

linyueqian added 4 commits March 17, 2026 22:44

Add WebSocket transport and live streaming stats to FastRTC TTS demo

e97d84f

Signed-off-by: linyueqian <linyueqian@outlook.com>

Fix WebSocket PCM streaming to preserve byte alignment across frames

30d32cd

Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian force-pushed the feat/tts-gradio-ws-transport branch from ab988ce to 214239d Compare March 18, 2026 02:45

linyueqian added 2 commits March 17, 2026 23:00

Add vLLM-Omni logo to Gradio demo right panel

9b91db4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian mentioned this pull request Mar 18, 2026

[RFC]: TTS Development Roadmap - March 2026 #1795

Open

Gaohan123 added this to the v0.18.0 milestone Mar 21, 2026

Gaohan123 added the ready (label to trigger buildkite CI) label Mar 24, 2026

Merge branch 'main' into feat/tts-gradio-ws-transport

da5d507

Gaohan123 approved these changes Mar 24, 2026


Gaohan123 merged commit 8b102bd into vllm-project:main Mar 24, 2026
6 checks passed

linyueqian mentioned this pull request Mar 24, 2026

[Bugfix] Fix high TTFP for Base task in Gradio TTS demo #2116

Merged

zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026

[Feat][Qwen3-TTS] Better Qwen3-TTS online serving demo (vllm-project#…

4841b62

…1857) Signed-off-by: linyueqian <linyueqian@outlook.com>

zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026

[Feat][Qwen3-TTS] Better Qwen3-TTS online serving demo (vllm-project#…

6811b8b

…1857) Signed-off-by: linyueqian <linyueqian@outlook.com>

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Feat][Qwen3-TTS] Better Qwen3-TTS online serving demo (vllm-project#…

0cf4fad

…1857) Signed-off-by: linyueqian <linyueqian@outlook.com>


Gaohan123 merged 8 commits into vllm-project:main from linyueqian:feat/tts-gradio-ws-transport
