[Feat][Qwen3-TTS] Better Qwen3-TTS online serving demo#1857

Merged
Gaohan123 merged 8 commits into vllm-project:main from linyueqian:feat/tts-gradio-ws-transport on Mar 24, 2026
Conversation

@linyueqian
Collaborator

Summary

  • Add WebSocket streaming transport option to the FastRTC Gradio demo, using the /v1/audio/speech/stream endpoint from #1719 ([Feat][Qwen3-TTS] Support streaming audio output for websocket)
  • Add live streaming stats display (TTFP, chunks, audio duration, RTF, throughput) with polling timer
  • Add build_ws_config() and stream_pcm_chunks_ws() helpers to tts_common.py for reuse across demos
  • Fix fetch_voices() returning empty list when server returns {"voices": []}
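
The streaming stats listed above can be derived from chunk arrival times and byte counts alone. The following is a minimal sketch, not the demo's actual code: the `StreamStats` class, its field names, the 24 kHz sample rate, and the 16-bit mono PCM format are all assumptions made for illustration. RTF is taken here as generation wall time divided by audio duration, and "speed" as its inverse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamStats:
    """Hypothetical accumulator for per-request streaming statistics."""

    start: float = 0.0             # timestamp when the request was sent
    sample_rate: int = 24000       # assumed; use the server's real output rate
    bytes_per_sample: int = 2      # assumed 16-bit mono PCM
    first_chunk_at: Optional[float] = None
    last_chunk_at: Optional[float] = None
    chunks: int = 0
    pcm_bytes: int = 0

    def on_chunk(self, n_bytes: int, now: float) -> None:
        """Record one received PCM chunk of n_bytes at timestamp now."""
        if self.first_chunk_at is None:
            self.first_chunk_at = now
        self.last_chunk_at = now
        self.chunks += 1
        self.pcm_bytes += n_bytes

    @property
    def ttfp(self) -> Optional[float]:
        """Time to first packet: request start until the first chunk arrives."""
        if self.first_chunk_at is None:
            return None
        return self.first_chunk_at - self.start

    @property
    def audio_seconds(self) -> float:
        """Duration of the audio received so far."""
        return self.pcm_bytes / (self.sample_rate * self.bytes_per_sample)

    @property
    def rtf(self) -> Optional[float]:
        """Real-time factor: generation time / audio duration (below 1 is faster than real time)."""
        if self.last_chunk_at is None or self.audio_seconds == 0:
            return None
        return (self.last_chunk_at - self.start) / self.audio_seconds

    @property
    def speed(self) -> Optional[float]:
        """Seconds of audio produced per second of wall time (inverse of RTF)."""
        rtf = self.rtf
        return None if not rtf else 1.0 / rtf
```

In use, a caller would pass `time.monotonic()` as `now` for each chunk and read the properties from a polling timer. Because `rtf` is computed from the last chunk's arrival time rather than from the current time, the value stops changing once the stream ends.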

Test plan

  • Start vLLM server with Qwen3-TTS model
  • Run python gradio_fastrtc_demo.py --api-base http://localhost:8000
  • Test HTTP transport: enter text, click Generate, verify audio plays and stats update
  • Test WebSocket transport: switch to WebSocket, generate, verify audio plays
  • Test all three task types (CustomVoice, VoiceDesign, Base)


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29be07e26b


Comment on lines +274 to +276
usable = len(message) - (len(message) % 2)
if usable > 0:
    yield np.frombuffer(message[:usable], dtype=np.int16).copy()

[P1] Preserve PCM byte alignment across WebSocket frames

This drops trailing odd bytes on each WebSocket frame instead of carrying them into the next frame, so PCM samples are lost whenever the server splits audio on odd byte boundaries. The streaming endpoint explicitly allows chunk boundaries that are not sample-aligned (the WebSocket tests include 3-byte and 1-byte frames), so this path can deterministically corrupt/shorten audio in WebSocket transport; the function should buffer leftover bytes between frames like stream_pcm_chunks() does for HTTP chunks.
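
One way to carry the leftover byte between frames is sketched below. This is an illustration under assumptions, not the PR's `stream_pcm_chunks_ws()` or the existing `stream_pcm_chunks()`: the helper name `iter_aligned_pcm` is hypothetical, it assumes 16-bit PCM, and it yields aligned `bytes` rather than NumPy arrays so that it depends only on the standard library.

```python
from typing import Iterable, Iterator

SAMPLE_WIDTH = 2  # bytes per int16 PCM sample


def iter_aligned_pcm(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield sample-aligned PCM chunks, carrying odd trailing bytes forward.

    `frames` is any iterable of binary WebSocket messages. Non-binary
    messages (for example JSON status frames) are skipped, which is an
    assumption about the protocol.
    """
    leftover = b""
    for message in frames:
        if not isinstance(message, (bytes, bytearray)):
            continue
        # Prepend whatever was left over from the previous frame.
        data = leftover + bytes(message)
        usable = len(data) - (len(data) % SAMPLE_WIDTH)
        if usable > 0:
            yield data[:usable]
        # Keep the unaligned tail (0 or 1 byte) for the next frame.
        leftover = data[usable:]
    # A single dangling byte at end of stream cannot form a sample; drop it.
```

Each yielded chunk has an even length, so a caller could convert it with `np.frombuffer(chunk, dtype=np.int16).copy()` as the reviewed code does. With frames of 3, 1, 3, and 2 bytes, no sample is lost except a final unpaired byte.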

@linyueqian linyueqian changed the title from "[Feat][Qwen3-TTS] Add WebSocket transport and streaming stats to FastRTC demo" to "[Feat][Qwen3-TTS] Better Qwen3-TTS online serving demo" on Mar 18, 2026
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
…TS playback

Gradio's built-in gr.Audio(streaming=True) plays each yielded chunk as a
separate audio blob, causing audible gaps between chunks. This replaces it
with a custom Web Audio API AudioWorklet player (inspired by
KoljaB/RealtimeVoiceChat) that maintains a FIFO buffer queue and plays
samples at the audio clock rate — eliminating inter-chunk gaps entirely.

Key changes:
- AudioWorklet-based player with FIFO queue for gap-free streaming
- Same-origin FastAPI proxy endpoint (/proxy/v1/audio/speech) to avoid CORS
- Browser-side fetch() with ReadableStream feeds PCM directly to worklet
- Live streaming stats dashboard: TTFP, RTF, speed, audio duration
- RTF bar with color-coded speed indicator (green/amber/red)
- RTF frozen at stream-end time (fixes 1.0x bug from playback wait)
- vLLM blue theme (primary #4A90D9) across all Gradio components
- Non-streaming mode falls back to standard gr.Audio component
- HTTP streaming default (full text synthesis, no sentence splitting gaps)
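
The FIFO queue behind the AudioWorklet player can be modeled in a few lines. The sketch below is written in Python to show the logic only; the real player runs as JavaScript inside an AudioWorklet, and the class name `PcmFifo` and its methods are hypothetical. It assumes the audio callback asks for a fixed number of frames per call (128 is the Web Audio render quantum) and that an underrun is filled with silence.

```python
from collections import deque
from typing import Deque, List, Sequence

QUANTUM = 128  # frames requested per audio callback (Web Audio render quantum)


class PcmFifo:
    """Hypothetical model of a gap-free playback queue."""

    def __init__(self) -> None:
        self._chunks: Deque[List[float]] = deque()
        self._offset = 0  # read position inside the head chunk

    def push(self, samples: Sequence[float]) -> None:
        """Enqueue one decoded chunk, whatever its length."""
        if len(samples) > 0:
            self._chunks.append(list(samples))

    def clear(self) -> None:
        """Drop all queued audio, e.g. when a new generation starts."""
        self._chunks.clear()
        self._offset = 0

    def pull(self, n: int = QUANTUM) -> List[float]:
        """Return exactly n samples, spanning chunk boundaries as needed."""
        out: List[float] = []
        while len(out) < n and self._chunks:
            head = self._chunks[0]
            take = min(n - len(out), len(head) - self._offset)
            out.extend(head[self._offset:self._offset + take])
            self._offset += take
            if self._offset == len(head):
                self._chunks.popleft()
                self._offset = 0
        # Underrun: pad with silence instead of stalling the audio clock.
        out.extend([0.0] * (n - len(out)))
        return out
```

Because `pull()` reads across chunk boundaries, chunk sizes chosen by the server never show up as gaps in the output. Silence appears only when the queue actually runs dry.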

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
- Delete gradio_fastrtc_demo.py (superseded by AudioWorklet player in
  gradio_demo.py which provides gapless streaming without WebRTC overhead)
- Remove build_ws_config() and stream_pcm_chunks_ws() from tts_common.py
- Drop --enforce-eager from run_server.sh and run_gradio_demo.sh so the
  stage config controls CUDA graph usage (Stage 0 gets graphs, Stage 1 eager)
- Fix --ip → --host in run_gradio_demo.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
@linyueqian linyueqian force-pushed the feat/tts-gradio-ws-transport branch from ab988ce to 214239d on March 18, 2026 02:45
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
- Add per-task-type examples (CustomVoice/VoiceDesign/Base) that toggle
  with task selection and pre-fill relevant fields including ref audio URL
- Add Reset button that stops playback and clears inputs
- Remove player Ready state empty margin
- Use Gradio's native theme for consistent styling across all components
- Header with vLLM-Omni logo and "Served by" branding
- Lowercase "speaker" label

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
@Sy0307
Contributor

Sy0307 commented Mar 18, 2026

Looks good! A few suggestions:

1. The "--share" parameter is no longer used with uvicorn, so we can consider removing it.
2. The current playback strategy works fine under stable network conditions. However, if the demo is used as a service endpoint, it will need something like a jitter buffer to keep audio output stable. We can open a new PR to discuss and standardize this.
3. Custom voice upload hasn't been implemented yet.

- Pre-download ref_audio URL in proxy before forwarding to vLLM, so
  TTFP only measures synthesis time, not audio download latency
- Store large payloads (uploaded ref audio) server-side with request ID
  to avoid Gradio textbox truncating 1MB+ base64 strings
- Base examples use ref_audio_url (not file upload) for simplicity
- Abort previous fetch and double-clear worklet buffer on new generation
- Fix .then() JS event chain (removes Blocks.svelte map error)
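
The server-side payload storage described above can be pictured as a small in-memory map keyed by request ID. This is a sketch under assumptions rather than the code in gradio_demo.py: the `PayloadStore` class, its method names, and the eviction limit are hypothetical.

```python
import uuid
from typing import Dict, Optional


class PayloadStore:
    """Hypothetical in-memory store for request payloads too large for a textbox."""

    def __init__(self, max_items: int = 32) -> None:
        self._items: Dict[str, dict] = {}
        self._max_items = max_items

    def put(self, payload: dict) -> str:
        """Store a payload and return the short ID the browser sends back."""
        if len(self._items) >= self._max_items:
            # Evict the oldest entry; dicts preserve insertion order.
            self._items.pop(next(iter(self._items)))
        request_id = uuid.uuid4().hex
        self._items[request_id] = payload
        return request_id

    def pop(self, request_id: str) -> Optional[dict]:
        """Fetch and remove a payload; returns None if it is unknown or already used."""
        return self._items.pop(request_id, None)
```

With this shape, only the 32-character ID passes through the Gradio component, and the proxy looks up the full payload (including any base64 reference audio) before forwarding the request.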

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 21, 2026
@Gaohan123 Gaohan123 added the "ready" label (label to trigger buildkite CI) Mar 24, 2026
Collaborator

@Gaohan123 Gaohan123 left a comment

LGTM. Thanks

@Gaohan123 Gaohan123 merged commit 8b102bd into vllm-project:main Mar 24, 2026
6 checks passed
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026
…1857)

Signed-off-by: linyueqian <linyueqian@outlook.com>
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Labels

ready label to trigger buildkite CI

3 participants