[Bugfix] Fix high TTFP for Base task in Gradio TTS demo#2116

Merged
hsliuustc0106 merged 2 commits into
vllm-project:mainfrom
linyueqian:fix/gradio-ttfp-base-task
Mar 24, 2026

@linyueqian (Collaborator) commented Mar 24, 2026

Summary

Fix high TTFP (time-to-first-playback) for Base voice clone task in the Gradio TTS demo.

The proxy was pre-downloading ref_audio URLs and re-encoding as base64 before forwarding to the vLLM server, adding ~2-3s overhead. The vLLM server already handles URL resolution via _resolve_ref_audio, so the proxy can pass URLs directly.
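The difference between the two behaviors can be sketched roughly as below. This is an illustration only, not the demo's actual code: the function names, the `fetch` callable, and the payload field names (`input`, `task_type`, `ref_audio`) are assumptions made for the example.

```python
import base64
from typing import Callable


def inline_as_data_uri(url: str, fetch: Callable[[str], bytes],
                       mime: str = "audio/wav") -> str:
    """Old behavior (sketch): download the audio in the proxy and re-encode it.

    `fetch` is a hypothetical callable that returns the raw bytes at `url`.
    The download plus base64 encoding is the extra work that delayed playback.
    """
    audio = fetch(url)
    encoded = base64.b64encode(audio).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(text: str, ref_audio: str) -> dict:
    """New behavior (sketch): forward ref_audio unchanged.

    The server resolves URLs itself, so the proxy does no fetching.
    Field names here are assumed for illustration.
    """
    return {"input": text, "task_type": "Base", "ref_audio": ref_audio}
```

With the pass-through version, the proxy's work before forwarding is limited to building a small JSON payload, whatever the size of the reference audio.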

Also moved the TTFP timer to start right before the fetch call, excluding UI setup time.
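The timer placement can be illustrated with a small sketch. It is written in Python for consistency with the other examples here and does not reproduce the demo's own code; `build_payload`, `send`, and `clock` are hypothetical callables introduced for the example.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def measure_ttfp(build_payload: Callable[[], Any],
                 send: Callable[[Any], Iterable[bytes]],
                 clock: Callable[[], float] = time.perf_counter
                 ) -> Tuple[bytes, float]:
    """Return the first audio chunk and the time to reach it, in milliseconds.

    Setup work (payload building) happens before the timer starts, so it is
    excluded from the measurement.
    """
    payload = build_payload()        # setup: not timed
    start = clock()                  # timer starts right before the request
    chunks = iter(send(payload))     # `send` is assumed to stream audio chunks
    first_chunk = next(chunks)       # block until the first chunk arrives
    ttfp_ms = (clock() - start) * 1000.0
    return first_chunk, ttfp_ms
```

Starting the clock after setup means the reported number reflects request and server latency only, which makes it comparable with the raw API measurement.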

Test plan

  • Start server with Base model: vllm serve Qwen/Qwen3-TTS-12Hz-0.6B-Base --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml --omni --trust-remote-code
  • Open Gradio demo, select Base task, provide ref_audio URL
  • Before: TTFP ~3314ms
  • After: TTFP ~702ms
  • Raw API TTFP (no proxy): ~200-350ms

The Gradio proxy was pre-downloading ref_audio URLs and re-encoding
them as base64 before forwarding to the vLLM server, adding ~2-3s to
TTFP. The vLLM server already handles URL resolution via
_resolve_ref_audio, so the proxy can pass URLs directly.

Also move the TTFP timer start to right before the fetch call so it
excludes Gradio UI setup and payload building time.

Before: TTFP ~3314ms (Base task with URL ref_audio)
After:  TTFP ~702ms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>

@linyueqian (Collaborator, Author) commented

@Gaohan123 Since #1857 is merged, I have patched a fix for the Base task in the Gradio demo.

@linyueqian linyueqian added the ready label to trigger buildkite CI label Mar 24, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9da6ca3675

Comment on lines +366 to +367
# Pass ref_audio URL directly to vLLM server (it handles URL resolution).
# Pre-downloading and re-encoding adds ~2-3s to TTFP for large files.

[P2] Keep proxy-side fallback for unreachable reference URLs

Passing ref_audio URLs straight through in proxy_speech removes the previous behavior where the proxy could fetch and inline audio as a data URI, which means Base cloning now fails whenever the Gradio host can access the URL but the upstream --api-base server cannot (e.g., restricted egress or different network zone). This is a functional regression from the prior implementation for split-network deployments, so the proxy should retain a fallback path instead of always delegating URL fetches upstream.
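One way such a fallback could look is sketched below. It is not part of the merged change, and every name in it is hypothetical: `UpstreamFetchError` stands in for whatever error the upstream server reports when it cannot reach the URL, `send` forwards a payload upstream, and `fetch` downloads bytes from the proxy host.

```python
import base64
from typing import Any, Callable, Dict


class UpstreamFetchError(Exception):
    """Hypothetical error: the upstream server could not fetch ref_audio."""


def speech_with_fallback(payload: Dict[str, Any],
                         send: Callable[[Dict[str, Any]], Any],
                         fetch: Callable[[str], bytes]) -> Any:
    """Try the fast path first; inline the audio only if upstream fails."""
    try:
        # Fast path: pass the URL through and let the server resolve it.
        return send(payload)
    except UpstreamFetchError:
        # Slow path: the proxy host can reach the URL, so fetch it here
        # and resend the request with the audio inlined as a data URI.
        audio = fetch(payload["ref_audio"])
        encoded = base64.b64encode(audio).decode("ascii")
        data_uri = "data:audio/wav;base64," + encoded
        return send({**payload, "ref_audio": data_uri})
```

This keeps the low TTFP in the common case and pays the download cost only in split-network deployments, at the price of one failed upstream round trip before the retry.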

@hsliuustc0106 hsliuustc0106 merged commit 7f11204 into vllm-project:main Mar 24, 2026
3 of 6 checks passed
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026
…#2116)

Signed-off-by: linyueqian <linyueqian@outlook.com>
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026