[Bugfix] Fix high TTFP for Base task in Gradio TTS demo #2116
The Gradio proxy was pre-downloading ref_audio URLs and re-encoding them as base64 before forwarding to the vLLM server, adding ~2-3s to TTFP. The vLLM server already handles URL resolution via _resolve_ref_audio, so the proxy can pass URLs directly.

Also move the TTFP timer start to right before the fetch call so it excludes Gradio UI setup and payload building time.

Before: TTFP ~3314ms (Base task with URL ref_audio)
After: TTFP ~702ms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
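To illustrate the pass-through described above, here is a minimal sketch of how a proxy might build the upstream request body without touching the reference audio. The function name `build_speech_payload` and the `input` field are hypothetical and not taken from this PR; only `ref_audio` appears in the description.

```python
from typing import Any, Optional


def build_speech_payload(text: str, ref_audio: Optional[str] = None) -> dict[str, Any]:
    """Build the JSON body forwarded to the upstream server (hypothetical shape)."""
    # "input" is an assumed field name for the text to synthesize.
    payload: dict[str, Any] = {"input": text}
    if ref_audio:
        # Forward the URL (or data URI) unchanged. No download or base64
        # re-encoding happens in the proxy, so no extra latency is added
        # before the request is sent.
        payload["ref_audio"] = ref_audio
    return payload


if __name__ == "__main__":
    print(build_speech_payload("Hello", "https://example.com/ref.wav"))
```

The point of the sketch is only that the proxy does no I/O on the reference audio before forwarding; resolving the URL is left to the server.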
@Gaohan123 Since #1857 is merged, I have patched a fix for the Base task in the Gradio demo.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9da6ca3675
# Pass ref_audio URL directly to vLLM server (it handles URL resolution).
# Pre-downloading and re-encoding adds ~2-3s to TTFP for large files.
Keep proxy-side fallback for unreachable reference URLs
Passing ref_audio URLs straight through in proxy_speech removes the previous behavior where the proxy could fetch and inline audio as a data URI, which means Base cloning now fails whenever the Gradio host can access the URL but the upstream --api-base server cannot (e.g., restricted egress or different network zone). This is a functional regression from the prior implementation for split-network deployments, so the proxy should retain a fallback path instead of always delegating URL fetches upstream.
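One way the suggested fallback could look is sketched below: the proxy forwards the URL first and only fetches and inlines the audio as a data URI if the upstream call fails. This is an assumption-laden sketch, not the PR's code. `UpstreamError`, `send_with_fallback`, `inline_as_data_uri`, and the injected `send` and `fetch` callables are all hypothetical, and the default `audio/wav` MIME type is a placeholder.

```python
import base64
from typing import Callable


class UpstreamError(Exception):
    """Hypothetical error raised when the upstream server rejects a request."""


def inline_as_data_uri(audio: bytes, mime: str = "audio/wav") -> str:
    """Encode raw audio bytes as a base64 data URI."""
    encoded = base64.b64encode(audio).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def send_with_fallback(
    payload: dict,
    send: Callable[[dict], bytes],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Try the fast pass-through path first; inline the audio only on failure."""
    try:
        return send(payload)
    except UpstreamError:
        ref = payload.get("ref_audio", "")
        if not ref.startswith(("http://", "https://")):
            raise  # nothing to inline, so surface the original error
        # Slow path: the proxy host downloads the audio and retries once.
        retry = dict(payload, ref_audio=inline_as_data_uri(fetch(ref)))
        return send(retry)
```

With this shape the common case keeps the lower TTFP, and the download cost is paid only when the upstream server cannot reach the URL. A retry like this is only safe before any response bytes have been streamed to the client.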
…#2116) Signed-off-by: linyueqian <linyueqian@outlook.com>
Summary
Fix high TTFP (time-to-first-playback) for the Base voice-clone task in the Gradio TTS demo.
The proxy was pre-downloading ref_audio URLs and re-encoding them as base64 before forwarding to the vLLM server, adding ~2-3s of overhead. The vLLM server already handles URL resolution via _resolve_ref_audio, so the proxy can pass URLs directly.

Also moved the TTFP timer to start right before the fetch call, excluding UI setup time.
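The timer change can be illustrated with a small sketch: start the clock immediately before the request is issued and stop it when the first non-empty audio chunk arrives, so setup and payload building are not counted. The demo's actual timer wraps a fetch call in the UI. This Python version is only an analogy, and `measure_ttfp_ms`, `open_stream`, and the injectable `clock` are hypothetical names.

```python
import time
from typing import Callable, Iterable


def measure_ttfp_ms(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return milliseconds from request start to the first non-empty chunk."""
    # Any UI setup or payload building must happen before this call,
    # so that it is excluded from the measurement.
    start = clock()
    for chunk in open_stream():
        if chunk:  # ignore empty keep-alive chunks
            return (clock() - start) * 1000.0
    raise RuntimeError("stream ended before any audio arrived")


if __name__ == "__main__":
    # Stand-in stream; a real caller would issue the HTTP request here.
    print(measure_ttfp_ms(lambda: iter([b"", b"audio-bytes"])))
```

Injecting the clock keeps the helper easy to test with fixed timestamps.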
Test plan
vllm serve Qwen/Qwen3-TTS-12Hz-0.6B-Base --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml --omni --trust-remote-code