[Bugfix] Fix high TTFP for Base task in Gradio TTS demo by linyueqian · Pull Request #2116 · vllm-project/vllm-omni

linyueqian · 2026-03-24T05:48:35Z

Summary

Fix high TTFP (time-to-first-playback) for Base voice clone task in the Gradio TTS demo.

The proxy was pre-downloading ref_audio URLs and re-encoding as base64 before forwarding to the vLLM server, adding ~2-3s overhead. The vLLM server already handles URL resolution via _resolve_ref_audio, so the proxy can pass URLs directly.
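As a rough illustration of the change described above (not the actual demo code), the sketch below contrasts the two ways a proxy could fill in ref_audio. The function names and every payload field other than ref_audio are hypothetical, chosen only for this example; the audio/wav MIME type is likewise an assumption.

```python
import base64


def inline_as_data_uri(audio_bytes: bytes, mime: str = "audio/wav") -> str:
    """Old-style path (hypothetical helper): re-encode audio the proxy has
    already downloaded as a base64 data URI. The download plus the encoding
    is the work that was reported to add ~2-3s before the request was sent."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_speech_payload(text: str, ref_audio: str) -> dict:
    """New-style path (hypothetical helper): forward ref_audio unchanged and
    leave URL resolution to the server. Field names other than ref_audio
    are assumptions for illustration."""
    return {"input": text, "task_type": "Base", "ref_audio": ref_audio}


payload = build_speech_payload("Hello there.", "https://example.com/ref.wav")
print(payload["ref_audio"])  # the URL is passed through as-is
```

The pass-through version does no network or encoding work in the proxy, so the request can be sent immediately.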

Also moved the TTFP timer to start right before the fetch call, excluding UI setup time.
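The timer placement can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the demo's implementation: measure_ttfp and open_stream are hypothetical names, and the stream is any iterable of audio chunks. The point is only that the clock starts immediately before the request is opened, after any setup or payload building.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_ttfp(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, bytes]:
    """Return (milliseconds until the first non-empty chunk, that chunk).

    The clock is read right before open_stream() is called, so UI setup and
    payload building done by the caller beforehand are not counted."""
    start = clock()
    for chunk in open_stream():
        if chunk:  # skip empty keep-alive chunks
            return (clock() - start) * 1000.0, chunk
    raise RuntimeError("stream ended before any audio arrived")


# Usage with a stand-in stream; a real caller would open the HTTP response here.
elapsed_ms, first = measure_ttfp(lambda: iter([b"", b"pcm-bytes"]))
print(f"TTFP: {elapsed_ms:.1f} ms, first chunk {len(first)} bytes")
```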

Test plan

Start server with Base model: vllm serve Qwen/Qwen3-TTS-12Hz-0.6B-Base --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml --omni --trust-remote-code
Open Gradio demo, select Base task, provide ref_audio URL
Before: TTFP ~3314ms
After: TTFP ~702ms
Raw API TTFP (no proxy): ~200-350ms
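For reference, the arithmetic behind these numbers can be checked with a few lines of Python. The inputs are the approximate figures reported above; the variable names are only for this example.

```python
before_ms = 3314          # reported TTFP before the fix
after_ms = 702            # reported TTFP after the fix
raw_api_ms = (200, 350)   # reported raw API TTFP range, no proxy

saved_ms = before_ms - after_ms
reduction_pct = saved_ms / before_ms * 100

# Overhead the proxy path still adds compared with calling the API directly.
remaining_overhead_ms = (after_ms - raw_api_ms[1], after_ms - raw_api_ms[0])

print(f"saved: {saved_ms} ms ({reduction_pct:.1f}% lower)")
print(f"remaining proxy overhead: {remaining_overhead_ms[0]}-{remaining_overhead_ms[1]} ms")
```

The saving of roughly 2.6s is in line with the ~2-3s overhead attributed to pre-downloading and re-encoding.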

The Gradio proxy was pre-downloading ref_audio URLs and re-encoding them as base64 before forwarding to the vLLM server, adding ~2-3s to TTFP. The vLLM server already handles URL resolution via _resolve_ref_audio, so the proxy can pass URLs directly.

Also move the TTFP timer start to right before the fetch call so it excludes Gradio UI setup and payload building time.

Before: TTFP ~3314ms (Base task with URL ref_audio)
After: TTFP ~702ms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian · 2026-03-24T05:49:28Z

@Gaohan123 Since #1857 is merged, I have patched a fix for the Base task in the Gradio demo.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9da6ca3675


chatgpt-codex-connector · 2026-03-24T05:51:21Z

+        # Pass ref_audio URL directly to vLLM server (it handles URL resolution).
+        # Pre-downloading and re-encoding adds ~2-3s to TTFP for large files.


Keep proxy-side fallback for unreachable reference URLs

Passing ref_audio URLs straight through in proxy_speech removes the previous behavior where the proxy could fetch and inline audio as a data URI, which means Base cloning now fails whenever the Gradio host can access the URL but the upstream --api-base server cannot (e.g., restricted egress or different network zone). This is a functional regression from the prior implementation for split-network deployments, so the proxy should retain a fallback path instead of always delegating URL fetches upstream.
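One way the suggested fallback could look is sketched below. This is a hypothetical illustration, not code from the PR: speech_with_fallback, send, and download are made-up names, the audio/wav MIME type is assumed, and a real proxy would catch only the specific upstream error that signals an unreachable URL rather than any exception.

```python
import base64
from typing import Callable


def speech_with_fallback(
    payload: dict,
    send: Callable[[dict], bytes],
    download: Callable[[str], bytes],
) -> bytes:
    """Try the fast path first (URL passed through to the server). Only if
    that fails and ref_audio is an http(s) URL, fetch the audio on the proxy
    host and retry with an inline base64 data URI."""
    ref = payload.get("ref_audio", "")
    try:
        return send(payload)
    except Exception:  # assumption: narrow this to the upstream fetch error
        if not ref.startswith(("http://", "https://")):
            raise
    audio = download(ref)
    data_uri = "data:audio/wav;base64," + base64.b64encode(audio).decode("ascii")
    return send({**payload, "ref_audio": data_uri})
```

With this shape, deployments where the server can reach the URL keep the low TTFP, and the slower download-and-inline path is taken only after the direct attempt has failed.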


…#2116) Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian requested a review from hsliuustc0106 as a code owner March 24, 2026 05:48

linyueqian added the "ready" label (label to trigger buildkite CI) Mar 24, 2026

chatgpt-codex-connector Bot reviewed Mar 24, 2026


Merge branch 'main' into fix/gradio-ttfp-base-task

b1b00dc

hsliuustc0106 merged commit 7f11204 into vllm-project:main Mar 24, 2026
3 of 6 checks passed

zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026

[Bugfix] Fix high TTFP for Base task in Gradio TTS demo (vllm-project…

0ff5964

…#2116) Signed-off-by: linyueqian <linyueqian@outlook.com>

zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026

[Bugfix] Fix high TTFP for Base task in Gradio TTS demo (vllm-project…

a21f152

…#2116) Signed-off-by: linyueqian <linyueqian@outlook.com>

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[Bugfix] Fix high TTFP for Base task in Gradio TTS demo (vllm-project…

aef7af1

…#2116) Signed-off-by: linyueqian <linyueqian@outlook.com>

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Bugfix] Fix high TTFP for Base task in Gradio TTS demo (vllm-project…

b5e0b78

…#2116) Signed-off-by: linyueqian <linyueqian@outlook.com>

hsliuustc0106 merged 2 commits into vllm-project:main from linyueqian:fix/gradio-ttfp-base-task
