[Feature] Streaming text input for Qwen3-TTS by thrashingstate · Pull Request #1883 · vllm-project/vllm-omni

thrashingstate · 2026-03-13T18:36:56Z

Note: Much of this code was generated using Claude Code — a thorough review would be much appreciated.

Purpose

Add true streaming text input support for Qwen3-TTS via an UPDATE_REQUEST mechanism. Resolves #1766.

Text token IDs arrive incrementally (e.g. from an LLM), get embedded on GPU, and are injected into a running TTS generation with zero voice discontinuity. The model pauses when the text queue runs low and resumes when more tokens arrive.
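As a rough illustration of the pause/resume behaviour described above, here is a minimal toy sketch. It is not the PR's code: the class, the status names, the `low_watermark` parameter, and the one-text-token-per-step rule are all assumptions made for the example.

```python
from collections import deque
from enum import Enum, auto


class Status(Enum):
    RUNNING = auto()
    WAITING_FOR_TEXT = auto()  # hypothetical name for the paused state
    FINISHED = auto()


class ToyStreamingRequest:
    """Toy model of one TTS request fed by incremental text token IDs."""

    def __init__(self, low_watermark: int = 1) -> None:
        self.text_queue: deque[int] = deque()
        self.low_watermark = low_watermark  # assumed threshold for "running low"
        self.input_closed = False
        self.status = Status.WAITING_FOR_TEXT
        self.consumed: list[int] = []

    def _can_step(self) -> bool:
        # Once input is closed, use up whatever is left; otherwise keep a reserve.
        if self.input_closed:
            return bool(self.text_queue)
        return len(self.text_queue) > self.low_watermark

    def push_text(self, token_ids: list[int], final: bool = False) -> None:
        """Producer side: append new token IDs and resume the request if paused."""
        self.text_queue.extend(token_ids)
        self.input_closed = self.input_closed or final
        if self.input_closed and not self.text_queue:
            self.status = Status.FINISHED
        elif self.status is Status.WAITING_FOR_TEXT and self._can_step():
            self.status = Status.RUNNING

    def step(self) -> None:
        """Consumer side: one decode step consumes one text token (a simplification)."""
        if self.status is not Status.RUNNING:
            return  # paused or finished: nothing to do
        self.consumed.append(self.text_queue.popleft())
        if self.input_closed and not self.text_queue:
            self.status = Status.FINISHED
        elif not self._can_step():
            self.status = Status.WAITING_FOR_TEXT
```

The request pauses as soon as the queue drops to the reserve level and resumes on the next `push_text` call, so the consumer never sees an empty queue while more text may still arrive.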

Key changes:

Scheduler (omni_ar_scheduler.py): Queue/flush/drain logic for pending additional_information updates on running requests, with early-arrival buffering for requests not yet registered and pause/resume on text starvation
Model runner (gpu_model_runner.py, gpu_ar_model_runner.py): Append-mode merge for streaming_text_token_ids
Entrypoints (async_omni.py, omni_stage.py, api_server.py, serving_speech.py): Route update requests through the async engine to the scheduler; new /v1/audio/speech/stream WebSocket endpoint
Model (qwen3_tts_talker.py): Consume streaming text embeddings during generation
Patch (patch.py): EngineCoreRequestType.UPDATE enum addition

Test Plan

16 unit tests in tests/entrypoints/test_streaming_tts.py covering scheduler update routing, early buffer, flush, drain, pause/resume, external ID resolution, model runner merge semantics, output types, task types, async routing, and patch enum
6 e2e tests in tests/e2e/online_serving/test_qwen3_tts_streaming.py covering: all-text-in-initial, chunked streaming, slow delivery with pause/resume, sequential requests, audio-not-error regression, and non-streaming fallback

# Unit tests
pytest tests/entrypoints/test_streaming_tts.py -v

# E2E tests (requires L4 GPU)
pytest tests/e2e/online_serving/test_qwen3_tts_streaming.py -v

Test Result

Unit tests and e2e tests passed locally.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25fc012eb2

chatgpt-codex-connector · 2026-03-13T18:47:54Z

+            except WebSocketDisconnect:
+                logger.info(f"[{_req_id}] Client disconnected")


Abort backend TTS request on WebSocket disconnect

When the WebSocket client disconnects, this branch only logs and exits, but it never aborts the in-flight request created by create_speech_streaming. In that scenario (e.g., mobile/network drop mid-audio), the engine can keep decoding audio with no consumer until it naturally stops, wasting GPU time and request capacity; add explicit request cancellation/abort in the disconnect path.
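One way the suggested fix could look is sketched below, using only the standard library. `ClientDisconnected`, `FakeEngine`, `FakeWebSocket`, and the `generate`/`abort` method names are stand-ins invented for this example; they are not the PR's or vLLM-Omni's actual API.

```python
import asyncio


class ClientDisconnected(Exception):
    """Stand-in for a WebSocket disconnect error, to keep the sketch stdlib-only."""


class FakeEngine:
    """Hypothetical engine; `generate` and `abort` are illustrative names."""

    def __init__(self) -> None:
        self.aborted: list[str] = []

    async def generate(self, request_id: str):
        for i in range(1000):
            await asyncio.sleep(0)  # yield control, as a real decode step would
            yield f"chunk-{i}".encode()

    async def abort(self, request_id: str) -> None:
        self.aborted.append(request_id)


class FakeWebSocket:
    """Fails on the third send, imitating a client that dropped mid-audio."""

    def __init__(self) -> None:
        self.sent: list[bytes] = []

    async def send_bytes(self, data: bytes) -> None:
        if len(self.sent) >= 2:
            raise ClientDisconnected
        self.sent.append(data)


async def stream_audio(ws: FakeWebSocket, engine: FakeEngine, request_id: str) -> bool:
    """Forward audio chunks; abort the backend request if streaming ends early."""
    completed = False
    try:
        async for chunk in engine.generate(request_id):
            await ws.send_bytes(chunk)
        completed = True
    except ClientDisconnected:
        pass  # a real handler would log the disconnect here
    finally:
        if not completed:
            # Free GPU time and request capacity instead of decoding for no one.
            await engine.abort(request_id)
    return completed
```

Putting the abort in `finally` rather than only in the `except` branch also covers other early exits, such as task cancellation or an unexpected send error.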

lishunyang12 · 2026-03-13T19:12:30Z

Hi @thrashingstate. There is already a PR making the same effort: #1230. Unfortunately, I closed it because I didn't have much bandwidth to test it and couldn't find much benefit from it for the current TTS model. It would be particularly useful for a model that can accept speech input, but the TTS model doesn't. If you think you can make a thorough case for its usefulness, the community will continue this effort.

yenuo26 · 2026-03-16T08:36:02Z

@@ -0,0 +1,286 @@
+# SPDX-License-Identifier: Apache-2.0


PR #1911 unifies the test case style for qwen-tts. Is it sufficient to cover the corresponding streaming scenarios?
For test case style and test levels, refer to: https://github.com/vllm-project/vllm-omni/blob/main/docs/contributing/ci/CI_5levels.md

Shirley125 · 2026-03-18T08:05:14Z

+    def _maybe_resume_request(self, req_id: str) -> None:
+        """Resume a paused request if it was waiting for an update."""
+        req = self.requests.get(req_id)
+        if req is not None and req.status == RequestStatus.WAITING_FOR_CHUNK:


Can we use the RequestStatus.WAITING_FOR_STREAMING_REQ state in vLLM?

hsliuustc0106 · 2026-03-20T13:21:55Z

Please resolve the conflicts.

Gaohan123

Could you please provide an e2e use case and update the docs?

thrashingstate · 2026-03-23T05:16:09Z

I've been busy with other tasks. I'll try to find time this week to address the feedback.

linyueqian · 2026-03-30T04:12:08Z

@thrashingstate are there any updates? Thanks!

amy-why-3459 · 2026-03-31T06:12:59Z

@gcanlin PTAL

Sy0307 · 2026-04-21T16:42:35Z

I will take over this task for streaming text input for Qwen3 TTS. cc @linyueqian @amy-why-3459

thrashingstate added 2 commits March 13, 2026 11:09

feat: streaming text input for Qwen3-TTS

1bcf6b8

test: add streaming text input tests

25fc012

thrashingstate requested a review from hsliuustc0106 as a code owner March 13, 2026 18:36

chatgpt-codex-connector Bot reviewed Mar 13, 2026

yenuo26 reviewed Mar 16, 2026

linyueqian self-requested a review March 16, 2026 19:36

Shirley125 reviewed Mar 18, 2026

linyueqian mentioned this pull request Mar 18, 2026

[RFC]: TTS Development Roadmap - March 2026 #1795

Open

Gaohan123 reviewed Mar 21, 2026

Copilot AI mentioned this pull request Mar 23, 2026

[WIP] Compare changes between PR #1719 and PR #1883 LJH-LBJ/vllm-omni#2

Closed

14 tasks

thrashingstate wants to merge 2 commits into vllm-project:main from thrashingstate:feature/streaming-tts-input-v016


9 participants
