[Frontend][Model][Qwen3-Omni] Enable realtime async-chunk commit bridge by indevn · Pull Request #3654 · vllm-project/vllm-omni

indevn · 2026-05-16T03:07:58Z

Purpose

Enable the OpenAI-compatible /v1/realtime WebSocket path for Qwen3-Omni when async_chunk is enabled.

Before this PR, the API server rejected realtime sessions whenever engine_client.async_chunk was true, even though Qwen3-Omni's default deployment path uses async chunking. This PR removes that hard guard and adds a realtime async-chunk bridge:

async_chunk: false keeps the existing realtime streaming-input path.
async_chunk: true buffers input_audio_buffer.append audio, ignores non-final commits, and starts one normal multimodal Qwen3-Omni request after input_audio_buffer.commit with final: true.
The bridge reuses the existing Thinker -> Talker -> Code2Wav async-chunk pipeline and maps outputs back to transcription.* and response.audio.* realtime events.
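
The commit handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CommitBridgeBuffer` and `handle_event` are hypothetical names, and the sketch assumes the `audio` payload has already been decoded to raw bytes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CommitBridgeBuffer:
    """Hypothetical helper: accumulates audio until a final commit arrives."""

    chunks: list[bytes] = field(default_factory=list)

    def handle_event(self, event: dict) -> bytes | None:
        """Return the buffered audio once a final commit is seen, else None."""
        event_type = event.get("type")
        if event_type == "input_audio_buffer.append":
            # Assumption: the payload is already decoded raw audio bytes.
            self.chunks.append(event["audio"])
            return None
        if event_type == "input_audio_buffer.commit":
            if not event.get("final", False):
                # Non-final commits are ignored in the bridge path.
                return None
            audio = b"".join(self.chunks)
            self.chunks.clear()
            return audio
        # Any other event type is outside the scope of this sketch.
        return None


buf = CommitBridgeBuffer()
buf.handle_event({"type": "input_audio_buffer.append", "audio": b"\x00\x01"})
ignored = buf.handle_event({"type": "input_audio_buffer.commit", "final": False})
buf.handle_event({"type": "input_audio_buffer.append", "audio": b"\x02\x03"})
audio = buf.handle_event({"type": "input_audio_buffer.commit", "final": True})
```

In the real bridge, the returned audio would feed a single multimodal request; here it is simply returned to the caller.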

This is intentionally a commit-then-generate compatibility bridge. It does not implement early-start streaming input or prompt extension for async chunking; that broader scope is being explored separately in #3614.
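
From the client side, commit-then-generate means the whole utterance is sent before generation begins. A rough sketch of the messages a client might send is shown below. The helper name `build_realtime_messages` is hypothetical, and the base64-encoded `audio` field is an assumption borrowed from the OpenAI Realtime convention; the exact schema accepted by the server is not confirmed here.

```python
import base64
import json


def build_realtime_messages(pcm_chunks: list[bytes]) -> list[str]:
    """Build append events for each chunk, then one final commit (sketch)."""
    messages = [
        json.dumps(
            {
                "type": "input_audio_buffer.append",
                # Assumption: audio is carried as base64 text.
                "audio": base64.b64encode(chunk).decode("ascii"),
            }
        )
        for chunk in pcm_chunks
    ]
    # Generation starts only after this final commit in the bridge path.
    messages.append(
        json.dumps({"type": "input_audio_buffer.commit", "final": True})
    )
    return messages


msgs = build_realtime_messages([b"ab", b"cd"])
```

Each string in `msgs` would be sent as one WebSocket text frame, in order.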

Test Plan

pytest tests/entrypoints/test_realtime_connection_helpers.py -q

pytest tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py::TestQwen3OmniRealtimeWebSocket::test_streaming_audio_input_pcm_output[async_chunk] \
  --run-level advanced_model -q -s

pytest tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py::TestQwen3OmniRealtimeWebSocket::test_streaming_audio_input_pcm_output[no_async_chunk] \
  --run-level advanced_model -q -s

ruff check \
  tests/entrypoints/test_realtime_connection_helpers.py \
  tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py \
  vllm_omni/entrypoints/openai/realtime_connection.py \
  vllm_omni/entrypoints/openai/api_server.py \
  vllm_omni/entrypoints/openai/__init__.py \
  vllm_omni/entrypoints/async_omni.py \
  vllm_omni/entrypoints/streaming_input.py

git diff --check upstream/main...HEAD

Test Result

tests/entrypoints/test_realtime_connection_helpers.py
20 passed

tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py::TestQwen3OmniRealtimeWebSocket::test_streaming_audio_input_pcm_output[async_chunk]
1 passed

tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py::TestQwen3OmniRealtimeWebSocket::test_streaming_audio_input_pcm_output[no_async_chunk]
1 passed

ruff check
passed

git diff --check upstream/main...HEAD
passed

The realtime e2e covers both the new async-chunk bridge and the existing --no-async-chunk realtime path.

Signed-off-by: indevn <indevn@outlook.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82a8792097

chatgpt-codex-connector · 2026-05-16T03:13:12Z

+                if new_tokens_len:
+                    input_stream.put_nowait(new_token_ids)
+


Stop queueing tokens when async bridge has no consumer

In the async-chunk bridge path, _run_async_chunk_bridge_generation creates a fresh input_stream queue but never passes it to any consumer, while _consume_generation_outputs still enqueues new_token_ids on every chunk. That means each realtime request accumulates all generated token-id lists in memory until completion, so long responses (or many concurrent sessions) can cause avoidable memory growth and eventually OOM pressure. Guard this enqueue behind a consumer check, or skip it for the async-chunk bridge path.
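
One way to apply the suggested guard is sketched below. The function `enqueue_new_tokens` and its `has_consumer` flag are hypothetical and are not taken from the PR; how the real code would know whether a consumer exists is an assumption.

```python
import asyncio


def enqueue_new_tokens(input_stream, new_token_ids, has_consumer):
    """Queue tokens only when something will read them; return True if queued."""
    if not has_consumer or input_stream is None or not new_token_ids:
        # Skip the enqueue so an unread queue cannot grow without bound.
        return False
    input_stream.put_nowait(list(new_token_ids))
    return True


queue = asyncio.Queue()
skipped = enqueue_new_tokens(queue, [1, 2], has_consumer=False)
queued = enqueue_new_tokens(queue, [3, 4], has_consumer=True)
```

An alternative with the same effect would be to pass `None` instead of a fresh queue on the async-chunk bridge path, so the enqueue is skipped entirely.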

hsliuustc0106 · 2026-05-16T05:20:14Z

@Shirley125 PTAL

Shirley125 · 2026-05-18T08:11:31Z

@indevn Hi, thanks for the PR. From what I understand, this PR adds temporary compatibility support for the Realtime API when async chunking is enabled.

PR #3614 is planned to be merged in the 0.22 release and is intended to provide full Realtime API support under async chunk mode. Because of that, the need for an intermediate compatibility bridge here may be somewhat limited.

Perhaps we could instead discuss and collaborate on other optimizations around audio streaming input :)

Enable Qwen3-Omni realtime async-chunk bridge

82a8792

Signed-off-by: indevn <indevn@outlook.com>

indevn requested review from Gaohan123, hsliuustc0106, tzhouam, yenuo26 and ywang96 as code owners May 16, 2026 03:07

indevn mentioned this pull request May 16, 2026

[Feat]audio streaming input for async chunk #3614

Merged

5 tasks

indevn wants to merge 1 commit into vllm-project:main from indevn:feat/qwen3-realtime-asyncchunk-bridge
