
[Frontend][Model][Qwen3-Omni] Enable realtime async-chunk commit bridge #3654

Open
indevn wants to merge 1 commit into vllm-project:main from indevn:feat/qwen3-realtime-asyncchunk-bridge

@indevn
Contributor

indevn commented May 16, 2026

Purpose

Enable the OpenAI-compatible /v1/realtime WebSocket path for Qwen3-Omni when async_chunk is enabled.

Before this PR, the API server rejected realtime sessions whenever engine_client.async_chunk was true, even though Qwen3-Omni's default deployment path uses async chunking. This PR removes that hard guard and adds a realtime async-chunk bridge:

  • async_chunk: false keeps the existing realtime streaming-input path.
  • async_chunk: true buffers input_audio_buffer.append audio, ignores non-final commits, and starts one normal multimodal Qwen3-Omni request after input_audio_buffer.commit with final: true.
  • The bridge reuses the existing Thinker -> Talker -> Code2Wav async-chunk pipeline and maps outputs back to transcription.* and response.audio.* realtime events.
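The async_chunk: true behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `RealtimeCommitBuffer` and its methods are hypothetical, and the real bridge goes on to build a multimodal Qwen3-Omni request from the returned audio.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RealtimeCommitBuffer:
    """Hypothetical helper: holds audio until a final commit arrives."""

    chunks: List[bytes] = field(default_factory=list)

    def on_append(self, audio: bytes) -> None:
        # input_audio_buffer.append: only buffer, never start generation here.
        self.chunks.append(audio)

    def on_commit(self, final: bool) -> Optional[bytes]:
        # Non-final commits are ignored; buffered audio is kept as-is.
        if not final:
            return None
        # A final commit hands over everything buffered so far, so the
        # caller can start exactly one request, and resets the buffer.
        audio = b"".join(self.chunks)
        self.chunks.clear()
        return audio
```

Returning `None` for a non-final commit keeps the "one request per final commit" rule in a single place, so the caller only needs to check the return value before starting generation.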

This is intentionally a commit-then-generate compatibility bridge. It does not implement early-start streaming input or prompt extension for async chunking; that broader scope is being explored separately in #3614.
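From the client side, a commit-then-generate exchange might be assembled as in the sketch below. The event types and the `final: true` flag come from the description above; the `audio` field name and base64 encoding follow the usual OpenAI Realtime convention and are assumptions here, and `build_realtime_messages` is a hypothetical helper.

```python
import base64
import json


def build_realtime_messages(pcm_chunks):
    """Return the JSON messages a client might send for one utterance."""
    messages = []
    for chunk in pcm_chunks:
        messages.append(json.dumps({
            "type": "input_audio_buffer.append",
            # Assumed field name and encoding (base64 audio payload).
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    # Only a commit with final=True starts generation on the async-chunk path.
    messages.append(json.dumps({
        "type": "input_audio_buffer.commit",
        "final": True,
    }))
    return messages
```

Each returned string would be sent as one WebSocket text frame to /v1/realtime; the transcription.* and response.audio.* events arrive only after the final commit.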

Test Plan

pytest tests/entrypoints/test_realtime_connection_helpers.py -q

pytest tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py::TestQwen3OmniRealtimeWebSocket::test_streaming_audio_input_pcm_output[async_chunk] \
  --run-level advanced_model -q -s

pytest tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py::TestQwen3OmniRealtimeWebSocket::test_streaming_audio_input_pcm_output[no_async_chunk] \
  --run-level advanced_model -q -s

ruff check \
  tests/entrypoints/test_realtime_connection_helpers.py \
  tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py \
  vllm_omni/entrypoints/openai/realtime_connection.py \
  vllm_omni/entrypoints/openai/api_server.py \
  vllm_omni/entrypoints/openai/__init__.py \
  vllm_omni/entrypoints/async_omni.py \
  vllm_omni/entrypoints/streaming_input.py

git diff --check upstream/main...HEAD

Test Result

tests/entrypoints/test_realtime_connection_helpers.py
20 passed

tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py::TestQwen3OmniRealtimeWebSocket::test_streaming_audio_input_pcm_output[async_chunk]
1 passed

tests/entrypoints/openai_api/test_qwen3_omni_realtime_websocket.py::TestQwen3OmniRealtimeWebSocket::test_streaming_audio_input_pcm_output[no_async_chunk]
1 passed

ruff check
passed

git diff --check upstream/main...HEAD
passed

The realtime e2e covers both the new async-chunk bridge and the existing --no-async-chunk realtime path.


Signed-off-by: indevn <indevn@outlook.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82a8792097


Comment on lines +303 to +305
    if new_tokens_len:
        input_stream.put_nowait(new_token_ids)


P2: Stop queueing tokens when async bridge has no consumer

In the async-chunk bridge path, _run_async_chunk_bridge_generation creates a fresh input_stream queue but never passes it to any consumer, while _consume_generation_outputs still enqueues new_token_ids on every chunk. That means each realtime request accumulates all generated token-id lists in memory until completion, so long responses (or many concurrent sessions) can cause avoidable memory growth and eventually OOM pressure. Guard this enqueue behind a consumer check, or skip it for the async-chunk bridge path.
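One way to apply the suggested guard is sketched below. It assumes the bridge can signal "no consumer" by passing `None` instead of a queue; `forward_new_tokens` is a hypothetical helper, not a function from this PR, and the real change would live inside `_consume_generation_outputs`.

```python
import asyncio
from typing import List, Optional


def forward_new_tokens(
    input_stream: Optional[asyncio.Queue],
    new_token_ids: List[int],
) -> bool:
    """Enqueue tokens only when a consumer queue was provided.

    Returns True if the tokens were enqueued, False if they were skipped.
    """
    # Assumption: the async-chunk bridge passes None because nothing reads
    # the queue, so skipping here avoids unbounded growth.
    if input_stream is None or not new_token_ids:
        return False
    input_stream.put_nowait(new_token_ids)
    return True
```

With this shape, the streaming-input path keeps passing its queue unchanged, while the bridge path passes `None` and never accumulates token-id lists.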


@hsliuustc0106
Collaborator

@Shirley125 PTAL

@Shirley125
Contributor

Shirley125 commented May 18, 2026

@indevn Hi, thanks for the PR. From what I understand, this PR adds temporary compatibility support for the Realtime API when async chunking is enabled.

PR #3614 is planned to be merged in the 0.22 release and is intended to provide full Realtime API support under async chunk mode. Because of that, the need to add an intermediate compatibility bridge here may be somewhat limited.

Perhaps we could instead discuss and collaborate on other optimizations around audio streaming input :)
