[WIP][4a/5][core]refactor communication layer: PR 4a of 5 Qwen3 Omni in async mode by natureofnature · Pull Request #4146 · vllm-project/vllm-omni

natureofnature · 2026-06-04T10:41:06Z


Purpose

The communication-layer refactor will span 5 to 6 PRs in total.
4a: Qwen3 Omni only.
4b: Other relevant models.
PR #1555 is the first PR.
PR #2677 is the second PR.
PR #3719 is the third PR.

Test Plan

test-ready, test-merge, test-nightly

Test Result

To be added.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Introduce the coordinator-side selection, carried registration field, chunk-ready/finished consumption, late-ready retention, and scheduler completion guards needed before enabling async-chunk stages. Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Add runner-side async-chunk sends, duplicate-send guards, finish sentinel enqueue/consume paths, code2wav terminal flushing, deep metadata merge semantics, and quiet per-chunk transport logging. Signed-off-by: natureofnature <wzliu@connect.hku.hk>
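As an illustration of what "deep metadata merge semantics" could mean, here is a minimal sketch assuming metadata is a nested dict and that later chunks should update nested keys without discarding sibling keys. The function name `deep_merge` and the example keys are hypothetical and not taken from the PR.

```python
def deep_merge(base: dict, update: dict) -> dict:
    """Recursively merge `update` into a copy of `base` (hypothetical helper).

    Nested dicts are merged key by key; any other value in `update`
    overwrites the value in `base`. Neither input is mutated.
    """
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Example keys are illustrative only.
first_chunk = {"audio": {"frames": 4}, "finished": False}
last_chunk = {"audio": {"sample_rate": 24000}, "finished": True}
combined = deep_merge(first_chunk, last_chunk)
```

A shallow `dict.update` would replace the whole `audio` entry here, which is the kind of loss a deep merge avoids.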

Flip qwen3_omni async-chunk stages onto the coordinator+mixin transport, preserve audio finish behavior, and keep allowlist-active branches safe for partial unit-test mocks. Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Carry segment-finished state across the legacy thinker to coordinator talker/code2wav boundary with a distinct connector signal so realtime async-chunk requests flush and finish correctly. Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Keep prefill and decode rows separate across the streaming handoff, then consume cached decode rows as the talker decode prefix before current decode rows. Signed-off-by: natureofnature <wzliu@connect.hku.hk>
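The ordering described in the last commit message can be sketched as follows. This is a simplified model under stated assumptions, not the PR's code: rows are plain list items, and `HandoffBuffer`, `add_chunk`, and `take_decode_rows` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffBuffer:
    """Hypothetical buffer that keeps prefill and decode rows apart."""

    prefill_rows: list = field(default_factory=list)
    cached_decode_rows: list = field(default_factory=list)

    def add_chunk(self, rows, is_prefill: bool) -> None:
        # Rows arriving over the streaming handoff are never mixed together.
        target = self.prefill_rows if is_prefill else self.cached_decode_rows
        target.extend(rows)

    def take_decode_rows(self, current_rows) -> list:
        # Cached decode rows form the prefix, followed by the current rows.
        out = self.cached_decode_rows + list(current_rows)
        self.cached_decode_rows = []  # each cached row is consumed only once
        return out


buf = HandoffBuffer()
buf.add_chunk(["p0", "p1"], is_prefill=True)
buf.add_chunk(["d0"], is_prefill=False)
step1 = buf.take_decode_rows(["d1", "d2"])
step2 = buf.take_decode_rows(["d3"])
```

Clearing the cache on consumption is one way to keep the same decode rows from being prepended twice; whether the PR does this at the same point is not stated in the source.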



hsliuustc0106

Posted one inline blocker found during review.

hsliuustc0106 · 2026-06-11T10:33:10Z

+    If the request finished on a chunk boundary (no unsent tail), emit a
+    finish-only flag instead of re-sending the last full chunk.
+    """
+    chunk_size_config, left_context_size_config = _code2wav_codec_config(transfer_manager)


This finish-sentinel path does not mirror the live send path when initial_codec_chunk_frames is configured. The bundled Qwen3 Omni configs set initial_codec_chunk_frames: 4, so a request that ends exactly at 4 frames, or at 4 + N * codec_chunk_frames, has already emitted the full boundary chunk in the normal path. Here length % codec_chunk_frames is non-zero, so the sentinel re-sends those frames as a tail, duplicating audio at the end. Can we include the same initial-chunk adjustment used below before deciding whether there is an unsent tail?
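The boundary arithmetic in this comment can be sketched as below. This is a minimal illustration, assuming the live path sends one initial chunk of `initial_chunk_frames` frames and then full chunks of `chunk_frames` frames; the function name `unsent_tail_frames` and the value 25 for `codec_chunk_frames` are hypothetical, and only `initial_codec_chunk_frames: 4` comes from the comment.

```python
def unsent_tail_frames(length: int, chunk_frames: int, initial_chunk_frames: int = 0) -> int:
    """Return how many trailing frames the live path has not sent yet.

    Assumed schedule (hypothetical): an optional first chunk of
    `initial_chunk_frames`, followed by full chunks of `chunk_frames`.
    """
    if length <= 0:
        return 0
    if initial_chunk_frames > 0:
        if length < initial_chunk_frames:
            return length  # the initial chunk was never completed
        length -= initial_chunk_frames  # skip what the initial chunk covered
    return length % chunk_frames


CHUNK = 25   # assumed codec_chunk_frames, for illustration only
INITIAL = 4  # initial_codec_chunk_frames from the bundled configs

naive_tail = (INITIAL + 2 * CHUNK) % CHUNK  # 4: would re-send frames already emitted
adjusted_tail = unsent_tail_frames(INITIAL + 2 * CHUNK, CHUNK, INITIAL)  # 0: finish-only flag
```

With the adjustment, a request ending at 4 or at 4 + N * codec_chunk_frames frames reports no tail, so the sentinel can emit the finish-only flag instead of duplicating audio.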

hsliuustc0106 · 2026-06-11T11:11:40Z

Is there any performance benefit from this PR?

natureofnature mentioned this pull request Jun 8, 2026

[RFC]: Refactor Communication Layer: Async Chunk JiusiServe/vllm-omni#255

Open

1 task

natureofnature force-pushed the pr/refactor/pr4 branch from 58d906a to 3d97797 Compare June 8, 2026 10:01

natureofnature added 5 commits June 10, 2026 09:06

[PR4] enable qwen3 omni async-chunk coordinator path

a45282c


[PR4] fix qwen3 async-chunk decode handoff alignment

4d46b5a


natureofnature force-pushed the pr/refactor/pr4 branch from e7bb493 to 4d46b5a Compare June 10, 2026 10:09

natureofnature added 5 commits June 10, 2026 10:28

[PR4] preserve terminal finished meta in async cache

e3e957b

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Format Qwen3 Omni decode handoff

2bf11d5

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Clean async chunk comments

53e77b0

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Clean async chunk segment send state

9d920c2

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Move segment cleanup to enqueue time

088cfba

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

natureofnature marked this pull request as ready for review June 11, 2026 08:59

natureofnature requested review from Gaohan123, ZeldaHuang, gcanlin, hsliuustc0106, linyueqian, princepride, tzhouam, yenuo26, yuanheng-zhao and ywang96 as code owners June 11, 2026 08:59

hsliuustc0106 reviewed Jun 11, 2026


Merge branch 'main' into pr/refactor/pr4

4f7eb3c

hsliuustc0106 added the ready (label to trigger buildkite CI) and merge-test (label to trigger buildkite merge test CI) labels Jun 11, 2026

natureofnature wants to merge 11 commits into vllm-project:main from natureofnature:pr/refactor/pr4
