[WIP][4a/5][core]refactor communication layer: PR 4a of 5 Qwen3 Omni in async mode #4146

Open
natureofnature wants to merge 11 commits into vllm-project:main from natureofnature:pr/refactor/pr4

@natureofnature natureofnature commented Jun 4, 2026

Contributor


Purpose

The communication-layer refactor will span 5 to 6 PRs in total. The fourth is split in two:
4a: Qwen3 Omni only (this PR).
4b: Other relevant models.
Earlier PRs in the series: #1555 (first), #2677 (second), #3719 (third).

Test Plan

test-ready, test-merge, test-nightly

Test Result

To be added.



Introduce the coordinator-side selection, carried registration field, chunk-ready/finished consumption, late-ready retention, and scheduler completion guards needed before enabling async-chunk stages.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Add runner-side async-chunk sends, duplicate-send guards, finish sentinel enqueue/consume paths, code2wav terminal flushing, deep metadata merge semantics, and quiet per-chunk transport logging.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
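
For readers unfamiliar with the "deep metadata merge semantics" this commit mentions, the following is a minimal sketch of what a deep merge usually means, assuming chunk metadata is a nested dict. The helper name `deep_merge` and the sample keys are hypothetical and are not taken from the PR; the actual merge rules in the runner may differ.

```python
from __future__ import annotations

from typing import Any


def deep_merge(base: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which nested dicts are merged key by key.

    Hypothetical helper for illustration only. A shallow merge would
    replace a whole nested dict; here only the overlapping leaf keys are
    overwritten and sibling keys from `base` are preserved.
    """
    merged = dict(base)  # copy so the caller's dict is not mutated
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Metadata from an earlier chunk and from a later chunk (made-up keys).
earlier = {"audio": {"sample_rate": 24000}, "finished": False}
later = {"audio": {"frames": 4}, "finished": True}
combined = deep_merge(earlier, later)
```

With a shallow `dict.update`, the `sample_rate` entry would be lost when the later chunk arrives; the deep variate keeps it while still taking the newer `finished` flag.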
Flip qwen3_omni async-chunk stages onto the coordinator+mixin transport, preserve audio finish behavior, and keep allowlist-active branches safe for partial unit-test mocks.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Carry segment-finished state across the legacy thinker to coordinator talker/code2wav boundary with a distinct connector signal so realtime async-chunk requests flush and finish correctly.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Keep prefill and decode rows separate across the streaming handoff, then consume cached decode rows as the talker decode prefix before current decode rows.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
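
The ordering described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the class `HandoffCache`, its methods, and the row representation (plain lists of floats) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Row = list[float]


@dataclass
class HandoffCache:
    """Hypothetical cache that keeps prefill and decode rows apart."""

    prefill_rows: list[Row] = field(default_factory=list)
    decode_rows: list[Row] = field(default_factory=list)

    def add_chunk(self, rows: list[Row], is_prefill: bool) -> None:
        # Route rows to the matching bucket instead of one shared list,
        # so decode rows are never mistaken for prefill context.
        target = self.prefill_rows if is_prefill else self.decode_rows
        target.extend(rows)

    def take_decode_prefix(self, current_rows: list[Row]) -> list[Row]:
        # Cached decode rows come first, then the rows of the current step.
        # The cache is drained so the same rows are not consumed twice.
        ordered = self.decode_rows + list(current_rows)
        self.decode_rows = []
        return ordered


cache = HandoffCache()
cache.add_chunk([[1.0]], is_prefill=True)
cache.add_chunk([[2.0]], is_prefill=False)
first_step = cache.take_decode_prefix([[3.0]])
second_step = cache.take_decode_prefix([[4.0]])
```

Draining the decode bucket on consumption is one way to avoid replaying cached rows on later steps; whether the PR drains, slices, or tracks an offset is not stated in the commit message.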

@hsliuustc0106 hsliuustc0106 left a comment

Collaborator


Posted one inline blocker found during review.

If the request finished on a chunk boundary (no unsent tail), emit a
finish-only flag instead of re-sending the last full chunk.
"""
chunk_size_config, left_context_size_config = _code2wav_codec_config(transfer_manager)


This finish-sentinel path does not mirror the live send path when initial_codec_chunk_frames is configured. The bundled Qwen3 Omni configs set initial_codec_chunk_frames: 4, so a request that ends exactly at 4 frames, or at 4 + N * codec_chunk_frames, has already emitted the full boundary chunk in the normal path. Here length % codec_chunk_frames is non-zero, so the sentinel re-sends those frames as a tail, duplicating audio at the end. Can we include the same initial-chunk adjustment used below before deciding whether there is an unsent tail?
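
To make the boundary arithmetic concrete, here is a minimal sketch of the tail check with and without the initial-chunk adjustment. It assumes the live path emits its first chunk after initial_codec_chunk_frames frames and every later chunk after a further codec_chunk_frames frames, as the comment above describes. The function names and the value 25 for codec_chunk_frames are hypothetical; only initial_codec_chunk_frames: 4 comes from the comment.

```python
def naive_tail_frames(length: int, chunk_frames: int) -> int:
    """Tail size if the initial chunk is ignored (the behavior under review)."""
    return length % chunk_frames


def adjusted_tail_frames(
    length: int, chunk_frames: int, initial_chunk_frames: int = 0
) -> int:
    """Frames not yet covered by a full chunk, honoring the initial chunk.

    Hypothetical helper: assumes chunk boundaries fall at
    initial_chunk_frames + N * chunk_frames for N = 0, 1, 2, ...
    """
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    if initial_chunk_frames <= 0:
        return length % chunk_frames
    if length < initial_chunk_frames:
        # The first chunk has not been emitted yet, so everything is tail.
        return length
    return (length - initial_chunk_frames) % chunk_frames


CHUNK = 25   # assumed value, for illustration only
INITIAL = 4  # initial_codec_chunk_frames from the bundled configs

for total in (4, 29, 30):
    print(total, naive_tail_frames(total, CHUNK),
          adjusted_tail_frames(total, CHUNK, INITIAL))
```

For a request ending at 4 or 29 frames, the naive check reports a 4-frame tail even though the boundary chunk was already sent, while the adjusted check reports 0, which would select the finish-only flag. At 30 frames the adjusted check reports a genuine 1-frame tail.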

@hsliuustc0106 added the ready (triggers Buildkite CI) and merge-test (triggers Buildkite merge-test CI) labels on Jun 11, 2026
@hsliuustc0106


Is there any performance benefit from this PR?
