[WIP][4a/5][core]refactor communication layer: PR 4a of 5 Qwen3 Omni in async mode #4146
natureofnature wants to merge 11 commits into
58d906a to 3d97797
Introduce the coordinator-side selection, carried registration field, chunk-ready/finished consumption, late-ready retention, and scheduler completion guards needed before enabling async-chunk stages.
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Add runner-side async-chunk sends, duplicate-send guards, finish sentinel enqueue/consume paths, code2wav terminal flushing, deep metadata merge semantics, and quiet per-chunk transport logging.
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
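The commit message above mentions "deep metadata merge semantics" without spelling them out. One common reading is a recursive dictionary merge: nested dicts are combined key by key, and any other value from the newer chunk overwrites the older one. The sketch below illustrates that reading only; `deep_merge_metadata` is a hypothetical name, and the metadata keys in the usage example are made up, not taken from this PR.

```python
from typing import Any


def deep_merge_metadata(base: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: recursively merge `update` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `update`
    replaces the value in `base`. Neither input dict is mutated.
    """
    merged = dict(base)  # shallow copy so `base` stays untouched
    for key, value in update.items():
        existing = merged.get(key)
        if isinstance(existing, dict) and isinstance(value, dict):
            # Recurse so fields set by earlier chunks are not wiped out.
            merged[key] = deep_merge_metadata(existing, value)
        else:
            merged[key] = value
    return merged


# Usage: a later chunk updates one nested field without dropping its siblings.
first = {"req": {"id": "r1", "frames": 4}, "finished": False}
later = {"req": {"frames": 29}, "finished": True}
print(deep_merge_metadata(first, later))
# {'req': {'id': 'r1', 'frames': 29}, 'finished': True}
```

Returning a new dict rather than updating in place keeps earlier per-chunk metadata intact, which can matter if the same metadata object is still referenced by a pending send.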
Flip qwen3_omni async-chunk stages onto the coordinator+mixin transport, preserve audio finish behavior, and keep allowlist-active branches safe for partial unit-test mocks.
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Carry segment-finished state across the boundary between the legacy thinker and the coordinator-managed talker/code2wav stages, using a distinct connector signal, so that realtime async-chunk requests flush and finish correctly.
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Keep prefill and decode rows separate across the streaming handoff, then consume cached decode rows as the talker decode prefix before current decode rows.
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
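The ordering rule in the last commit (cached decode rows first, then the current decode rows, with prefill rows kept apart) can be illustrated with a small per-request cache. This is a sketch under assumptions: `StreamingRowCache`, `stash`, and `build_decode_input` are hypothetical names, and rows are plain strings here rather than whatever tensors or records the real talker consumes.

```python
from dataclasses import dataclass, field


@dataclass
class StreamingRowCache:
    """Hypothetical per-request cache for rows that arrive across a streaming handoff.

    Prefill rows and decode rows live in separate lists so they never interleave.
    """

    prefill_rows: list = field(default_factory=list)
    decode_rows: list = field(default_factory=list)

    def stash(self, prefill: list, decode: list) -> None:
        # Keep the two kinds of rows apart instead of concatenating them.
        self.prefill_rows.extend(prefill)
        self.decode_rows.extend(decode)

    def build_decode_input(self, current_decode: list) -> list:
        # Cached decode rows form the prefix; current rows follow in arrival order.
        ordered = self.decode_rows + list(current_decode)
        self.decode_rows.clear()  # each cached row is consumed exactly once
        return ordered


cache = StreamingRowCache()
cache.stash(prefill=["p0", "p1"], decode=["d0", "d1"])
print(cache.build_decode_input(["d2", "d3"]))  # ['d0', 'd1', 'd2', 'd3']
print(cache.build_decode_input(["d4"]))        # ['d4']
print(cache.prefill_rows)                      # ['p0', 'p1']
```

Clearing the decode cache after it is consumed prevents the same prefix from being replayed on the next step, while the prefill rows remain available separately.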
e7bb493 to 4d46b5a
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
hsliuustc0106 left a comment
Posted one inline blocker found during review.
    If the request finished on a chunk boundary (no unsent tail), emit a
    finish-only flag instead of re-sending the last full chunk.
    """
    chunk_size_config, left_context_size_config = _code2wav_codec_config(transfer_manager)
This finish-sentinel path does not mirror the live send path when initial_codec_chunk_frames is configured. The bundled Qwen3 Omni configs set initial_codec_chunk_frames: 4, so a request that ends exactly at 4 frames, or at 4 + N * codec_chunk_frames, has already emitted the full boundary chunk in the normal path. Here length % codec_chunk_frames is non-zero, so the sentinel re-sends those frames as a tail, duplicating audio at the end. Can we include the same initial-chunk adjustment used below before deciding whether there is an unsent tail?
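The reviewer's point can be checked with a small sketch of the tail computation. It assumes the live send path emits its first chunk once `initial_codec_chunk_frames` frames exist and then one chunk per `codec_chunk_frames` frames, which is how the comment above describes it. `unsent_tail_frames` is a hypothetical helper, not the PR's code, and `codec_chunk_frames=25` is an arbitrary example value rather than one read from the Qwen3 Omni configs.

```python
def unsent_tail_frames(
    total_frames: int,
    codec_chunk_frames: int,
    initial_codec_chunk_frames: int = 0,
) -> int:
    """Hypothetical helper: number of frames the live chunked path has not sent yet.

    Assumes the live path sends a first chunk of `initial_codec_chunk_frames`
    frames (when > 0), then one chunk every `codec_chunk_frames` frames.
    A return value of 0 means the request ended on a boundary, so only a
    finish-only flag is needed.
    """
    if codec_chunk_frames <= 0:
        raise ValueError("codec_chunk_frames must be positive")
    if initial_codec_chunk_frames > 0:
        if total_frames < initial_codec_chunk_frames:
            return total_frames  # the initial chunk was never emitted
        # Shift by the initial chunk before testing for a boundary.
        return (total_frames - initial_codec_chunk_frames) % codec_chunk_frames
    return total_frames % codec_chunk_frames


# Compare the naive check (length % codec_chunk_frames) with the adjusted one.
for total in (4, 29, 31):
    naive = total % 25
    adjusted = unsent_tail_frames(total, 25, initial_codec_chunk_frames=4)
    print(total, naive, adjusted)
# 4 4 0
# 29 4 0
# 31 6 2
```

For 4 and 29 frames the naive check reports a 4-frame tail that was already sent, which is the duplicated audio described in the comment; the adjusted check reports no tail.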
Is there any benefit of this PR in terms of perf?
Purpose
The communication-layer refactor will be split into 5 to 6 PRs in total.
4a: Qwen3 Omni only
4b: Other relevant models.
Earlier PRs in this series:
First PR: #1555
Second PR: #2677
Third PR: #3719
Test Plan
test-ready, test-merge, test-nightly
Test Result
To be added.