[core]refactor communication layer: PR1(Added Refactor Infra Only) by natureofnature · Pull Request #1555 · vllm-project/vllm-omni

natureofnature · 2026-02-27T17:27:12Z

Purpose

Refactor the vLLM-Omni communication layer by moving all data-plane connector.put()/connector.get() calls from Orchestrator, OmniStage, and Scheduler into a unified OmniConnectorModelRunnerMixin at the Model Runner level, while keeping scheduling coordination logic (e.g., WAITING_FOR_CHUNK state) in the Scheduler.

Refer to the RFC: [RFC]: Refactor Communication Layer #1546

This is the first PR of the communication refactoring, and it does not change the existing workflow. The core files are the Omni connector mixin and the scheduler coordinator.
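
As a rough illustration of the idea described under Purpose (not the PR's actual code), the sketch below shows a mixin that owns the connector put/get calls, so that a model runner gains data-plane access by inheritance. `InMemoryConnector`, `ConnectorMixinSketch`, `send_payload`, `recv_payload`, `BaseRunner`, and `RunnerWithConnector` are hypothetical names; only `connector.put()` and `connector.get()` are named in the description, and their signatures here are assumed.

```python
from typing import Any, Optional


class InMemoryConnector:
    """Hypothetical stand-in for a connector; real signatures may differ."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Remove and return the payload, or None if nothing has arrived yet.
        return self._store.pop(key, None)


class ConnectorMixinSketch:
    """Hypothetical mixin: the only place that touches the data plane."""

    connector: InMemoryConnector

    def send_payload(self, request_id: str, payload: Any) -> None:
        self.connector.put(request_id, payload)

    def recv_payload(self, request_id: str) -> Optional[Any]:
        return self.connector.get(request_id)


class BaseRunner:
    """Placeholder for a model runner base class."""

    def __init__(self, connector: InMemoryConnector) -> None:
        self.connector = connector


class RunnerWithConnector(BaseRunner, ConnectorMixinSketch):
    pass


shared = InMemoryConnector()
upstream = RunnerWithConnector(shared)
downstream = RunnerWithConnector(shared)

upstream.send_payload("req-0", {"hidden_states": [0.1, 0.2]})
received = downstream.recv_payload("req-0")
missing = downstream.recv_payload("req-0")  # already consumed, so None
```

In this toy version the scheduler never calls the connector itself; it would only decide which requests are waiting, which matches the split between data plane and scheduling coordination described above.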

Architecture

Status Summary

Test Plan

pytest -s -v tests/e2e/ --ignore-glob='*expansion.py' -m "advanced_model" --run-level=advanced_model

Test Result

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5cd54ca33f

lishunyang12

Left a few comments. This is a big refactor -- worth getting the threading right before it goes in.

R2-Y · 2026-03-03T01:49:56Z

-        if self.chunk_transfer_adapter:
-            self.chunk_transfer_adapter.process_pending_chunks(self.waiting, self.running)
+        if self.chunk_coordinator:
+            oco = self._latest_omni_connector_output


the variable names are too vague

R2-Y · 2026-03-03T02:14:14Z

+    #  Core scheduling methods
+    # ------------------------------------------------------------------ #
+
+    def process_pending_chunks(


Is it possible to put the processing of Async Chunk and non-async chunk code blocks in the same function? Currently, some processing seems replicated in process_pending_chunks & process_pending_batch_inputs. Could we use a single queue, waiting_for_input_waiting, to handle both waiting_for_input and waiting_for_chunk_waiting simultaneously, and only hold one request status (e.g., WAITING_FOR_INPUT) for both asynchronous and non-asynchronous chunk modes?
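
One way to read this suggestion, sketched under assumptions: a single status and a single waiting collection cover both modes, and the only mode-specific part is how "input has arrived" is determined. All names here (`Status.WAITING_FOR_INPUT`, `Request`, `promote_ready`) are hypothetical and are not taken from the PR.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    WAITING_FOR_INPUT = auto()  # hypothetical unified status
    RUNNING = auto()


@dataclass
class Request:
    request_id: str
    status: Status = Status.WAITING_FOR_INPUT


def promote_ready(waiting: deque, running: list, arrived: set) -> None:
    """Move every request whose input has arrived from waiting to running.

    `arrived` holds request ids with data available; whether that data is a
    chunk (async mode) or a full batch input (non-async mode) does not matter.
    """
    still_waiting = deque()
    while waiting:
        req = waiting.popleft()
        if req.request_id in arrived:
            req.status = Status.RUNNING
            running.append(req)
        else:
            still_waiting.append(req)
    # Requests that are not ready keep their original relative order.
    waiting.extend(still_waiting)


waiting = deque([Request("a"), Request("b"), Request("c")])
running: list = []
promote_ready(waiting, running, arrived={"a", "c"})
running_ids = [r.request_id for r in running]
waiting_ids = [r.request_id for r in waiting]
```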

R2-Y · 2026-03-03T02:15:36Z

    finished_requests_needing_kv_transfer: dict[str, dict] = field(default_factory=dict)
+    # Requests that need to be registered for chunk recv by the Model Runner's
+    # background thread. Populated by ChunkSchedulingCoordinator.
+    pending_chunk_registrations: list = field(default_factory=list)


same as above. can we keep only one for registration?

R2-Y · 2026-03-03T02:17:35Z

+                seed_input["prompt_token_ids"] = [0] * next_prompt_len
+                seed_input["multi_modal_data"] = seed_input["mm_processor_kwargs"] = None
+                for ds_stage_id in range(1, len(self.stage_list)):
+                    sp_ds = sampling_params_list[ds_stage_id]


the variable name is too vague

lishunyang12 · 2026-03-04T14:30:06Z

Hey @natureofnature — I see two new commits since my review, but my 4 inline comments from 2/28 still seem open (busy-spin in recv_loop, shallow copy, dead _finished_save_reqs, duplicate MockQueue in e2e test). Could you take a look when you get a chance?
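
For context on the shallow-copy item, here is a generic illustration (not the PR's code; `payload` is a made-up example) of why a shallow copy can still be affected by later mutation of the original:

```python
import copy

payload = {"request_id": "req-0", "token_ids": [1, 2, 3]}

shallow = copy.copy(payload)   # new dict, but the inner list is shared
deep = copy.deepcopy(payload)  # inner list is duplicated as well

payload["token_ids"].append(4)  # mutate the original after copying

shallow_sees_mutation = shallow["token_ids"] == [1, 2, 3, 4]
deep_is_isolated = deep["token_ids"] == [1, 2, 3]
```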

natureofnature · 2026-03-04T15:50:53Z

> Hey @natureofnature -- I see two new commits since my review, but my 4 inline comments from 2/28 still seem open (busy-spin in recv_loop, shallow copy, dead _finished_save_reqs, duplicate MockQueue in e2e test). Could you take a look when you get a chance?

I'm still working on it, and will fix them once the basic workflow functions well. @lishunyang12

natureofnature · 2026-03-06T09:28:12Z

@codex review

natureofnature · 2026-03-09T10:01:58Z

@tzhouam @divyanshsinghvi PTAL

hsliuustc0106

Gate Status: ❌ BLOCKED

Check	Status
DCO	✅
Docs	✅
Mergeable	❌ CONFLICTING

Prior Feedback Unaddressed (2/28)

Dead state: _finished_save_reqs initialized but never read
Silent connector failure: Returns None instead of failing fast
Busy-spin: time.sleep(0.001) placement still suboptimal
Duplicate MockQueue: Same class in two test files
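
The busy-spin item can be illustrated generically. The sketch below is not the PR's `recv_loop`; it shows the common alternative to polling with `time.sleep(0.001)`, namely blocking on `queue.Queue.get(timeout=...)` so the thread sleeps until data arrives. The function name `recv_loop_blocking` and the `None` shutdown sentinel are assumptions made for illustration.

```python
import queue
import threading


def recv_loop_blocking(inbox: queue.Queue, out: list, stop: threading.Event) -> None:
    """Consume items until `stop` is set or a None sentinel arrives.

    Blocking on get() avoids waking up every millisecond when idle; the
    timeout only bounds how long shutdown can take.
    """
    while not stop.is_set():
        try:
            item = inbox.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing arrived; re-check the stop flag
        if item is None:  # sentinel: producer is done
            break
        out.append(item)


inbox: queue.Queue = queue.Queue()
received: list = []
stop = threading.Event()

worker = threading.Thread(target=recv_loop_blocking, args=(inbox, received, stop))
worker.start()

for chunk in ("chunk-0", "chunk-1", "chunk-2"):
    inbox.put(chunk)
inbox.put(None)

worker.join(timeout=5)
stop.set()
```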

Test Evidence Missing

Claims "Qwen2.5, Qwen 3 omni, Bagel works" but provides no test commands or output. Please add reproduction steps.

Minor: MRO Inconsistency

# gpu_ar_model_runner.py:154
class GPUARModelRunner(OmniGPUModelRunner, OmniConnectorModelRunnerMixin):  # ✓

# gpu_generation_model_runner.py:45
class GPUGenerationModelRunner(OmniConnectorModelRunnerMixin, OmniGPUModelRunner):  # ✗
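
Why the base-class order matters can be shown with a toy example. The classes below are stand-ins (not the real `OmniGPUModelRunner` or `OmniConnectorModelRunnerMixin`), and the shared method name `describe` is an assumption; the point is only that Python's method resolution order resolves a name defined on both bases to whichever base is listed first.

```python
class Runner:
    def describe(self) -> str:
        return "runner"


class ConnectorMixin:
    def describe(self) -> str:
        return "mixin"


class RunnerFirst(Runner, ConnectorMixin):
    pass


class MixinFirst(ConnectorMixin, Runner):
    pass


runner_first_mro = [cls.__name__ for cls in RunnerFirst.__mro__]
mixin_first_mro = [cls.__name__ for cls in MixinFirst.__mro__]

runner_first_result = RunnerFirst().describe()
mixin_first_result = MixinFirst().describe()
```

If the two bases share no method names, both orders behave the same, so the inconsistency is harmless today; using one order everywhere avoids surprises if an overlap is introduced later.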

Please resolve conflicts and address the 4 prior items.

hsliuustc0106

This PR is too large; can we test it intensively on one model before moving to all models?

natureofnature · 2026-03-30T02:45:32Z

> This PR is too large; can we test it intensively on one model before moving to all models?

I split the PR into two parts. This is the first part, which does not change the existing workflow. @hsliuustc0106

hsliuustc0106 · 2026-03-30T07:35:41Z

@@ -0,0 +1,380 @@
+# SPDX-License-Identifier: Apache-2.0


@amy-why-3459 PTAL

fixed

hsliuustc0106 · 2026-04-10T12:43:11Z

merge ci passed but nightly ci failed

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

natureofnature · 2026-04-12T13:57:42Z

> merge ci passed but nightly ci failed

I fixed one issue that was caused by this PR. The current nightly CI problems (https://buildkite.com/vllm/vllm-omni/builds/6413/steps/canvas) do not seem to be related to this PR. @hsliuustc0106

…llm-project#1555) Signed-off-by: natureofnature <wzliu@connect.hku.hk> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

natureofnature requested a review from hsliuustc0106 as a code owner February 27, 2026 17:27

chatgpt-codex-connector Bot reviewed Feb 27, 2026

Comment thread vllm_omni/entrypoints/omni.py Outdated

Comment thread vllm_omni/entrypoints/async_omni.py Outdated

Comment thread vllm_omni/worker/omni_connector_model_runner_mixin.py Outdated

natureofnature mentioned this pull request Feb 27, 2026

[RFC]: Refactor Communication Layer #1546

Open

1 task

lishunyang12 reviewed Feb 28, 2026

Comment thread vllm_omni/worker/omni_connector_model_runner_mixin.py

lishunyang12 reviewed Feb 28, 2026

Comment thread vllm_omni/worker/omni_connector_model_runner_mixin.py

lishunyang12 reviewed Feb 28, 2026

Comment thread vllm_omni/worker/omni_connector_model_runner_mixin.py

lishunyang12 reviewed Feb 28, 2026

Comment thread tests/e2e/online_serving/test_refactored_communication.py Outdated

natureofnature mentioned this pull request Mar 2, 2026

[RFC]: Refactor Communication Layer under non async mode JiusiServe/vllm-omni#143

Open

1 task

R2-Y reviewed Mar 3, 2026

natureofnature force-pushed the refactor/communication_layer branch from 04a89ed to 3d3e6e8 Compare March 9, 2026 02:20

natureofnature force-pushed the refactor/communication_layer branch 2 times, most recently from 391544a to bda4aac Compare March 10, 2026 14:23

natureofnature changed the title ~~[WIP][core]refactor commnication layer~~ [WIP][core]refactor communication layer Mar 11, 2026

Gaohan123 added this to the v0.18.0 milestone Mar 11, 2026

natureofnature force-pushed the refactor/communication_layer branch 4 times, most recently from 4b8fd82 to e9601a4 Compare March 18, 2026 06:43

hsliuustc0106 previously requested changes Mar 18, 2026

natureofnature force-pushed the refactor/communication_layer branch 4 times, most recently from 2b2b419 to 776d5ad Compare March 19, 2026 02:59

natureofnature marked this pull request as draft March 19, 2026 06:57

natureofnature force-pushed the refactor/communication_layer branch from 776d5ad to 5b24a97 Compare March 20, 2026 08:11

hsliuustc0106 reviewed Mar 24, 2026

Comment thread vllm_omni/entrypoints/talker_prompt_utils.py Outdated

natureofnature force-pushed the refactor/communication_layer branch 3 times, most recently from e6d6219 to eb7de9b Compare March 30, 2026 02:26

natureofnature changed the title ~~[WIP][core]refactor communication layer~~ [core]refactor communication layer: PR1(Added Refactor Infra Only) Mar 30, 2026

natureofnature requested a review from hsliuustc0106 March 30, 2026 06:27

hsliuustc0106 reviewed Mar 30, 2026

hsliuustc0106 self-requested a review March 30, 2026 07:45

hsliuustc0106 added the nightly-test label (label to trigger buildkite nightly test CI) Mar 30, 2026

natureofnature mentioned this pull request Apr 10, 2026

[2/5] [core]refactor communication layer: PR 2 of 5 Qwen3 Omni non async #2677

Merged

10 tasks

hsliuustc0106 added the ready label (label to trigger buildkite CI) Apr 10, 2026

natureofnature added 4 commits April 12, 2026 07:46

Add omni connector runner infrastructure

fd29aa5

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Add PR1 unit coverage for mixin and coordinator

dc940b4

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Mark PR1 mixin unit tests as core_model cpu

cd46985

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Move payload span helpers under worker

590a7d0

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

natureofnature force-pushed the refactor/communication_layer branch 2 times, most recently from eddf5d1 to 590a7d0 Compare April 12, 2026 07:52

fix load custom func

bff7fa7

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

hsliuustc0106 added the high priority label (high priority issue, needs to be done asap) Apr 12, 2026

hsliuustc0106 and others added 2 commits April 12, 2026 22:19

Merge branch 'main' into refactor/communication_layer

78aac44

Merge branch 'main' into refactor/communication_layer

0a349a9

hsliuustc0106 merged commit 0d4e975 into vllm-project:main Apr 13, 2026
7 of 8 checks passed

natureofnature mentioned this pull request May 19, 2026

[3/5] [WIP][core]refactor communication layer: PR 3 of 5, all other models in non async mode #3719

Open

9 tasks
