[Refactor] Refactor Diffusion Scheduler/Executor Boundaries and Request State Flow#1625
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1183c10d7b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
hsliuustc0106
left a comment
There was a problem hiding this comment.
is this the first PR of #874 ?
| with self._rpc_lock: | ||
| target_req_id = self.scheduler.add_request(request) | ||
|
|
||
| while True: |
There was a problem hiding this comment.
Holding _rpc_lock for the entire generation blocks all other RPCs. Is that intentional?
There was a problem hiding this comment.
Yes.
As discussed in #1448, production can have concurrent add_req and collective_rpc calls, and correctness issues can happen if request scheduling and IPC are not protected consistently.
Before this refactor, the call path was engine.add_req_and_wait_for_response -> executor.add_req -> scheduler.add_req, and scheduler.add_req held the lock for the whole enqueue/dequeue round-trip. So generation already serialized with other RPCs through the same lock domain.
After splitting scheduler/executor responsibilities, if we only lock inside the executor but leave scheduler.add_request/schedule/update_from_output outside that critical section, we can reintroduce request/response interleaving risks (the same class of issue as #1448).
So I intentionally moved the lock to engine level and hold it for the full add_req_and_wait_for_response cycle, to keep scheduler state transition + IPC execution atomic and preserve existing API semantics.
You are right that this blocks other RPCs during generation; this is a correctness-first tradeoff. A finer-grained solution would require changing add_req_and_wait_for_response semantics and a larger engine redesign, which may be out of scope for this PR.
There was a problem hiding this comment.
Makes sense, thanks for the context. Preserving the existing lock scope is reasonable here.
lishunyang12
left a comment
There was a problem hiding this comment.
scheduler as pure state machine is a nice clean-up. left one nit inline
| "please check the stack trace above for the root cause" | ||
| ) | ||
| if not isinstance(response, DiffusionOutput): | ||
| raise RuntimeError(f"Unexpected response type for generate: {type(response)!r}") |
There was a problem hiding this comment.
deadline is always None here, so the zmq.error.Again / TimeoutError catches on dequeue are dead code. Not blocking — just something to clean up or wire through a timeout param later.
There was a problem hiding this comment.
True. Since this is not needed for now, it’s better to remove this dead code rather than keep it for potential future use.
| self._running.clear() | ||
| self._finished_req_ids.clear() | ||
|
|
||
| def _make_req_id(self, request: OmniDiffusionRequest) -> str: |
There was a problem hiding this comment.
I remember vllm-omni will set a request id in api server level. Waht's the difference between them?
There was a problem hiding this comment.
OmniDiffusionRequest.request_ids is aligned with prompts (request_ids[i] -> prompts[i])
In contrast, DiffusionRequestState.req_id identifies the entire OmniDiffusionRequest, which is treated as a single scheduling unit.
There was a problem hiding this comment.
I found that req_id here is easy to confuse with OmniDiffusionRequest.request_id, while it actually refers to the scheduler-owned request identifier.
Renaming it to sched_req_id makes the ownership explicit and reduces confusion in follow-up development.
Even if this field may disappear in a future refactor when batching is fully moved into the scheduler, the rename still improves clarity today.
(edit in 03d7da4)
| ) | ||
|
|
||
|
|
||
| def test_single_request_success_lifecycle() -> None: |
There was a problem hiding this comment.
please add level and platform mark, you can see https://github.com/vllm-project/vllm-omni/blob/dev/vllm-align/docs/contributing/ci/tests_markers.md
There was a problem hiding this comment.
Thanks, I added the core_model and cpu markers.
Based on vllm-project#1625. - Refactor scheduler/executor boundaries and request state flow - Implement RequestScheduler and StepScheduler architecture - Add _max_batch_size enforcement - Handle output error in DiffusionEngine's dummy run Signed-off-by: jader <yjader@foxmail.com> Signed-off-by: asukaqaq <1311722138@qq.com>
Based on vllm-project#1625. - Refactor scheduler/executor boundaries and request state flow - Implement RequestScheduler and StepScheduler architecture - Add _max_batch_size enforcement - Handle output error in DiffusionEngine's dummy run Signed-off-by: jader <yjader@foxmail.com>
|
@yJader Please take a look at the CI error |
While working on the follow-up step scheduler implementation, I found that the current scheduler design does not support future feature development very well. The scheduling interface, shared state management, and request-mode policy were too tightly coupled, which made it difficult to extend the scheduler cleanly and reuse common logic. Because of that, I split the scheduler into interface / base_scheduler / request_scheduler:
I also updated the PR README with the corresponding design notes, test status, and benchmark results. The current test and benchmark results look normal, and I did not observe regressions from this refactor. |
Fixed |
I checked this ci failure locally and could not reproduce it. Local run of
In CI, the failure log shows:
So the CI run still shows lower peak memory with CPU offload enabled, but it missed the current threshold by about 102 MB. My guess is that this may be caused by environment-dependent GPU memory noise / measurement variance in CI. |
|
I’ve rebased this branch onto the latest upstream main. The recent major changes (#1908) do not affect this PR. |
lishunyang12
left a comment
There was a problem hiding this comment.
Re-reviewed after the latest refactor. Previous nit (dead timeout code) is addressed. Two new items from the interface split.
| raise RuntimeError("Diffusion scheduler has no runnable requests.") | ||
| continue | ||
|
|
||
| sched_req_id = sched_output.scheduled_req_ids[0] |
There was a problem hiding this comment.
This assumes the scheduled request is always new (scheduled_new_reqs[0]). Currently safe because _max_batch_size=1 + _rpc_lock guarantees the first schedule for the target is a new request. But if batch size ever changes, this will IndexError on cached-only schedules. Worth a guard or at least a comment explaining the invariant.
There was a problem hiding this comment.
Thanks for the suggestion. Making max_batch_size > 1 would require more changes across the current execution path to support real scheduler-side batching, so a clarifying comment is the right fix for now. I added that in afe0d34.
- add notes in scheduler - align to vllm-project#1908, move "step_execution" into AsyncOmniEngine._create_default_diffusion_stage_cfg - note: due to 6bdb55a, tests can't pass Signed-off-by: jader <yjader@foxmail.com>
- Add notes to scheduler - Align with vllm-project#1908; move "step_execution" into `AsyncOmniEngine._create_default_diffusion_stage_cfg` - NOTE: Due to 6bdb55a, tests are currently failing and need to be fixed later Signed-off-by: jader <yjader@foxmail.com>
…st State Flow Refactor diffusion runtime boundaries to separate scheduler state management from multiprocess IPC execution. Core goals: - Make Scheduler a pure request-state scheduler (waiting/running/finished) without owning IPC queues. - Make MultiprocDiffusionExecutor a pure IPC runtime (broadcast/result queues + worker lifecycle). - Let DiffusionEngine explicitly drive add_request -> schedule -> execute -> update_from_output. - Consolidate cross-API concurrency control into DiffusionEngine._rpc_lock. Main code changes: - scheduler.py: introduce request status/state output types and pure scheduling APIs; remove scheduler-side IPC ownership. - diffusion_engine.py: engine owns scheduler and _rpc_lock; refactor add_req_and_wait_for_response to scheduler-driven flow. - multiproc_executor.py: executor directly manages IPC queues and worker lifecycle; decouple from scheduler internals. - tests: add diffusion scheduler tests; rename/refactor multiproc concurrency test to engine-focused variant. Test plan: - pytest -m diffusion tests/diffusion/test_diffusion_scheduler.py - pytest -m diffusion tests/diffusion/test_multiproc_engine_concurrency.py Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
…ests Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com> Signed-off-by: JiangJie Zhang <76905040+yJader@users.noreply.github.com>
…dling Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com> Signed-off-by: JiangJie Zhang <76905040+yJader@users.noreply.github.com>
Signed-off-by: jader <yjader@foxmail.com>
0d6e2f7 to
b4ac994
Compare
|
please fix CI, If your PR is ready, please @wtomin or @SamitHuang |
Co-authored-by: asukaqaq-s <1311722138@qq.com> Signed-off-by: jader <yjader@foxmail.com>
…e cleanup Signed-off-by: jader <yjader@foxmail.com>
Fixed. Please take a look when you have time. @wtomin @SamitHuang |
wtomin
left a comment
There was a problem hiding this comment.
All of my comments were addressed. LGTM.
…st State Flow (vllm-project#1625) Signed-off-by: jader <yjader@foxmail.com> Signed-off-by: JiangJie Zhang <76905040+yJader@users.noreply.github.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com> Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Purpose
This PR refactors diffusion runtime boundaries by fully separating scheduler state management from multiprocess IPC execution.
Core goals:
Schedulera pure request-state scheduler (waiting/running/finished) without owning IPC queues.MultiprocDiffusionExecutora pure IPC runtime (broadcast/result queues + worker lifecycle).DiffusionEngineexplicitly driveadd_request -> schedule -> execute -> update_from_output.DiffusionEngine._rpc_lock, covering bothadd_req_and_wait_for_responseandcollective_rpc.Architecture before/after
Before Refactor
After Refactor
Key differences:
Schedulerchanges from a mixed “state + IPC” component to a pure scheduling state machine.Scheduleris also split internally into asched/package:SchedulerInterfacedefines the scheduler/engine boundary._BaseSchedulerowns shared request state, waiting/running/finished queues, and request-id mapping.RequestScheduleronly keeps the current request-mode scheduling policy.MultiprocDiffusionExecutorno longer depends on scheduler internals (_lock/mq/result_mq).DiffusionEngine._rpc_lock, covering generation andcollective_rpc.Main code changes:
vllm_omni/diffusion/sched/interface.pyDiffusionRequestStatus,DiffusionRequestState,NewRequestData,CachedRequestData,DiffusionSchedulerOutput, andSchedulerInterface.DiffusionSchedulerOutputnow explicitly separatesscheduled_new_reqs,scheduled_cached_reqs, andfinished_req_idsas the stable engine/scheduler scheduling output.vllm_omni/diffusion/sched/base_scheduler.py_request_states,_waiting,_running,_finished_req_ids.request_id -> sched_req_idmapping and common finish/cleanup logic.vllm_omni/diffusion/sched/request_scheduler.pynum_scheduled_reqsis currently0/1).vllm_omni/diffusion/sched/__init__.pyScheduler = RequestScheduleras an alias.vllm_omni/diffusion/diffusion_engine.py_rpc_lock).add_req_and_wait_for_responseinto a scheduler-driven sync loop that advances execution and state cleanup viaDiffusionSchedulerOutput.collective_rpc(timeout=...)honor end-to-end timeout across lock wait + RPC execution.vllm_omni/diffusion/executor/multiproc_executor.pyadd_reqdirectly performs generate RPC with response type/error/timeout checks.collective_rpcno longer relies on scheduler internals or locks.tests/diffusion/test_diffusion_scheduler.py.tests/diffusion/test_multiproc_executor_concurrency.py->tests/diffusion/test_multiproc_engine_concurrency.py.Test Plan
Unit/functional tests:
Serving benchmark command:
Test Result
Benchmark setup:
vllm-omniQwen/Qwen-Imagevbencht2i201Benchmark comparison:
f52b5153)72df5305)Result summary:
CC List
@hsliuustc0106 @ZJY0516 @lishunyang12