
squash merge pr/scheduler(#1625)#3

Closed
yJader wants to merge 29 commits into dev-exp from tmp/dev-exp

Conversation

@yJader

@yJader yJader commented Mar 21, 2026

Collaborator

Rebase is too hard QAQ

asukaqaq-s and others added 28 commits March 3, 2026 17:33
Signed-off-by: asukaqaq-s <1311722138@qq.com>
…nd CFG

  mixin

Signed-off-by: asuka <1311722138@qq.com>
…st State Flow

Refactor diffusion runtime boundaries to separate scheduler state management from multiprocess IPC execution.

Core goals:
- Make Scheduler a pure request-state scheduler (waiting/running/finished) without owning IPC queues.
- Make MultiprocDiffusionExecutor a pure IPC runtime (broadcast/result queues + worker lifecycle).
- Let DiffusionEngine explicitly drive add_request -> schedule -> execute -> update_from_output.
- Consolidate cross-API concurrency control into DiffusionEngine._rpc_lock.
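
The flow described in the goals above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class names `SketchScheduler`, `SketchExecutor`, and `SketchEngine`, the `max_running` parameter, and all method signatures are assumptions made for the example. Only the method names `add_request`, `schedule`, `execute`, `update_from_output`, `add_req_and_wait_for_response`, and the `_rpc_lock` attribute come from the commit message.

```python
import threading
from collections import deque
from enum import Enum, auto


class RequestStatus(Enum):
    WAITING = auto()
    RUNNING = auto()
    FINISHED = auto()


class SketchScheduler:
    """Hypothetical pure request-state scheduler: no IPC queues, only state."""

    def __init__(self, max_running=1):
        self.max_running = max_running  # assumed capacity knob
        self.waiting = deque()          # request ids not yet scheduled
        self.running = set()            # request ids handed to the executor
        self.finished = set()           # request ids with a result
        self.status = {}                # request id -> RequestStatus

    def add_request(self, request_id):
        self.waiting.append(request_id)
        self.status[request_id] = RequestStatus.WAITING

    def schedule(self):
        # Move waiting requests to running while capacity allows.
        scheduled = []
        while self.waiting and len(self.running) < self.max_running:
            request_id = self.waiting.popleft()
            self.running.add(request_id)
            self.status[request_id] = RequestStatus.RUNNING
            scheduled.append(request_id)
        return scheduled

    def update_from_output(self, outputs):
        # Every request that produced an output is considered finished.
        for request_id in outputs:
            self.running.discard(request_id)
            self.finished.add(request_id)
            self.status[request_id] = RequestStatus.FINISHED


class SketchExecutor:
    """Stand-in for the IPC runtime; here it just echoes a result in-process."""

    def execute(self, request_ids):
        return {request_id: f"result:{request_id}" for request_id in request_ids}


class SketchEngine:
    """The engine owns the scheduler and the lock, and drives each step."""

    def __init__(self, scheduler, executor):
        self.scheduler = scheduler
        self.executor = executor
        self._rpc_lock = threading.Lock()

    def add_req_and_wait_for_response(self, request_id):
        # One lock serializes the whole add -> schedule -> execute -> update cycle.
        with self._rpc_lock:
            self.scheduler.add_request(request_id)
            scheduled = self.scheduler.schedule()
            outputs = self.executor.execute(scheduled)
            self.scheduler.update_from_output(outputs)
            return outputs[request_id]
```

In this sketch the scheduler never touches the executor, so it can be unit-tested without spawning workers, which is presumably the motivation for the separate scheduler tests listed in this commit.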

Main code changes:
- scheduler.py: introduce request status/state output types and pure scheduling APIs; remove scheduler-side IPC ownership.
- diffusion_engine.py: engine owns scheduler and _rpc_lock; refactor add_req_and_wait_for_response to scheduler-driven flow.
- multiproc_executor.py: executor directly manages IPC queues and worker lifecycle; decouple from scheduler internals.
- tests: add diffusion scheduler tests; rename/refactor multiproc concurrency test to engine-focused variant.

Test plan:
- pytest -m diffusion tests/diffusion/test_diffusion_scheduler.py
- pytest -m diffusion tests/diffusion/test_multiproc_engine_concurrency.py

Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
…eduler and StepScheduler

Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
…ve step tracking

Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
refactor(tests): cover async diffusion abort in entrypoint

Track in-flight executor futures in AsyncOmniDiffusion so that queued requests can be cancelled before execution and running requests can be forwarded to engine.abort().
Also refactor the diffusion tests to cover the queued and running abort paths at the entrypoint layer.
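
The two abort paths described above can be illustrated with a small sketch built on `concurrent.futures`. It is an assumption-laden illustration rather than the actual `AsyncOmniDiffusion` code: `SketchEntrypoint`, `FakeEngine`, the `submit`/`abort`/`close` signatures, and the returned status strings are all hypothetical. The only behavior relied on from the standard library is that `Future.cancel()` returns `True` for a call that has not started and `False` for one that is running or already finished.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeEngine:
    """Hypothetical engine stand-in that records which requests were aborted."""

    def __init__(self):
        self.aborted = []

    def abort(self, request_id):
        self.aborted.append(request_id)


class SketchEntrypoint:
    """Tracks in-flight futures so abort can pick the right path."""

    def __init__(self, engine):
        self.engine = engine
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._futures = {}  # request id -> Future
        self._lock = threading.Lock()

    def submit(self, request_id, fn):
        future = self._pool.submit(fn)
        with self._lock:
            self._futures[request_id] = future
        return future

    def abort(self, request_id):
        with self._lock:
            future = self._futures.pop(request_id, None)
        if future is None:
            return "unknown"
        if future.cancel():
            # Still queued: cancelled before it ever reached the engine.
            return "cancelled"
        if not future.done():
            # Already running: only the engine can stop it.
            self.engine.abort(request_id)
            return "forwarded"
        return "finished"

    def close(self):
        self._pool.shutdown(wait=True)
```

With a single worker, a request submitted while another is executing stays pending, so aborting it takes the cancel path, while aborting the executing request is forwarded to the engine.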
…t abort handling

Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
…p, add dummy run request_id

Signed-off-by: jader <yjader@foxmail.com>
…nd related components

Signed-off-by: jader <yjader@foxmail.com>
'

Signed-off-by: asukaqaq <1311722138@qq.com>
(cherry picked from commit dcbda70)
…ify request finalization

Signed-off-by: jader <yjader@foxmail.com>
DiffusionEngine: separate inference from add_req and result retrieval, in
preparation for the upcoming stepwise batching.

Signed-off-by: Semmer2 <semmer@live.cn>
@yJader yJader closed this Mar 21, 2026
@yJader yJader reopened this Mar 21, 2026
- Add notes to scheduler
- Align with vllm-project#1908; move "step_execution" into `AsyncOmniEngine._create_default_diffusion_stage_cfg`
- NOTE: Due to 6bdb55a, tests are currently failing and need to be fixed later

Signed-off-by: jader <yjader@foxmail.com>
@yJader

yJader commented Mar 21, 2026

Collaborator Author

But this operation will break history, so it needs more discussion.


3 participants