[Feat] Support step-boundary abort in diffusion #1769
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dfcffff64f
# check abort queue
self._process_aborts_queue()

finished_req_ids = self.scheduler.update_from_output(sched_output, output)
Pass request-mode result in scheduler's expected type
In the default non-step path (step_execution=False), self.execute_fn is executor.execute_request, which returns a RunnerOutput, but this value is passed straight into RequestScheduler.update_from_output. RequestScheduler.update_from_output expects a DiffusionOutput and dereferences output.error, so the first normal request can fail with AttributeError before any result is returned.
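One way to close that type gap is to normalize the executor's return value before it reaches the scheduler. A hedged sketch, assuming a stand-in `DiffusionOutput` dataclass (the real class lives in the repo; only its `error` attribute is attested by the review comment):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class DiffusionOutput:
    # Stand-in for the repo's real DiffusionOutput; only `error` is attested.
    result: Any = None
    error: Optional[BaseException] = None

def to_diffusion_output(output: Any) -> DiffusionOutput:
    """Adapt whatever execute_fn returned into the type the scheduler expects."""
    if isinstance(output, DiffusionOutput):
        return output  # step path already produces the expected type
    # Request path: wrap the RunnerOutput-style value before handing it to
    # RequestScheduler.update_from_output, which dereferences output.error.
    return DiffusionOutput(result=output, error=getattr(output, "error", None))
```

The scheduler then always sees an object with an `error` field, regardless of which execution path produced the value.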
def _abort_requests(self, request_id):
    # TODO: support finish_request function
    self.scheduler.abort_request(request_id, DiffusionRequestStatus.FINISHED_ABORTED)
Call abort_request with the scheduler's actual signature
_process_aborts_queue builds a list of request IDs, then _abort_requests forwards that list to scheduler.abort_request with an extra status argument. Current scheduler implementations accept only a single req_id parameter, so any abort processing can raise TypeError instead of marking requests aborted.
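A signature that matches the caller could accept either one ID or a batch, plus an optional status. This is an illustrative sketch, not the repo's actual scheduler: the enum values and the internal request map are assumptions.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Union

class DiffusionRequestStatus(Enum):
    RUNNING = auto()
    FINISHED_ABORTED = auto()

class RequestScheduler:
    def __init__(self) -> None:
        self._requests: Dict[str, DiffusionRequestStatus] = {}

    def add_request(self, req_id: str) -> None:
        self._requests[req_id] = DiffusionRequestStatus.RUNNING

    def abort_request(
        self,
        req_ids: Union[str, Iterable[str]],
        status: DiffusionRequestStatus = DiffusionRequestStatus.FINISHED_ABORTED,
    ) -> None:
        # Accept a single ID or the list built by _process_aborts_queue.
        if isinstance(req_ids, str):
            req_ids = [req_ids]
        for req_id in req_ids:
            # Unknown IDs are ignored so duplicate aborts stay harmless.
            if req_id in self._requests:
                self._requests[req_id] = status
```

With a defaulted `status`, existing single-argument call sites keep working while the abort path can pass `FINISHED_ABORTED` explicitly.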
Force-pushed 3c33ee0 to c6c0ead
@wtomin @SamitHuang @david6666666 PTAL, thx
Force-pushed c6c0ead to 34e4623
All conflicts resolved. Summary:
Note: the inline diffusion path introduced in #1715 bypasses AsyncOmniDiffusion. To fix: call `self._inline_engine.engine.abort(request_id)` in `_generate_inline`'s …
Force-pushed 89d00f5 to 81ba58a
Fixed inline diffusion request cancellation in AsyncOmni. After merging origin/main, single-stage diffusion started using the new inline path, which called synchronous OmniDiffusion.generate() through run_in_executor(). That kept the stage-worker RPC optimization but broke abort behavior: cancelling the asyncio task only cancelled the outer future, not the underlying running thread, and queued inline requests were not cleanly abortable.

The fix keeps inline execution fully in-process but switches the inline engine to AsyncOmniDiffusion, which already has proper request-state tracking, queued-future cancellation, and forwarding of running-request aborts into DiffusionEngine. I also updated inline cancellation handling to explicitly abort on CancelledError, and changed inline AsyncOmni.abort() to call the async wrapper's abort() instead of bypassing it and talking directly to the raw engine.

Result: inline diffusion now preserves the RPC-saving fast path while restoring both step-level interruption for running requests and clean cancellation for waiting requests.
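The cancellation-forwarding part of that fix can be sketched with plain asyncio. The engine object and its `generate`/`abort` methods below are stand-ins for the AsyncOmniDiffusion wrapper, not the repo's actual API:

```python
import asyncio

async def generate_inline(engine, request_id: str, prompt: str):
    try:
        return await engine.generate(request_id, prompt)
    except asyncio.CancelledError:
        # Cancelling the outer task does not interrupt a running request by
        # itself; forward an explicit abort so the engine can stop at the
        # next denoise-step boundary, then re-raise the cancellation.
        await engine.abort(request_id)
        raise

class FakeEngine:
    """Stand-in for the async engine wrapper, used only for this demo."""
    def __init__(self) -> None:
        self.aborted = []

    async def generate(self, request_id, prompt):
        await asyncio.sleep(3600)  # pretend to run many denoise steps

    async def abort(self, request_id):
        self.aborted.append(request_id)

async def demo() -> list:
    engine = FakeEngine()
    task = asyncio.create_task(generate_inline(engine, "req-1", "a prompt"))
    await asyncio.sleep(0)  # let the task start and block in generate()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return engine.aborted
```

The key design point is that the abort is awaited inside the `CancelledError` handler and the exception is re-raised, so callers still observe the task as cancelled while the engine learns about the abort.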
Bug fixed: CFG parallel previously deadlocked in step-execution mode because non-rank-0 workers received None from predict_noise_maybe_with_cfg() and skipped the collective broadcast in scheduler_step_maybe_with_cfg(). Fixed by always calling step_scheduler() regardless of the noise_pred value.
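The shape of that fix can be sketched as follows. The collective is stubbed with a plain callable; only the two function names come from the comment above, everything else is illustrative:

```python
def scheduler_step_maybe_with_cfg(noise_pred, broadcast_from_rank0, step_scheduler):
    # Buggy variant, for contrast:
    #   if noise_pred is None:
    #       return None   # non-rank-0 workers skip the collective -> deadlock
    #
    # Fixed variant: every rank reaches the broadcast, even when its local
    # predict step returned None, so rank 0 is never left waiting alone.
    noise_pred = broadcast_from_rank0(noise_pred)
    return step_scheduler(noise_pred)
```

In a real setup `broadcast_from_rank0` would be a `torch.distributed` broadcast; the point is that it is executed unconditionally on all ranks.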
Force-pushed 8b97676 to 3b73a27
Force-pushed 3b73a27 to f71db1a
@wtomin @ZJY0516 @SamitHuang PTAL
Force-pushed 35b0462 to 25bc9e1
Signed-off-by: asukaqaq-s <1311722138@qq.com>
…g_reqs and add max_num_seqs to config aligned with changes introduced in vllm-project#1851
Signed-off-by: jader <yjader@foxmail.com>
Merge conflicts with main have been resolved
Signed-off-by: jader <yjader@foxmail.com>
…reply_rank is None
Signed-off-by: jader <yjader@foxmail.com>
action="store_true",
help="Enable cache-dit summary logging after diffusion forward passes.",
)
omni_config_group.add_argument(
Do we have a validation for this arg if it's accidentally triggered in non-diffusion models? Or do we expect this will help in AR+DiT models as well?
- If this argument is triggered in non-diffusion engine scenarios, it should not have any effect.
- For AR+DiT models, I have also discussed this in other community issues. If they adopt the diffusion engine, step-execution will be effective, and I expect it will help as well.
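If explicit validation were wanted rather than a silent no-op, it could look like the sketch below. The flag and engine-type names are illustrative, not the repo's actual config surface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the CLI; names are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--engine-type", default="diffusion")
    parser.add_argument("--step-execution", action="store_true")
    return parser

def validate_args(args: argparse.Namespace) -> argparse.Namespace:
    # Surface the accidental combination instead of silently ignoring it.
    if args.step_execution and args.engine_type not in ("diffusion",):
        raise ValueError(
            "--step-execution only takes effect with the diffusion engine"
        )
    return args
```

A warning instead of a `ValueError` would preserve the "no effect" behavior described above while still telling the user the flag is inert.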
…Scheduler abort
Signed-off-by: jader <yjader@foxmail.com>
Thanks for your great contributions. I have a few comments:
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
Co-authored-by: jader <yjader@foxmail.com>
Purpose
Based on RFC 874 and PRs #1368 and #1625, merging their work and adding further modifications to support stepwise execution/scheduling and request abort.
Summary
Support aborting both running and waiting diffusion requests, with immediate cancellation and zero post-abort performance overhead.
Pipeline / Runner
Worker / Executor
Introduce DiffusionRequestAbortedError for structured abort signaling across layers
Scheduler / Engine / Entrypoints
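Only the exception's name appears in the summary above; a minimal sketch of how such a structured abort signal might be defined and raised at a step boundary (the helper and the `aborted_ids` set are assumptions):

```python
class DiffusionRequestAbortedError(Exception):
    """Structured abort signal carried across scheduler/engine/worker layers."""

    def __init__(self, request_id: str) -> None:
        super().__init__(f"diffusion request {request_id} was aborted")
        self.request_id = request_id

def check_abort(aborted_ids: set, request_id: str) -> None:
    # Hypothetical helper: workers call this between denoise steps, so an
    # abort takes effect at the next step boundary rather than mid-step.
    if request_id in aborted_ids:
        raise DiffusionRequestAbortedError(request_id)
```

Carrying the request ID on the exception lets each layer that catches it clean up exactly the right request before re-raising or acknowledging the abort.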
Benchmark (Qwen-Image, 50 steps, single GPU)
Conclusions:
- Queued-request aborts cost only `future.cancel()`: 512 already shows zero overhead
- Running-request aborts drain the `abort_queue` between each denoise step: 1024 overhead drops from +71% to +3%

Verify
`--ring 2`, `--usp 2`, `--cfg-parallel-size 2`

Key findings:
- All three parallel strategies (Ring, Ulysses SP, CFG) work correctly with `--step-execution`.
- Post-abort latency is consistent with baseline — no performance degradation after repeated aborts.
- CFG parallel requires `negative_prompt` in the request to activate; without it the second GPU idles.
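The near-zero overhead of the step-boundary check can be illustrated with a toy denoise loop; queue handling and names here are assumptions, not the repo's implementation:

```python
import queue

def denoise_loop(num_steps, abort_queue, active, step_fn):
    """Run up to num_steps denoise steps, draining aborts between steps."""
    for step in range(num_steps):
        # Drain pending aborts before each step; this is a cheap non-blocking
        # check, which is why overhead stays small even at 50 steps.
        while True:
            try:
                req_id = abort_queue.get_nowait()
            except queue.Empty:
                break
            active.discard(req_id)
        if not active:
            return step  # everything aborted; stop early
        step_fn(step, active)
    return num_steps
```

An aborted request is dropped at the next boundary, so the loop never pays for denoise steps the caller no longer wants.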
E2E test
server
client
Test plan
- `test_diffusion_stepmodel.py`: scheduler, engine, async abort
- `future.cancel()` for queued requests (never reach engine)
- `engine.abort()` forwarding for running requests

Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.