
[Feat] Support step-boundary abort in diffusion #1769

Merged

wtomin merged 17 commits into vllm-project:main from omni-nicelab:pr/abort-clean on Apr 1, 2026

Conversation

@asukaqaq-s (Contributor) commented Mar 10, 2026


Purpose

Based on RFC 874 and PRs #1368 and #1625: this PR merges their work and adds further modifications to support stepwise execution/scheduling and request abort.

Summary

Support aborting both running and waiting diffusion requests, with immediate cancellation and zero post-abort performance overhead.

Pipeline / Runner

  • Add step-level execution support (--step-execution) so the abort queue is checked between denoise steps, not only after the entire request completes (see the sketch below)
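
A minimal sketch of that step loop, assuming hypothetical names (run_denoise_loop, denoise_step, abort_queue) rather than the actual vllm-omni API:

import queue

class DiffusionRequestAbortedError(Exception):
    """Stub of the PR's structured abort error."""

def run_denoise_loop(request_id, denoise_step, abort_queue, num_steps):
    # Poll the abort queue before paying for each denoise step, so a cancel
    # takes effect at the next step boundary instead of after the full request.
    # Single-consumer sketch: IDs for other requests are simply discarded here.
    for step_index in range(num_steps):
        while not abort_queue.empty():
            if abort_queue.get_nowait() == request_id:
                raise DiffusionRequestAbortedError(request_id)
        denoise_step(step_index)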

Worker / Executor

  • Add execute_step() / execute_stepwise() path that forwards per-step RunnerOutput (with step_index and finished fields) back to the engine (see the sketch after this list)
  • execute_request() wraps full-request DiffusionOutput into RunnerOutput for unified handling at the engine layer
  • Introduce DiffusionRequestAbortedError for structured abort signaling across layers
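
Under the assumption that RunnerOutput is roughly a per-step record (only step_index and finished are named above; the remaining fields and the generator shape are guesses), the stepwise path could look like:

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class RunnerOutput:
    # step_index and finished come from the description above; the
    # other fields are illustrative assumptions.
    request_id: str
    step_index: int
    finished: bool
    output: Optional[Any] = None  # populated only on the final step

def execute_stepwise(request, model):
    # Hypothetical generator: one RunnerOutput per denoise step, so the
    # engine can check aborts between yields.
    total = request.num_inference_steps
    for i in range(total):
        model.denoise_step(request, i)
        done = i == total - 1
        yield RunnerOutput(request.request_id, i, done,
                           model.decode(request) if done else None)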

Scheduler / Engine / Entrypoints

  • StepScheduler checks abort status between steps; an engine-level guard short-circuits update_from_output for already-aborted requests to prevent an infinite re-queue loop
  • AsyncOmniDiffusion uses Future-based request tracking: queued requests are cancelled instantly via future.cancel(), running requests are forwarded to engine.abort() (see the sketch after this list)
  • _request_id_to_sched_req_id mapping supports aborting batched requests by any constituent request ID
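
A condensed model of the Future-based dispatch; only future.cancel() for queued requests and engine.abort() for running ones are taken from the description, the rest is assumed structure:

import asyncio

class AsyncFrontendSketch:
    # Illustrative stand-in for AsyncOmniDiffusion's abort bookkeeping.
    def __init__(self, engine):
        self.engine = engine
        self._futures: dict[str, asyncio.Future] = {}  # request_id -> result future
        self._running: set[str] = set()                # IDs already handed to the engine

    async def abort(self, request_id: str) -> None:
        if request_id in self._running:
            # Running: forward to the engine, which aborts at the next step boundary.
            await self.engine.abort(request_id)
        elif (fut := self._futures.get(request_id)) is not None:
            # Still queued: cancel instantly; the request never reaches the engine.
            fut.cancel()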

Benchmark (Qwen-Image, 50 steps, single GPU)

| Configuration | Resolution | Baseline | Post-abort | Total | Overhead |
| --- | --- | --- | --- | --- | --- |
| Original vllm-omni | 512x512 | 10.2s | 27.7s | 42.9s | +17.5s (+172%) |
| Original vllm-omni | 1024x1024 | 21.2s | 116.0s | 142.3s | +94.8s (+447%) |
| Abort waiting (entrypoints rewrite) | 512x512 | 6.8s | 5.8s | 17.6s | -1.0s (~0%) |
| Abort waiting (entrypoints rewrite) | 1024x1024 | 20.3s | 34.8s | 60.2s | +14.5s (+71%) |
| Abort waiting + running (step-exec) | 512x512 | 6.5s | 5.6s | 17.1s | -0.9s (~0%) |
| Abort waiting + running (step-exec) | 1024x1024 | 20.2s | 20.9s | 46.1s | +0.7s (+3%) |

Conclusions:

  • Original 1024 post-abort takes 116s: severe blocking, the abort is completely ineffective; the previous request must finish before the next one can start
  • Entrypoints rewrite solves abort for queued requests (future.cancel()), 512 already shows zero overhead
  • Step-execution solves abort for running requests (checks abort_queue between each denoise step), 1024 overhead drops from +71% to +3%
  • No baseline performance regression (512 being faster is due to other optimizations on the dev-exp branch)

Verification

| Parallel Mode | Baseline (s) | Post-abort (s) | Abort Recovery | Status |
| --- | --- | --- | --- | --- |
| --ring 2 | 14.14 | 14.36 | 5/5 | Pass |
| --usp 2 | 12.87 | 13.10 | 5/5 | Pass |
| --cfg-parallel-size 2 | 21.08 | 21.22 | 5/5 | Pass |

Key findings:

  • All three parallel strategies (Ring, Ulysses SP, CFG) work correctly with --step-execution.
  • Post-abort latency is consistent with baseline; no performance degradation after repeated aborts.
  • CFG parallel requires negative_prompt in the request to activate; without it the second GPU idles.

E2E test

server

baseline:
vllm serve Qwen/Qwen-Image --omni --port 8091

ours:
vllm serve Qwen/Qwen-Image --omni --port 8091 --step-execution

client

curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "A flower"}
    ],
    "extra_body": {
      "height": 512,
      "width": 512,
      "num_inference_steps": 50,
      "true_cfg_scale": 4.0,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
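
To exercise the abort path from a client, a hedged Python variant of the same request that cancels mid-flight (the endpoint and payload mirror the curl call above; the 2-second timeout is arbitrary, and the resulting disconnect is what the server is expected to turn into an abort):

import asyncio
import aiohttp

PAYLOAD = {
    "messages": [{"role": "user", "content": "A flower"}],
    "extra_body": {"height": 512, "width": 512, "num_inference_steps": 50,
                   "true_cfg_scale": 4.0, "seed": 42},
}

async def request_then_abort():
    async with aiohttp.ClientSession() as session:
        try:
            # Cancel after 2s, well before the 50-step generation finishes.
            await asyncio.wait_for(
                session.post("http://localhost:8091/v1/chat/completions",
                             json=PAYLOAD),
                timeout=2.0,
            )
        except asyncio.TimeoutError:
            pass  # connection dropped; server should abort at the next step boundary

asyncio.run(request_then_abort())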

Test plan

  • 18 unit tests pass (test_diffusion_stepmodel.py): scheduler, engine, async abort
  • End-to-end curl abort test: 5x SIGINT interrupts followed by successful generation
  • Verified future.cancel() for queued requests (never reach engine)
  • Verified engine.abort() forwarding for running requests
  • Verified post-abort recovery with zero performance overhead

Test Result



@asukaqaq-s changed the title from "Pr/abort clean" to "[Draft] Support step-boundary abort in diffusion executor/worker" on Mar 10, 2026
@chatgpt-codex-connector (Bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dfcffff64f


Comment thread on vllm_omni/diffusion/diffusion_engine.py (Outdated):
# check abort queue
self._process_aborts_queue()

finished_req_ids = self.scheduler.update_from_output(sched_output, output)
P0: Pass request-mode result in scheduler's expected type

In the default non-step path (step_execution=False), self.execute_fn is executor.execute_request, which returns a RunnerOutput, but this value is passed straight into RequestScheduler.update_from_output. RequestScheduler.update_from_output expects a DiffusionOutput and dereferences output.error, so the first normal request can fail with AttributeError before any result is returned.
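
One way to close the mismatch, sketched on the assumption that the wrapped full-request result lives in an output field of the RunnerOutput shape sketched earlier (the actual fix is not shown in this thread):

def normalize_output(output):
    # Hypothetical guard: RequestScheduler.update_from_output expects a
    # DiffusionOutput, so unwrap the RunnerOutput produced by execute_request().
    if isinstance(output, RunnerOutput):
        return output.output
    return output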


@asukaqaq-s (Contributor, Author) replied:

fixed

Comment thread on vllm_omni/diffusion/diffusion_engine.py (Outdated):

def _abort_requests(self, request_id):
# TODO:support finish_request function
self.scheduler.abort_request(request_id, DiffusionRequestStatus.FINISHED_ABORTED)

P1: Call abort_request with the scheduler's actual signature

_process_aborts_queue builds a list of request IDs, then _abort_requests forwards that list to scheduler.abort_request with an extra status argument. Current scheduler implementations accept only a single req_id parameter, so any abort processing can raise TypeError instead of marking requests aborted.
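
A minimal correction matching the single-argument signature the reviewer describes (illustrative only):

def _abort_requests(self, request_ids):
    # Current schedulers accept one req_id at a time and no status argument,
    # so iterate instead of forwarding the whole list.
    for req_id in request_ids:
        self.scheduler.abort_request(req_id)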


@asukaqaq-s (Contributor, Author) replied:

fixed

@asukaqaq-s changed the title from "[Draft] Support step-boundary abort in diffusion executor/worker" to "[Draft][Feat] Support step-boundary abort in diffusion" on Mar 10, 2026
@david6666666 (Collaborator) commented:

@wtomin @SamitHuang @david6666666 PTAL, thx

@asukaqaq-s (Contributor, Author) commented Mar 11, 2026

All conflicts resolved. Summary:

  • data.py: Kept both — step_execution field (ours) + is_moe property (main)
  • diffusion_engine.py: Kept both — exec_total_time timing (main) + abort check (ours)
  • scheduler.py: Deleted (refactored into sched/); ported unpack_diffusion_output_shm from main into multiproc_executor.py at both dequeue paths

Note: the inline diffusion path introduced in #1715 bypasses AsyncOmniDiffusion, so engine.abort() is never called on request cancellation. This breaks step-level abort for --step-execution mode.

To fix: call self._inline_engine.engine.abort(request_id) in _generate_inline's CancelledError handler. Will push a fix in the next commit.

@asukaqaq-s asukaqaq-s force-pushed the pr/abort-clean branch 2 times, most recently from 89d00f5 to 81ba58a Compare March 11, 2026 17:10
@asukaqaq-s (Contributor, Author) commented:

• Fixed inline diffusion request cancellation in AsyncOmni.

After merging origin/main, single-stage diffusion started using the new inline path, which called synchronous OmniDiffusion.generate() through run_in_executor(). That kept the stage-worker RPC optimization, but broke abort behavior: cancelling the asyncio task only cancelled the outer future, not the underlying running thread, and queued inline requests were not cleanly abortable.

The fix keeps inline execution fully in-process, but switches the inline engine to AsyncOmniDiffusion, which already has proper request-state tracking, queued-future cancellation, and forwarding of running-request aborts into DiffusionEngine. I also updated inline cancellation handling to explicitly abort on CancelledError, and changed inline AsyncOmni.abort() to call the async wrapper's abort() instead of bypassing it and talking directly to the raw engine.

Result: inline diffusion now preserves the RPC-saving fast path while restoring both step-level interruption for running requests and clean cancellation for waiting requests.
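
A hedged sketch of that cancellation handling; apart from abort() being forwarded on CancelledError, the structure and names here are assumptions:

import asyncio

async def _generate_inline(self, request):
    # Hypothetical inline wrapper: on task cancellation, explicitly forward the
    # abort into the async engine instead of only dropping the outer future.
    try:
        return await self._inline_engine.generate(request)
    except asyncio.CancelledError:
        # Queued requests get future.cancel() inside the async wrapper; running
        # requests are aborted at the next step boundary.
        await self._inline_engine.abort(request.request_id)
        raise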

@asukaqaq-s (Contributor, Author) commented:

Bug fixed: CFG parallel previously deadlocked in step-execution mode because non-rank-0 workers received None from predict_noise_maybe_with_cfg() and skipped the collective broadcast in scheduler_step_maybe_with_cfg(). Fixed by always calling step_scheduler() regardless of noise_pred value.
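
A sketch of the fixed loop shape, parameterized over the two functions named above so it stays self-contained; the exact call sites are assumptions:

def denoise_with_cfg_parallel(latents, num_steps,
                              predict_noise_maybe_with_cfg,
                              scheduler_step_maybe_with_cfg):
    for step_index in range(num_steps):
        # None on non-rank-0 CFG workers.
        noise_pred = predict_noise_maybe_with_cfg(latents, step_index)
        # Always call the scheduler step: it performs a collective broadcast,
        # so gating it on `noise_pred is not None` deadlocks the other ranks.
        latents = scheduler_step_maybe_with_cfg(noise_pred, latents, step_index)
    return latents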

@asukaqaq-s changed the title from "[Draft][Feat] Support step-boundary abort in diffusion" to "[Feat] Support step-boundary abort in diffusion" on Mar 12, 2026
@hsliuustc0106 hsliuustc0106 added the high priority high priority issue, needs to be done asap label Mar 12, 2026
@hsliuustc0106 (Collaborator) left a comment:

@ApsarasX PTAL

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 12, 2026
@Gaohan123 (Collaborator) commented:

@wtomin @ZJY0516 @SamitHuang PTAL

@ZJY0516 ZJY0516 requested review from SamitHuang and wtomin March 13, 2026 04:04
@Gaohan123 Gaohan123 removed this from the v0.18.0 milestone Mar 17, 2026
@asukaqaq-s asukaqaq-s force-pushed the pr/abort-clean branch 2 times, most recently from 35b0462 to 25bc9e1 Compare March 22, 2026 14:52
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Mar 22, 2026
asukaqaq-s and others added 5 commits March 24, 2026 07:53
…g_reqs and add max_num_seqs to config

aligned with changes introduced in vllm-project#1851

Signed-off-by: jader <yjader@foxmail.com>
@yJader (Contributor) commented Mar 24, 2026

Merge conflicts with main have been resolved

yJader added 2 commits March 24, 2026 08:34
…reply_rank is None

Signed-off-by: jader <yjader@foxmail.com>
action="store_true",
help="Enable cache-dit summary logging after diffusion forward passes.",
)
omni_config_group.add_argument(
Review comment (Collaborator):

do we have a validation for this arg if it's accidentally triggered in non-diffusion models? or do we expect this will help in ar+dit model as well?

@asukaqaq-s (Contributor, Author) replied:

  1. If this argument is triggered in non-diffusion engine scenarios, it should not have any effect.
  2. For AR+DiT models, I have also discussed this in other community issues. If they adopt the diffusion engine, step-execution will be effective, and I expect it will help as well.

yJader added 2 commits March 30, 2026 03:37
…Scheduler abort

Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
@wtomin (Collaborator) commented Apr 1, 2026

Thanks for your great contributions. I have a few comments:

  1. Currently only qwen-image supports step-level execution, but it is an important feature that should be included in docs/user_guide/diffusion_features.md, and please give the feature compatibility information in this doc, too.
  2. In tests/e2e/online_serving/test_qwen_image_expansion.py test script, please add --step-execution test case.

@wtomin (Collaborator) left a review:

LGTM

@wtomin wtomin merged commit 3fd4a4d into vllm-project:main Apr 1, 2026
7 of 8 checks passed
@fake0fan fake0fan mentioned this pull request Apr 2, 2026
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
Co-authored-by: jader <yjader@foxmail.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
Co-authored-by: jader <yjader@foxmail.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
Co-authored-by: jader <yjader@foxmail.com>

Labels

high priority (high priority issue, needs to be done asap); ready (label to trigger buildkite CI)


7 participants