
[Feat] Support step-boundary abort in diffusion #1769

Merged

wtomin merged 17 commits into vllm-project:main from omni-nicelab:pr/abort-clean on Apr 1, 2026

Conversation

@asukaqaq-s (Contributor) commented Mar 10, 2026


Purpose

Based on RFC 874 and PRs #1368 and #1625: this PR merges their work and adds further modifications to support stepwise execution/scheduling and request abort.

Summary

Support aborting both running and waiting diffusion requests, with immediate cancellation and zero post-abort performance overhead.

Pipeline / Runner

  • Add step-level execution support (--step-execution) so the abort queue is checked between denoise steps, not only after the entire request completes (see the sketch below)
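
A minimal sketch of that step loop, assuming hypothetical names (run_denoise_loop, denoise_step, abort_queue) rather than the actual vllm-omni API:

import queue

class DiffusionRequestAbortedError(Exception):
    """Stub of the PR's structured abort error."""

def run_denoise_loop(request_id, denoise_step, abort_queue, num_steps):
    # Poll the abort queue before paying for each denoise step, so a cancel
    # takes effect at the next step boundary instead of after the full request.
    # Single-consumer sketch: IDs for other requests are simply discarded here.
    for step_index in range(num_steps):
        while not abort_queue.empty():
            if abort_queue.get_nowait() == request_id:
                raise DiffusionRequestAbortedError(request_id)
        denoise_step(step_index)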

Worker / Executor

  • Add execute_step() / execute_stepwise() path that forwards per-step RunnerOutput (with step_index and finished fields) back to the engine (see the sketch after this list)
  • execute_request() wraps full-request DiffusionOutput into RunnerOutput for unified handling at the engine layer
  • Introduce DiffusionRequestAbortedError for structured abort signaling across layers
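
Under the assumption that RunnerOutput is roughly a per-step record (only step_index and finished are named above; the remaining fields and the generator shape are guesses), the stepwise path could look like:

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class RunnerOutput:
    # step_index and finished come from the description above; the
    # other fields are illustrative assumptions.
    request_id: str
    step_index: int
    finished: bool
    output: Optional[Any] = None  # populated only on the final step

def execute_stepwise(request, model):
    # Hypothetical generator: one RunnerOutput per denoise step, so the
    # engine can check aborts between yields.
    total = request.num_inference_steps
    for i in range(total):
        model.denoise_step(request, i)
        done = i == total - 1
        yield RunnerOutput(request.request_id, i, done,
                           model.decode(request) if done else None)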

Scheduler / Engine / Entrypoints

  • StepScheduler checks abort status between steps; an engine-level guard short-circuits update_from_output for already-aborted requests to prevent an infinite re-queue loop
  • AsyncOmniDiffusion uses Future-based request tracking: queued requests are cancelled instantly via future.cancel(), running requests are forwarded to engine.abort() (see the sketch after this list)
  • _request_id_to_sched_req_id mapping supports aborting batched requests by any constituent request ID
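
A condensed model of the Future-based dispatch; only future.cancel() for queued requests and engine.abort() for running ones are taken from the description, the rest is assumed structure:

import asyncio

class AsyncFrontendSketch:
    # Illustrative stand-in for AsyncOmniDiffusion's abort bookkeeping.
    def __init__(self, engine):
        self.engine = engine
        self._futures: dict[str, asyncio.Future] = {}  # request_id -> result future
        self._running: set[str] = set()                # IDs already handed to the engine

    async def abort(self, request_id: str) -> None:
        if request_id in self._running:
            # Running: forward to the engine, which aborts at the next step boundary.
            await self.engine.abort(request_id)
        elif (fut := self._futures.get(request_id)) is not None:
            # Still queued: cancel instantly; the request never reaches the engine.
            fut.cancel()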

Benchmark (Qwen-Image, 50 steps, single GPU)

| Configuration | Resolution | Baseline | Post-abort | Total | Overhead |
| --- | --- | --- | --- | --- | --- |
| Original vllm-omni | 512x512 | 10.2s | 27.7s | 42.9s | +17.5s (+172%) |
| Original vllm-omni | 1024x1024 | 21.2s | 116.0s | 142.3s | +94.8s (+447%) |
| Abort waiting (entrypoints rewrite) | 512x512 | 6.8s | 5.8s | 17.6s | -1.0s (~0%) |
| Abort waiting (entrypoints rewrite) | 1024x1024 | 20.3s | 34.8s | 60.2s | +14.5s (+71%) |
| Abort waiting + running (step-exec) | 512x512 | 6.5s | 5.6s | 17.1s | -0.9s (~0%) |
| Abort waiting + running (step-exec) | 1024x1024 | 20.2s | 20.9s | 46.1s | +0.7s (+3%) |

Conclusions:

  • Original 1024 post-abort takes 116s: severe blocking, the abort is completely ineffective; the previous request must finish before the next one can start
  • Entrypoints rewrite solves abort for queued requests (future.cancel()), 512 already shows zero overhead
  • Step-execution solves abort for running requests (checks abort_queue between each denoise step), 1024 overhead drops from +71% to +3%
  • No baseline performance regression (512 being faster is due to other optimizations on the dev-exp branch)

Verification

| Parallel Mode | Baseline (s) | Post-abort (s) | Abort Recovery | Status |
| --- | --- | --- | --- | --- |
| --ring 2 | 14.14 | 14.36 | 5/5 | Pass |
| --usp 2 | 12.87 | 13.10 | 5/5 | Pass |
| --cfg-parallel-size 2 | 21.08 | 21.22 | 5/5 | Pass |

Key findings:

  • All three parallel strategies (Ring, Ulysses SP, CFG) work correctly with --step-execution.
  • Post-abort latency is consistent with baseline; no performance degradation after repeated aborts.
  • CFG parallel requires negative_prompt in the request to activate; without it the second GPU idles.

E2E test

server

baseline:
vllm serve Qwen/Qwen-Image --omni --port 8091

ours:
vllm serve Qwen/Qwen-Image --omni --port 8091 --step-execution

client

curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "A flower"}
    ],
    "extra_body": {
      "height": 512,
      "width": 512,
      "num_inference_steps": 50,
      "true_cfg_scale": 4.0,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
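
To exercise the abort path from a client, a hedged Python variant of the same request that cancels mid-flight (the endpoint and payload mirror the curl call above; the 2-second timeout is arbitrary, and the resulting disconnect is what the server is expected to turn into an abort):

import asyncio
import aiohttp

PAYLOAD = {
    "messages": [{"role": "user", "content": "A flower"}],
    "extra_body": {"height": 512, "width": 512, "num_inference_steps": 50,
                   "true_cfg_scale": 4.0, "seed": 42},
}

async def request_then_abort():
    async with aiohttp.ClientSession() as session:
        try:
            # Cancel after 2s, well before the 50-step generation finishes.
            await asyncio.wait_for(
                session.post("http://localhost:8091/v1/chat/completions",
                             json=PAYLOAD),
                timeout=2.0,
            )
        except asyncio.TimeoutError:
            pass  # connection dropped; server should abort at the next step boundary

asyncio.run(request_then_abort())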

Test plan

  • 18 unit tests pass (test_diffusion_stepmodel.py): scheduler, engine, async abort
  • End-to-end curl abort test: 5x SIGINT interrupts followed by successful generation
  • Verified future.cancel() for queued requests (never reach engine)
  • Verified engine.abort() forwarding for running requests
  • Verified post-abort recovery with zero performance overhead

Test Result



@asukaqaq-s changed the title from "Pr/abort clean" to "[Draft] Support step-boundary abort in diffusion executor/worker" on Mar 10, 2026
@chatgpt-codex-connector (Bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dfcffff64f


Comment thread on vllm_omni/diffusion/diffusion_engine.py (Outdated):
# check abort queue
self._process_aborts_queue()

finished_req_ids = self.scheduler.update_from_output(sched_output, output)
P0: Pass request-mode result in scheduler's expected type

In the default non-step path (step_execution=False), self.execute_fn is executor.execute_request, which returns a RunnerOutput, but this value is passed straight into RequestScheduler.update_from_output. RequestScheduler.update_from_output expects a DiffusionOutput and dereferences output.error, so the first normal request can fail with AttributeError before any result is returned.
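
One way to close the mismatch, sketched on the assumption that the wrapped full-request result lives in an output field of the RunnerOutput shape sketched earlier (the actual fix is not shown in this thread):

def normalize_output(output):
    # Hypothetical guard: RequestScheduler.update_from_output expects a
    # DiffusionOutput, so unwrap the RunnerOutput produced by execute_request().
    if isinstance(output, RunnerOutput):
        return output.output
    return output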


@asukaqaq-s (Contributor, Author) replied:

fixed

Comment thread on vllm_omni/diffusion/diffusion_engine.py (Outdated):

def _abort_requests(self, request_id):
# TODO:support finish_request function
self.scheduler.abort_request(request_id, DiffusionRequestStatus.FINISHED_ABORTED)

P1: Call abort_request with the scheduler's actual signature

_process_aborts_queue builds a list of request IDs, then _abort_requests forwards that list to scheduler.abort_request with an extra status argument. Current scheduler implementations accept only a single req_id parameter, so any abort processing can raise TypeError instead of marking requests aborted.
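
A minimal correction matching the single-argument signature the reviewer describes (illustrative only):

def _abort_requests(self, request_ids):
    # Current schedulers accept one req_id at a time and no status argument,
    # so iterate instead of forwarding the whole list.
    for req_id in request_ids:
        self.scheduler.abort_request(req_id)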


@asukaqaq-s (Contributor, Author) replied:

fixed

@asukaqaq-s changed the title from "[Draft] Support step-boundary abort in diffusion executor/worker" to "[Draft][Feat] Support step-boundary abort in diffusion" on Mar 10, 2026
@david6666666 (Collaborator) commented:

@wtomin @SamitHuang @david6666666 PTAL, thx

@asukaqaq-s (Contributor, Author) commented Mar 11, 2026

All conflicts resolved. Summary:

  • data.py: Kept both — step_execution field (ours) + is_moe property (main)
  • diffusion_engine.py: Kept both — exec_total_time timing (main) + abort check (ours)
  • scheduler.py: Deleted (refactored into sched/); ported unpack_diffusion_output_shm from main into multiproc_executor.py at both dequeue paths

Note: the inline diffusion path introduced in #1715 bypasses AsyncOmniDiffusion, so engine.abort() is never called on request cancellation. This breaks step-level abort for --step-execution mode.

To fix: call self._inline_engine.engine.abort(request_id) in _generate_inline's CancelledError handler. Will push a fix in the next commit.

@asukaqaq-s asukaqaq-s force-pushed the pr/abort-clean branch 2 times, most recently from 89d00f5 to 81ba58a Compare March 11, 2026 17:10
@asukaqaq-s (Contributor, Author) commented:

• Fixed inline diffusion request cancellation in AsyncOmni.

After merging origin/main, single-stage diffusion started using the new inline path, which called synchronous OmniDiffusion.generate() through run_in_executor(). That kept the stage-worker RPC optimization, but broke abort behavior: cancelling the asyncio task only cancelled the outer future, not the underlying running thread, and queued inline requests were not cleanly abortable.

The fix keeps inline execution fully in-process, but switches the inline engine to AsyncOmniDiffusion, which already has proper request-state tracking, queued-future cancellation, and forwarding of running-request aborts into DiffusionEngine. I also updated inline cancellation handling to explicitly abort on CancelledError, and changed inline AsyncOmni.abort() to call the async wrapper's abort() instead of bypassing it and talking directly to the raw engine.

Result: inline diffusion now preserves the RPC-saving fast path while restoring both step-level interruption for running requests and clean cancellation for waiting requests.
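
A hedged sketch of that cancellation handling; apart from abort() being forwarded on CancelledError, the structure and names here are assumptions:

import asyncio

async def _generate_inline(self, request):
    # Hypothetical inline wrapper: on task cancellation, explicitly forward the
    # abort into the async engine instead of only dropping the outer future.
    try:
        return await self._inline_engine.generate(request)
    except asyncio.CancelledError:
        # Queued requests get future.cancel() inside the async wrapper; running
        # requests are aborted at the next step boundary.
        await self._inline_engine.abort(request.request_id)
        raise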

@asukaqaq-s (Contributor, Author) commented:

Bug fixed: CFG parallel previously deadlocked in step-execution mode because non-rank-0 workers received None from predict_noise_maybe_with_cfg() and skipped the collective broadcast in scheduler_step_maybe_with_cfg(). Fixed by always calling step_scheduler() regardless of noise_pred value.
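
A sketch of the fixed loop shape, parameterized over the two functions named above so it stays self-contained; the exact call sites are assumptions:

def denoise_with_cfg_parallel(latents, num_steps,
                              predict_noise_maybe_with_cfg,
                              scheduler_step_maybe_with_cfg):
    for step_index in range(num_steps):
        # None on non-rank-0 CFG workers.
        noise_pred = predict_noise_maybe_with_cfg(latents, step_index)
        # Always call the scheduler step: it performs a collective broadcast,
        # so gating it on `noise_pred is not None` deadlocks the other ranks.
        latents = scheduler_step_maybe_with_cfg(noise_pred, latents, step_index)
    return latents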

@asukaqaq-s changed the title from "[Draft][Feat] Support step-boundary abort in diffusion" to "[Feat] Support step-boundary abort in diffusion" on Mar 12, 2026
@hsliuustc0106 hsliuustc0106 added the high priority high priority issue, needs to be done asap label Mar 12, 2026
@hsliuustc0106 (Collaborator) left a comment:

@ApsarasX PTAL

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 12, 2026
@Gaohan123 (Collaborator) commented:

@wtomin @ZJY0516 @SamitHuang PTAL

@ZJY0516 ZJY0516 requested review from SamitHuang and wtomin March 13, 2026 04:04
@Gaohan123 Gaohan123 removed this from the v0.18.0 milestone Mar 17, 2026
@asukaqaq-s asukaqaq-s force-pushed the pr/abort-clean branch 2 times, most recently from 35b0462 to 25bc9e1 Compare March 22, 2026 14:52
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Mar 22, 2026
asukaqaq-s and others added 5 commits March 24, 2026 07:53
…g_reqs and add max_num_seqs to config

aligned with changes introduced in vllm-project#1851

Signed-off-by: jader <yjader@foxmail.com>
@yJader (Contributor) commented Mar 24, 2026

Merge conflicts with main have been resolved

yJader added 2 commits March 24, 2026 08:34
…reply_rank is None

Signed-off-by: jader <yjader@foxmail.com>
action="store_true",
help="Enable cache-dit summary logging after diffusion forward passes.",
)
omni_config_group.add_argument(
Review comment (Collaborator):

do we have a validation for this arg if it's accidentally triggered in non-diffusion models? or do we expect this will help in ar+dit model as well?

@asukaqaq-s (Contributor, Author) replied:

  1. If this argument is triggered in non-diffusion engine scenarios, it should not have any effect.
  2. For AR+DiT models, I have also discussed this in other community issues. If they adopt the diffusion engine, step-execution will be effective, and I expect it will help as well.

yJader added 2 commits March 30, 2026 03:37
…Scheduler abort

Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: jader <yjader@foxmail.com>
@wtomin (Collaborator) commented Apr 1, 2026

Thanks for your great contributions. I have a few comments:

  1. Currently only qwen-image supports step-level execution, but it is an important feature that should be included in docs/user_guide/diffusion_features.md, and please give the feature compatibility information in this doc, too.
  2. In tests/e2e/online_serving/test_qwen_image_expansion.py test script, please add --step-execution test case.

@wtomin (Collaborator) left a review:

LGTM

@wtomin wtomin merged commit 3fd4a4d into vllm-project:main Apr 1, 2026
7 of 8 checks passed
@fake0fan fake0fan mentioned this pull request Apr 2, 2026
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
Co-authored-by: jader <yjader@foxmail.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
Co-authored-by: jader <yjader@foxmail.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: asukaqaq-s <1311722138@qq.com>
Co-authored-by: jader <yjader@foxmail.com>

Labels

high priority (high priority issue, needs to be done asap); ready (label to trigger buildkite CI)


7 participants