refactor pipeline stage/step pipeline #1368
Merged
- 35d64f4 refactor(diffusion): add stepwise pipeline execution support (asukaqaq-s)
- c44e40b test(diffusion): cover stepwise pipeline execution (asukaqaq-s)
- a582efb docs(diffusion): document step execution support (asukaqaq-s)
- 87049c2 fix(diffusion): restore step_execution config compatibility (asukaqaq-s)
- 0ca327d test(diffusion): use hardware_test marker for distributed tests (asukaqaq-s)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| # Diffusion Step Execution | ||
|
|
||
| This guide documents vLLM-Omni's stepwise diffusion contract for model authors | ||
| and contributors implementing `step_execution=True` support for a diffusion | ||
| pipeline. | ||
|
|
||
| For end-user enablement, supported models, and current limitations, see | ||
| [Step Execution](../../user_guide/diffusion/step_execution.md). | ||
|
|
||
| ## Current Support Scope | ||
|
|
||
| `step_execution` is **not** a generic diffusion toggle. It only works for | ||
| pipelines that implement the segmented stateful contract in | ||
| [`vllm_omni/diffusion/models/interface.py`](gh-file:vllm_omni/diffusion/models/interface.py). | ||
|
|
||
| Current in-tree support: | ||
|
|
||
| | Pipeline | Example models | Step execution | | ||
| |----------|----------------|----------------| | ||
| | `QwenImagePipeline` | `Qwen/Qwen-Image`, `Qwen/Qwen-Image-2512` | Yes | | ||
| | All other diffusion pipelines | `QwenImageEditPipeline`, `QwenImageEditPlusPipeline`, `QwenImageLayeredPipeline`, GLM-Image, Wan, Flux, etc. | No | | ||
|
|
||
| Current engine/runtime limitations: | ||
|
|
||
| - `StepScheduler` only schedules `batch_size=1`. | ||
| - `cache_backend` is not supported in step mode. | ||
| - Request-mode extras such as KV transfer are not wired into step mode yet. | ||
| - Unsupported pipelines now fail early during model loading instead of failing on the first request. | ||
|
|
||
| ## Execution Contract | ||
|
|
||
| Step mode is driven by four pipeline methods plus the shared mutable request | ||
| state object: | ||
|
|
||
| - `prepare_encode(state)`: one-time request preparation. | ||
| - `denoise_step(state)`: compute the noise prediction for the current step. | ||
| - `step_scheduler(state, noise_pred)`: mutate latents and advance step state. | ||
| - `post_decode(state)`: decode the final output after denoising is complete. | ||
|
|
||
| The state lives in | ||
| [`vllm_omni/diffusion/worker/utils.py`](gh-file:vllm_omni/diffusion/worker/utils.py) | ||
| as `DiffusionRequestState`. Store request-scoped tensors there, or use | ||
| `state.extra` for model-specific fields that do not justify extending the core | ||
| dataclass. | ||
|
|
||
| The worker-side step loop lives in | ||
| [`vllm_omni/diffusion/worker/diffusion_model_runner.py`](gh-file:vllm_omni/diffusion/worker/diffusion_model_runner.py): | ||
|
|
||
| 1. `prepare_encode()` runs once for a new request. | ||
| 2. `denoise_step()` runs every scheduler tick. | ||
| 3. `step_scheduler()` mutates `state.latents` and advances `state.step_index`. | ||
| 4. `post_decode()` runs exactly once after `state.denoise_completed` becomes true. | ||
|
|
||
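The ordering described above can be sketched with a toy pipeline. This is only an illustration of the call sequence: `ToyState`, `ToyPipeline`, and `run_request` are hypothetical names, the state fields are assumptions rather than the real `DiffusionRequestState` layout, and the real runner presumably drives one tick per scheduler call rather than a single `while` loop.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyState:
    """Hypothetical stand-in for DiffusionRequestState (fields are assumptions)."""

    num_steps: int
    latents: float = 0.0
    step_index: int = 0
    extra: dict[str, Any] = field(default_factory=dict)

    @property
    def denoise_completed(self) -> bool:
        return self.step_index >= self.num_steps


class ToyPipeline:
    """Toy pipeline exposing the four stepwise methods named in the guide."""

    def prepare_encode(self, state: ToyState) -> None:
        # One-time request preparation: initialise the latents.
        state.latents = 1.0

    def denoise_step(self, state: ToyState) -> float:
        # Pure computation for the current step; does not touch step_index.
        return state.latents * 0.5

    def step_scheduler(self, state: ToyState, noise_pred: float) -> None:
        # The only method that mutates latents and advances the step.
        state.latents -= noise_pred
        state.step_index += 1

    def post_decode(self, state: ToyState) -> float:
        # Final decode, called once after denoising completes.
        return state.latents


def run_request(pipeline: ToyPipeline, state: ToyState) -> float:
    """Collapse the per-tick loop into one function for illustration."""
    pipeline.prepare_encode(state)
    while not state.denoise_completed:
        noise_pred = pipeline.denoise_step(state)
        pipeline.step_scheduler(state, noise_pred)
    return pipeline.post_decode(state)


state = ToyState(num_steps=3)
result = run_request(ToyPipeline(), state)
```

Keeping `step_index` mutation confined to `step_scheduler()` is what lets the loop condition stay trivial in this sketch.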
| ## Recommended Split | ||
|
|
||
| When converting an existing request-level `forward()` pipeline, keep the split | ||
| strict and mechanical: | ||
|
|
||
| | Request-level phase | Stepwise method | What belongs there | | ||
| |---------------------|-----------------|--------------------| | ||
| | Input validation, prompt encoding, latent init, timestep prep, per-request scheduler creation | `prepare_encode()` | Anything that should happen once per request | | ||
| | Transformer forward / noise prediction | `denoise_step()` | Pure denoise computation for the current timestep | | ||
| | `scheduler.step(...)` and `step_index += 1` | `step_scheduler()` | Only latent/state mutation for one step | | ||
| | VAE decode / postprocess | `post_decode()` | Final decode only | | ||
|
|
||
| Have the stepwise path reuse the same helpers as the request-level path | ||
| whenever possible. Reimplementing the denoise loop from scratch is the easiest | ||
| way to introduce behavioral drift. | ||
|
|
||
| ## Qwen-Image Reference | ||
|
|
||
| [`pipeline_qwen_image.py`](gh-file:vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py) | ||
| is the reference implementation and is split correctly for the current | ||
| contract: | ||
|
|
||
| - `prepare_encode()` reuses `_prepare_generation_context()` so prompt encoding, | ||
| latent init, timestep creation, CFG setup, and shape bookkeeping stay aligned | ||
| with `forward()`. | ||
| - `prepare_encode()` deep-copies `self.scheduler` **after** | ||
| `prepare_timesteps()` so request-specific scheduler state is isolated. | ||
| - `denoise_step()` reuses `_build_denoise_kwargs()` plus | ||
| `predict_noise_maybe_with_cfg()`, so sequential CFG, CFG-parallel, and | ||
| non-CFG behavior stay identical to the request-level path. | ||
| - `step_scheduler()` only calls | ||
| `scheduler_step_maybe_with_cfg(..., per_request_scheduler=state.scheduler)` | ||
| and increments `state.step_index`. | ||
| - `post_decode()` reuses `_decode_latents()`, so the final image decode matches | ||
| the normal `forward()` path. | ||
|
|
||
| That decomposition is the target pattern for future models. | ||
|
|
||
| ## Rules For New Pipelines | ||
|
|
||
| - Do not keep request-scoped scheduler state on `self.scheduler`. Copy it into | ||
| `state.scheduler` during `prepare_encode()`. | ||
| - Do not mutate `state.step_index` inside `denoise_step()`. Only | ||
| `step_scheduler()` should advance the step. | ||
| - Do not decode partial outputs in `denoise_step()` or `step_scheduler()`. | ||
| - If the request-level pipeline has condition latents, masks, or edit-specific | ||
| tensors, store them in `state` or `state.extra`, not in global pipeline | ||
| attributes. | ||
| - Preserve CFG behavior by sharing the same helper path used by `forward()`. | ||
| - Keep `post_decode()` equivalent to the tail of `forward()`. | ||
|
|
||
| ## Validation Checklist | ||
|
|
||
| Before marking a pipeline as `supports_step_execution = True`, verify: | ||
|
|
||
| - Stepwise output matches request-level output for the same seed and sampling params. | ||
| - Per-request scheduler state is isolated across concurrent requests. | ||
| - Abort during denoise does not leak cached state. | ||
| - `step_index` reported by `RunnerOutput` matches the scheduler progress. | ||
| - CFG-parallel and non-CFG paths both work if the request-level pipeline supports them. | ||
|
|
||
| ## Related Files | ||
|
|
||
| - Contract: [`vllm_omni/diffusion/models/interface.py`](gh-file:vllm_omni/diffusion/models/interface.py) | ||
| - State: [`vllm_omni/diffusion/worker/utils.py`](gh-file:vllm_omni/diffusion/worker/utils.py) | ||
| - Runner loop: [`vllm_omni/diffusion/worker/diffusion_model_runner.py`](gh-file:vllm_omni/diffusion/worker/diffusion_model_runner.py) | ||
| - Scheduler transport: [`vllm_omni/diffusion/sched/interface.py`](gh-file:vllm_omni/diffusion/sched/interface.py) | ||
| - Reference pipeline: [`vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py`](gh-file:vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| # Step Execution | ||
|
|
||
| Step execution is an opt-in diffusion execution mode enabled with | ||
| `step_execution=True` when constructing `Omni`. | ||
|
|
||
| It is not a generic diffusion toggle for every pipeline. Only pipelines that | ||
| implement the stepwise contract support it today. | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ```python | ||
| from vllm_omni import Omni | ||
| from vllm_omni.inputs.data import OmniDiffusionSamplingParams | ||
|
|
||
| omni = Omni( | ||
| model="Qwen/Qwen-Image", | ||
| step_execution=True, | ||
| ) | ||
|
|
||
| outputs = omni.generate( | ||
| "A cat sitting on a windowsill", | ||
| OmniDiffusionSamplingParams( | ||
| num_inference_steps=50, | ||
| ), | ||
| ) | ||
| ``` | ||
|
|
||
| ## Supported Pipelines | ||
|
|
||
| | Pipeline | Example models | Step execution | | ||
| |----------|----------------|----------------| | ||
| | `QwenImagePipeline` | `Qwen/Qwen-Image`, `Qwen/Qwen-Image-2512` | Yes | | ||
| | All other diffusion pipelines | `QwenImageEditPipeline`, `QwenImageEditPlusPipeline`, `QwenImageLayeredPipeline`, GLM-Image, Wan, Flux, etc. | No | | ||
|
|
||
| ## Current Limitations | ||
|
|
||
| - `step_execution` currently supports `batch_size=1` only. | ||
| - `cache_backend` is not supported together with step execution. | ||
| - Unsupported pipelines fail early during model loading. | ||
| - Request-mode extras such as KV transfer are not wired into step mode yet. | ||
|
|
||
| ## When To Use It | ||
|
|
||
| Use step execution only when your workflow needs the pipeline to run through | ||
| its stepwise request state machine. For normal diffusion inference, leave it | ||
| disabled. | ||
|
|
||
| If you are looking for general diffusion speedups, see | ||
| [Diffusion Acceleration Overview](../diffusion_acceleration.md). | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| If model loading fails with a message mentioning `prepare_encode()`, | ||
| `denoise_step()`, `step_scheduler()`, and `post_decode()`, the selected | ||
| pipeline does not support step execution. | ||
|
|
||
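To see ahead of time whether a pipeline class would likely hit this error, a check along the following lines can help. This is a hypothetical helper, not the loader's actual logic: the function names are invented for illustration, and the real check may also consult other attributes on the pipeline class.

```python
REQUIRED_STEP_METHODS = (
    "prepare_encode",
    "denoise_step",
    "step_scheduler",
    "post_decode",
)


def missing_step_methods(pipeline_cls: type) -> list[str]:
    """Return the stepwise methods that the class does not define as callables."""
    return [
        name
        for name in REQUIRED_STEP_METHODS
        if not callable(getattr(pipeline_cls, name, None))
    ]


def check_step_execution_support(pipeline_cls: type) -> None:
    """Raise early (hypothetical behaviour) when the contract is incomplete."""
    missing = missing_step_methods(pipeline_cls)
    if missing:
        raise ValueError(
            f"{pipeline_cls.__name__} does not support step execution; "
            f"missing: {', '.join(name + '()' for name in missing)}"
        )


class RequestOnlyPipeline:
    def forward(self):
        return None


class StepwisePipeline:
    def prepare_encode(self, state): ...
    def denoise_step(self, state): ...
    def step_scheduler(self, state, noise_pred): ...
    def post_decode(self, state): ...


check_step_execution_support(StepwisePipeline)  # passes silently
try:
    check_step_execution_support(RequestOnlyPipeline)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```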
| ## For Model Authors | ||
|
|
||
| If you want to add step execution support to a new diffusion pipeline, see the | ||
| implementation guide: | ||
| [Diffusion Step Execution Design](../../design/feature/diffusion_step_execution.md). |