refactor pipeline stage/step pipeline by asukaqaq-s · Pull Request #1368 · vllm-project/vllm-omni

asukaqaq-s · 2026-02-13T08:14:01Z

Purpose

Introduce step-level execution capability at the runner/pipeline layer:
prepare_encode → denoise_step × N → step_scheduler × N → post_decode
QwenImagePipeline will be the first implementation.

Out of scope (no changes): Engine, Executor, Worker entrypoint, External APIs.

This PR is strictly limited to the runner/pipeline layer and maintains full backward compatibility.
Related RFC: #874
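A toy, self-contained sketch of the call sequence above. Note that each denoise_step is immediately followed by its own step_scheduler call, so the two "× N" phases alternate rather than running back to back. ToyState, ToyStepPipeline and the arithmetic inside each phase are hypothetical stand-ins, not QwenImagePipeline's implementation; only the method names and their ordering come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyState:
    # Hypothetical per-request state. The real pipeline keeps prompt
    # embeddings, masks, latents, timesteps, CFG flags, etc.
    timesteps: list[float] = field(default_factory=list)
    latents: float = 0.0
    trace: list[str] = field(default_factory=list)


class ToyStepPipeline:
    """Stand-in pipeline exposing the four step-level phases."""

    def prepare_encode(self, state: ToyState, num_steps: int) -> None:
        # Runs once: encode prompts, init latents, compute timesteps.
        state.timesteps = [1.0 - i / num_steps for i in range(num_steps)]
        state.latents = 1.0
        state.trace.append("prepare_encode")

    def denoise_step(self, state: ToyState, t: float) -> float:
        # One model forward pass producing a noise prediction for timestep t.
        state.trace.append(f"denoise_step({t:.2f})")
        return state.latents * 0.5

    def step_scheduler(self, state: ToyState, noise_pred: float) -> None:
        # Scheduler update that advances the latents by one step.
        state.latents -= noise_pred
        state.trace.append("step_scheduler")

    def post_decode(self, state: ToyState) -> float:
        # Runs once: VAE decode in the real pipeline, identity here.
        state.trace.append("post_decode")
        return state.latents


def execute_stepwise(pipeline: ToyStepPipeline, num_steps: int) -> tuple[float, list[str]]:
    state = ToyState()
    pipeline.prepare_encode(state, num_steps)
    for t in state.timesteps:
        # Each denoise step is followed by its scheduler step.
        noise_pred = pipeline.denoise_step(state, t)
        pipeline.step_scheduler(state, noise_pred)
    return pipeline.post_decode(state), state.trace
```

Because the runner only sequences calls, the same loop can drive any pipeline that exposes these four methods.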

Test Plan

Test Result

Prompts used:

"a cup of coffee on a wooden table, morning light, photorealistic"
"a red panda sitting on a tree branch in a bamboo forest, soft focus background"
"an astronaut riding a horse on the surface of Mars, cinematic lighting"
"a cozy cabin in the snowy mountains at sunset, warm glow from windows"
"a futuristic cityscape with flying cars and neon lights, cyberpunk style"

No performance degradation

Average generation time across 5 prompts per resolution: Stepwise execution introduces zero measurable overhead (< 0.1% difference, within noise).

Bit-for-bit identical output

All 15 image pairs (5 prompts x 3 resolutions) produce identical MD5 checksums between stepwise and non-stepwise modes, confirming that the stepwise refactoring does not alter the generation output in any way.

512x512/prompt_0: IDENTICAL
512x512/prompt_1: IDENTICAL
512x512/prompt_2: IDENTICAL
512x512/prompt_3: IDENTICAL
512x512/prompt_4: IDENTICAL
768x768/prompt_0: IDENTICAL
768x768/prompt_1: IDENTICAL
768x768/prompt_2: IDENTICAL
768x768/prompt_3: IDENTICAL
768x768/prompt_4: IDENTICAL
1024x1024/prompt_0: IDENTICAL
1024x1024/prompt_1: IDENTICAL
1024x1024/prompt_2: IDENTICAL
1024x1024/prompt_3: IDENTICAL
1024x1024/prompt_4: IDENTICAL
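One way to reproduce a check like this is to hash every saved image from both runs and compare digests. The PR does not include its comparison script, so this is a sketch under stated assumptions: the layout of resolution/prompt_i.png under two output directories, and the names md5_of, compare_outputs and format_report, are all hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(baseline_dir: Path, stepwise_dir: Path) -> dict[str, str]:
    """Map '<resolution>/prompt_<i>' to IDENTICAL, DIFFERENT or MISSING."""
    results: dict[str, str] = {}
    for ref in sorted(baseline_dir.rglob("*.png")):
        rel = ref.relative_to(baseline_dir)
        candidate = stepwise_dir / rel
        if not candidate.exists():
            status = "MISSING"
        elif md5_of(ref) == md5_of(candidate):
            status = "IDENTICAL"
        else:
            status = "DIFFERENT"
        results[rel.with_suffix("").as_posix()] = status
    return results


def format_report(results: dict[str, str]) -> str:
    # Produces "<resolution>/prompt_<i>: STATUS" lines like the listing above.
    return "\n".join(f"{name}: {status}" for name, status in results.items())
```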

lishunyang12

Thanks for the contribution. The step-level decomposition is a solid architectural direction. A few observations and questions inline before merge.

lishunyang12 · 2026-02-21T04:59:41Z

+        prompt_embeds_mask: torch.Tensor | None = None,
+        negative_prompt_embeds: torch.Tensor | None = None,
+        negative_prompt_embeds_mask: torch.Tensor | None = None,
+        attention_kwargs: dict[str, Any] | None = None,


I was wondering about a subtle difference from the existing forward() method. In forward(), the prompt/negative_prompt extraction from req.prompts happens unconditionally:

prompt = [p if isinstance(p, str) else (p.get("prompt") or "") for p in req.prompts] or prompt

But here in prepare_encode, it's guarded by if prompt is None. This means if someone passes an explicit prompt argument alongside a req that also contains prompts, forward() would override with req.prompts while prepare_encode would keep the explicit argument. Could you help me understand whether this divergence is intentional? If not, it might be worth making the behavior identical to forward() to avoid subtle bugs when the two code paths are used side by side.
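To make the divergence concrete, here is a minimal sketch with plain lists standing in for req.prompts. The list comprehension is copied from forward(); the function names prompt_like_forward and prompt_like_guarded are hypothetical.

```python
from __future__ import annotations

from typing import Any


def _extract(req_prompts: list[Any]) -> list[str]:
    # Comprehension copied from forward(): plain strings pass through,
    # dict prompts contribute their "prompt" field (or "").
    return [p if isinstance(p, str) else (p.get("prompt") or "") for p in req_prompts]


def prompt_like_forward(req_prompts: list[Any], prompt: Any = None) -> Any:
    # Unconditional: req.prompts wins; the explicit argument is only a
    # fallback when req.prompts is empty (the trailing "or prompt").
    return _extract(req_prompts) or prompt


def prompt_like_guarded(req_prompts: list[Any], prompt: Any = None) -> Any:
    # Guarded, as in the original prepare_encode: an explicit argument wins.
    if prompt is None:
        prompt = _extract(req_prompts) or prompt
    return prompt
```

With an explicit prompt and a non-empty req.prompts, the two paths return different values; without an explicit prompt they agree.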

lishunyang12 · 2026-02-21T04:59:41Z

+        scheduler_override: Any | None = None,
+    ):
+        if prompt is None:
+            prompt = [p if isinstance(p, str) else (p.get("prompt") or "") for p in req.prompts] or prompt


Small nit -- the negative_prompt handling here also diverges from forward() in a similar way to the positive prompt. In forward(), the negative prompt logic runs unconditionally:

if all(isinstance(p, str) or p.get("negative_prompt") is None for p in req.prompts): negative_prompt = None

But here it's guarded by if negative_prompt is None. This means if negative_prompt is explicitly passed as an empty string "" (which is falsy but not None), the behavior would differ between the two code paths. I might be overthinking this, but it seemed worth flagging for consistency.
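A minimal sketch of the None-clearing rule quoted above, run unconditionally (as in forward()) versus behind an is-None guard. Only that rule is reproduced; how forward() otherwise builds per-request negative prompts is not shown in this thread, so it is omitted. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def _no_request_has_negative(req_prompts: list[Any]) -> bool:
    # Condition copied from forward(): no request carries a negative prompt.
    return all(isinstance(p, str) or p.get("negative_prompt") is None for p in req_prompts)


def resolve_negative(req_prompts: list[Any], negative_prompt: str | None, guarded: bool) -> str | None:
    if guarded and negative_prompt is not None:
        # Guarded path: any explicit value, even "", is kept untouched.
        return negative_prompt
    # forward() path (and the guarded path when nothing was passed).
    if _no_request_has_negative(req_prompts):
        negative_prompt = None
    return negative_prompt
```

An explicit "" is falsy but not None, which is exactly where the two paths split: forward() clears it to None, while the guarded path returns "".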

lishunyang12 · 2026-02-21T04:59:41Z


+    def prepare_encode(
+        self,
+        req: OmniDiffusionRequest,


The return type annotation is missing from prepare_encode. Since this method returns a 15-element tuple that the caller (execute_stepwise) unpacks positionally, I was wondering if it might help maintainability to either:

Add a return type annotation (even a tuple[...]), or

Return a NamedTuple or @dataclass so callers don't need to rely on positional unpacking of 15 values?

Positional unpacking of large tuples can be fragile when someone later adds/removes/reorders a return value. Just a thought -- happy to hear your perspective!

lishunyang12 · 2026-02-21T04:59:41Z

+                    height // self.vae_scale_factor // 2,
+                    width // self.vae_scale_factor // 2,
+                )
+            ]


I noticed that prepare_encode does not call self.prepare_timesteps() (which already exists as a method on the class) but instead inlines the timestep preparation logic directly. The existing forward() method uses self.prepare_timesteps(). Could this lead to divergence if someone later modifies prepare_timesteps()? Would it make sense to reuse that helper here for DRY-ness?

I fixed this in 868e40a. prepare_encode() now reuses self.prepare_timesteps(...) (the same path as forward()), so the timestep/mu logic is centralized and won’t drift.

To do this, I also updated the scheduler_step* signatures in cfg_parallel.py to accept an optional scheduler argument. This allows stepwise execution with per-request scheduler state instead of temporarily rebinding self.scheduler. The change is backward-compatible: scheduler=None falls back to self.scheduler, so existing call sites are unaffected.

looks good now
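A simplified sketch of the backward-compatible fallback described in the reply above. The method name scheduler_step_maybe_with_cfg and the scheduler keyword come from this PR; the body, ToyScheduler and ToyPipeline are hypothetical, and the real helper also handles CFG-parallel details that are not reproduced here.

```python
from __future__ import annotations


class ToyScheduler:
    def __init__(self, name: str) -> None:
        self.name = name
        self.calls = 0

    def step(self, noise_pred: float, t: int, latents: float) -> float:
        self.calls += 1
        return latents - noise_pred


class ToyPipeline:
    def __init__(self) -> None:
        self.scheduler = ToyScheduler("pipeline")

    def scheduler_step_maybe_with_cfg(
        self,
        noise_pred: float,
        t: int,
        latents: float,
        do_true_cfg: bool,
        scheduler: ToyScheduler | None = None,
    ) -> float:
        # Existing call sites pass no scheduler and keep using self.scheduler;
        # the stepwise path passes a request-local scheduler explicitly.
        active = scheduler if scheduler is not None else self.scheduler
        # CFG handling omitted; do_true_cfg only mirrors the call shape.
        return active.step(noise_pred, t, latents)
```

Since the active scheduler is a local variable, self.scheduler is never reassigned on this path.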

lishunyang12 · 2026-02-21T04:59:41Z

+        latents: torch.Tensor,
+        img_shapes: list,
+        txt_seq_lens: list[int] | None,
+        negative_txt_seq_lens: list[int] | None,


Good reuse of predict_noise_maybe_with_cfg from the CFGParallelMixin. I noticed one difference though: in diffuse(), the additional_transformer_kwargs (which include return_dict and attention_kwargs) are spread into both positive_kwargs and negative_kwargs via **additional_transformer_kwargs. Here in denoise_step, these are set directly as individual keys. The behavior should be equivalent, but I wanted to confirm -- are the kwargs identical in both cases? Specifically, diffuse() passes attention_kwargs as a nested key via the spread, while here it's set as "attention_kwargs": self.attention_kwargs. They look the same to me, just wanted to double-check.

lishunyang12 · 2026-02-21T04:59:42Z

+            "hidden_states": latent_model_input,
+            "timestep": t_for_model / 1000,
+            "guidance": guidance,
+            "encoder_hidden_states_mask": prompt_embeds_mask,


Handling the scheduler override by temporarily swapping self.scheduler with a try/finally is clean and safe.

One small thing I noticed: the mixin's scheduler_step_maybe_with_cfg presumably calls self.scheduler.step(...) internally. If an exception occurs during that call, the finally block will restore the original scheduler, which is great. But could there be thread-safety concerns if multiple requests are processed concurrently? In that case, temporarily mutating self.scheduler on the instance might cause race conditions. Is the current design single-threaded per pipeline instance? If so, this is perfectly fine.
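A stdlib-only sketch of the swap-and-restore pattern described above, with hypothetical ToyScheduler and ToyPipeline classes standing in for the real ones. It shows the part the comment praises (the finally block restores the original scheduler even when the step raises) and marks in a comment the part it questions: between the assignment and the restore, any other caller sharing the same pipeline instance would read the overridden scheduler.

```python
from __future__ import annotations


class ToyScheduler:
    def __init__(self, name: str) -> None:
        self.name = name


class ToyPipeline:
    def __init__(self) -> None:
        self.scheduler = ToyScheduler("base")

    def _scheduler_step(self, fail: bool) -> str:
        # Stand-in for the mixin helper, which reads self.scheduler.
        if fail:
            raise RuntimeError("scheduler step failed")
        return f"stepped with {self.scheduler.name}"

    def step_scheduler(self, override: ToyScheduler | None = None, fail: bool = False) -> str:
        if override is not None and override is not self.scheduler:
            saved = self.scheduler
            # From here until the finally block, any other caller sharing this
            # instance would also see the override: safe only if single-threaded.
            self.scheduler = override
            try:
                return self._scheduler_step(fail)
            finally:
                self.scheduler = saved  # restored even if the step raises
        return self._scheduler_step(fail)
```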

lishunyang12 · 2026-02-21T04:59:42Z

        with set_forward_context(vllm_config=self.vllm_config, omni_diffusion_config=self.od_config):
            with record_function("pipeline_forward"):
-                output = self.pipeline.forward(req)
+                if isinstance(self.pipeline, SupportsStepExecution):


The dispatch logic here is clean. One minor question: since SupportsStepExecution is a @runtime_checkable Protocol, the isinstance check will verify that the pipeline has the supports_step_execution class variable and the required methods. However, I noticed that the helper function supports_step_execution() from interface.py is not used here. Would it be slightly cleaner to use the helper function instead of the raw isinstance check? That way the logic is in one place:

from vllm_omni.diffusion.models.interface import supports_step_execution
if supports_step_execution(self.pipeline):

Minor style suggestion -- the current code works fine too.
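A self-contained sketch of the helper idea, using a simplified stand-in for the protocol (the real interface.py signatures may differ, and the three pipeline classes are hypothetical). It also shows one concrete reason to prefer a helper: isinstance() against a runtime_checkable Protocol only checks that the named members exist, so a pipeline that defines the methods but sets supports_step_execution = False still passes the raw check.

```python
from typing import Any, ClassVar, Protocol, runtime_checkable


@runtime_checkable
class SupportsStepExecution(Protocol):
    # Simplified stand-in for the protocol in interface.py.
    supports_step_execution: ClassVar[bool]

    def prepare_encode(self, *args: Any, **kwargs: Any) -> Any: ...

    def denoise_step(self, *args: Any, **kwargs: Any) -> Any: ...

    def step_scheduler(self, *args: Any, **kwargs: Any) -> Any: ...

    def post_decode(self, *args: Any, **kwargs: Any) -> Any: ...


def supports_step_execution(pipeline: object) -> bool:
    # isinstance() on a runtime_checkable Protocol only checks that the
    # members exist, not their values or signatures, so read the flag too.
    return isinstance(pipeline, SupportsStepExecution) and bool(
        getattr(pipeline, "supports_step_execution", False)
    )


class StepwisePipeline:
    supports_step_execution = True

    def prepare_encode(self, *args, **kwargs): ...
    def denoise_step(self, *args, **kwargs): ...
    def step_scheduler(self, *args, **kwargs): ...
    def post_decode(self, *args, **kwargs): ...


class OptedOutPipeline(StepwisePipeline):
    supports_step_execution = False  # has the methods but declines the path


class LegacyPipeline:
    def forward(self, req): ...
```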

lishunyang12 · 2026-02-21T04:59:42Z

 from vllm_omni.platforms import current_omni_platform

 logger = init_logger(__name__)



Very minor: it looks like an extra blank line was introduced here (there are now two blank lines between the logger assignment and the class definition, where there was one before). Not a blocker at all, just flagging in case you want to keep formatting consistent.

Thanks for flagging this. I checked and the spacing is enforced by our lint/CI — top-level classes require two blank lines, so reducing it to one would fail the style check.

I’ll keep it as-is to stay consistent with the formatter.

fair enough

lishunyang12 · 2026-02-21T04:59:42Z

+
+    supports_step_execution: ClassVar[bool] = True
+
+    def prepare_encode(self, req: "OmniDiffusionRequest", **kwargs: Any) -> Any:


Clean protocol design -- intentionally permissive with *args, **kwargs for denoise_step, step_scheduler, and post_decode.

One thought: prepare_encode takes a concrete OmniDiffusionRequest parameter, which couples the protocol to that specific request type. If future pipelines might use a different request type, would it make sense to loosen this to Any as well (matching the other methods), or is OmniDiffusionRequest intentionally the canonical request type for all diffusion pipelines? Curious about the intent.

lishunyang12 · 2026-02-21T04:59:42Z

+            .to(latents.device, latents.dtype)
+        )
+        latents_std = 1.0 / torch.tensor(self.vae.config.latents_std).view(1, self.vae.config.z_dim, 1, 1, 1).to(
+            latents.device, latents.dtype


In post_decode, when output_type == "latent", the original forward() assigns image = latents and then wraps it in DiffusionOutput(output=image). Here, you return DiffusionOutput(output=latents) directly, which is equivalent.

However, I noticed that forward() sets self._current_timestep = None after the diffuse loop ends, but neither post_decode nor execute_stepwise resets it. This could leave self._current_timestep pointing to the last timestep value after generation completes. Could this cause issues if something inspects current_timestep between requests? It might be worth adding self._current_timestep = None at the end of post_decode or in execute_stepwise to match forward()'s behavior.

Thanks, that’s a great point. You’re right that the stepwise path should mirror forward() and clear self._current_timestep after generation.

I fixed this in the latest commit: self._current_timestep is now reset to None at the end of stepwise decoding, so it won’t retain the last timestep across requests.

DiffusionOutput(output=latents) is behaviorally unchanged compared to assigning image = latents first.

hsliuustc0106 · 2026-02-24T07:07:27Z

@vllm-omni-reviewer

github-actions · 2026-02-24T07:16:29Z

🤖 VLLM-Omni PR Review

Code Review: Refactor Pipeline Stage/Step Pipeline

1. Overview

This PR introduces step-level execution capability at the runner/pipeline layer for diffusion models, implementing a new execution flow: prepare_encode → denoise_step × N → step_scheduler × N → post_decode. The changes are well-scoped to the runner/pipeline layer and maintain backward compatibility through protocol-based detection.

Overall Assessment: Positive with suggestions - The architectural approach is sound, but there are several areas that need attention before merging.

2. Code Quality

Strengths

Good use of TYPE_CHECKING to avoid circular imports
Protocol-based design allows for clean duck typing
Methods have helpful docstrings

Issues

a) Fragile tuple unpacking in execute_stepwise (diffusion_model_runner.py:203-208)

The 15-element tuple returned from prepare_encode is fragile and error-prone. If the return order changes, it will break silently.

(
    prompt_embeds, prompt_embeds_mask,
    negative_prompt_embeds, negative_prompt_embeds_mask,
    latents, img_shapes, txt_seq_lens, negative_txt_seq_lens,
    timesteps, do_true_cfg, guidance, true_cfg_scale,
    height, width, scheduler,
) = self.pipeline.prepare_encode(req=req)

Recommendation: Use a dataclass or NamedTuple for the return value:

from dataclasses import dataclass
from typing import Any

import torch


@dataclass
class StepExecutionContext:
    prompt_embeds: torch.Tensor
    prompt_embeds_mask: torch.Tensor
    negative_prompt_embeds: torch.Tensor | None
    negative_prompt_embeds_mask: torch.Tensor | None
    latents: torch.Tensor
    img_shapes: list
    txt_seq_lens: list[int] | None
    negative_txt_seq_lens: list[int] | None
    timesteps: torch.Tensor
    do_true_cfg: bool
    guidance: torch.Tensor | None
    true_cfg_scale: float
    height: int
    width: int
    scheduler: Any

b) Missing return type annotation (pipeline_qwen_image.py:536)

prepare_encode lacks a return type annotation, making it difficult to understand the expected output.

c) Interrupt handling incomplete (diffusion_model_runner.py:211-218)

denoise_step can return None when interrupted, but the loop continues without checking:

for _i, t in enumerate(timesteps):
    noise_pred = self.pipeline.denoise_step(...)  # Can return None
    latents = self.pipeline.step_scheduler(...)   # Uses None?

Recommendation:

for _i, t in enumerate(timesteps):
    noise_pred = self.pipeline.denoise_step(...)
    if noise_pred is None:
        break  # Handle interrupt
    latents = self.pipeline.step_scheduler(...)

3. Architecture & Design

Strengths

Clean separation of concerns with distinct methods for each phase
Backward compatibility maintained via isinstance check
Protocol pattern allows gradual adoption by other pipelines

Issues

a) Scheduler state mutation (pipeline_qwen_image.py:766-776)

The temporary binding of self.scheduler in step_scheduler modifies instance state, which could cause issues in concurrent scenarios:

if scheduler is not None and scheduler is not self.scheduler:
    saved = self.scheduler
    self.scheduler = scheduler
    try:
        return self.scheduler_step_maybe_with_cfg(...)
    finally:
        self.scheduler = saved

Recommendation: Pass the scheduler explicitly to scheduler_step_maybe_with_cfg or refactor the mixin to accept scheduler as a parameter.

b) Parameter redundancy in prepare_encode (pipeline_qwen_image.py:536-564)

The method accepts both req and individual parameters that overlap with req attributes. This creates ambiguity about which takes precedence.

Recommendation: Consider either:

Only accepting req and extracting values internally
Making individual parameters override req values consistently (document this behavior)

c) Missing protocol enforcement (interface.py:31-47)

The SupportsStepExecution protocol methods have no actual signature enforcement due to the permissive *args, **kwargs pattern. While documented as intentional, this reduces type safety.

4. Security & Safety

Issues

a) No input validation in execute_stepwise (diffusion_model_runner.py:203)

No validation that prepare_encode returned valid data before proceeding to denoise loop.

b) Resource cleanup

The scheduler state mutation pattern uses try/finally correctly, but consider what happens if an exception occurs mid-mutation in a multi-threaded context.

5. Testing & Documentation

Issues

a) Duplicate test command in PR description

The same test command appears twice in the PR description - appears to be a copy-paste error.

b) Missing test coverage

No tests for the new SupportsStepExecution protocol
No tests for execute_stepwise method
No tests for interrupt handling during step execution
No tests for scheduler override functionality

c) Missing documentation for new protocol

The SupportsStepExecution protocol lacks documentation on:

Expected call sequence
State management between calls
Thread-safety considerations

6. Specific Suggestions

`vllm_omni/diffusion/models/interface.py`

Line 31-47: Add more detailed docstrings explaining the contract:

from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsStepExecution(Protocol):
    """Step-level execution protocol for diffusion pipelines.
    
    Implementations must support the following call sequence:
    1. prepare_encode() - called once before the denoise loop
    2. denoise_step() + step_scheduler() - called N times in sequence
    3. post_decode() - called once after the denoise loop
    
    State between calls should be managed via instance attributes
    or returned values from prepare_encode().
    """

`vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py`

Line 536: Add return type annotation:

def prepare_encode(
    self,
    req: OmniDiffusionRequest,
    ...
) -> tuple[...]:  # Or return StepExecutionContext

Line 766-776: Consider refactoring to avoid state mutation:

def step_scheduler(
    self,
    noise_pred: torch.Tensor | None,
    timestep: torch.Tensor | float | int,
    latents: torch.Tensor,
    do_true_cfg: bool = False,
    scheduler: Any | None = None,
) -> torch.Tensor:
    if self.interrupt:
        return latents
    
    active_scheduler = scheduler if scheduler is not None else self.scheduler
    # Pass active_scheduler to a refactored scheduler_step_maybe_with_cfg
    return self._scheduler_step_with_scheduler(
        active_scheduler, noise_pred, timestep, latents, do_true_cfg
    )

`vllm_omni/diffusion/worker/diffusion_model_runner.py`

Line 203-220: Add interrupt handling and use a context object:

def execute_stepwise(self, req: OmniDiffusionRequest) -> DiffusionOutput:
    """Execute via step-level Protocol: prepare_encode → denoise × N → post_decode."""
    ctx = self.pipeline.prepare_encode(req=req)
    
    for _i, t in enumerate(ctx.timesteps):
        noise_pred = self.pipeline.denoise_step(
            ctx.prompt_embeds, ctx.prompt_embeds_mask,
            ctx.negative_prompt_embeds, ctx.negative_prompt_embeds_mask,
            ctx.latents, ctx.img_shapes, ctx.txt_seq_lens, ctx.negative_txt_seq_lens,
            t, ctx.do_true_cfg, ctx.guidance, ctx.true_cfg_scale,
        )
        if noise_pred is None:
            logger.info("Denoise interrupted at step %d", _i)
            break
        ctx.latents = self.pipeline.step_scheduler(
            noise_pred, t, ctx.latents, ctx.do_true_cfg, ctx.scheduler,
        )
    
    return self.pipeline.post_decode(ctx.latents, ctx.height, ctx.width)

Line 37: Remove extra blank line added between imports and class.

7. Approval Status

Changes Requested

While the architectural approach is sound and the PR achieves its stated goals, I recommend addressing the following before merging:

Critical: Fix interrupt handling in execute_stepwise - currently broken
Important: Replace 15-element tuple with a named structure for maintainability
Important: Add return type annotation to prepare_encode
Minor: Fix duplicate test command in PR description
Minor: Remove extra blank line in diffusion_model_runner.py

The backward compatibility approach is excellent, and the protocol-based design will make it easy for other pipelines to adopt step-level execution in the future. Once the issues above are addressed, this will be a solid addition to the codebase.

This review was generated automatically by the VLLM-Omni PR Reviewer Bot
using glm-5.

asukaqaq-s · 2026-03-01T09:34:53Z

Thanks for the detailed review. I’ve pushed an update addressing the concerns raised and included additional self-review fixes.
Main changes:

Replaced the 15-element positional tuple with DiffusionRequestState (dataclass).

Step interfaces are now state-driven: prepare_encode, denoise_step, step_scheduler, and post_decode.
The runner constructs and manages the state lifecycle, removing fragile positional unpacking.
The Protocol is now decoupled from direct OmniDiffusionRequest method signatures.

Removed temporary self.scheduler swapping.

Added an explicit scheduler: Any | None = None parameter in CFG mixin scheduler helpers.
The step path now passes state.scheduler explicitly.
Backward compatibility is preserved by falling back to self.scheduler when not provided.
Also fixed:
Prompt / negative prompt extraction in prepare_encode now matches forward() behavior (unconditional extraction).
Reused self.prepare_timesteps() to avoid divergence and duplicated logic.
Runner dispatch now uses the supports_step_execution(...) helper.
Added _current_timestep = None reset in post_decode for parity with forward().
Fixed a cfg_normalize regression in the stepwise path by restoring forward-equivalent default behavior.
Note on concurrency:
The current runner execution model is synchronous per call. However, we still removed shared self.scheduler mutation to avoid potential race conditions in future continuous batching or concurrent execution models.

I also updated the test results in the PR description.
Please re-review when you have a chance, thanks! @lishunyang12
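A minimal sketch of why state-driven step methods matter: when every per-request value lives on a state object owned by the runner, a runner could interleave denoise steps from several requests and still get the same results as running them one after another. The field subset, ToyPipeline and the two runner functions are hypothetical; the runner in this PR is synchronous per call, and interleaving only illustrates the future continuous-batching scenario this comment anticipates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiffusionRequestState:
    # Hypothetical field subset; the PR's dataclass also carries embeddings,
    # masks, CFG flags, height/width and a request-local scheduler.
    request_id: str
    num_steps: int
    latents: float = 1.0
    timesteps: list[int] = field(default_factory=list)
    step_index: int = 0

    @property
    def done(self) -> bool:
        return self.step_index >= len(self.timesteps)


class ToyPipeline:
    # Every method reads and writes only the state it is given.
    def prepare_encode(self, state: DiffusionRequestState) -> None:
        state.timesteps = list(range(state.num_steps, 0, -1))

    def denoise_step(self, state: DiffusionRequestState) -> float:
        t = state.timesteps[state.step_index]
        return state.latents / (t + 1)

    def step_scheduler(self, state: DiffusionRequestState, noise_pred: float) -> None:
        state.latents -= noise_pred
        state.step_index += 1

    def post_decode(self, state: DiffusionRequestState) -> float:
        return state.latents


def run_sequential(pipe: ToyPipeline, states: list[DiffusionRequestState]) -> dict[str, float]:
    out = {}
    for s in states:
        pipe.prepare_encode(s)
        while not s.done:
            pipe.step_scheduler(s, pipe.denoise_step(s))
        out[s.request_id] = pipe.post_decode(s)
    return out


def run_interleaved(pipe: ToyPipeline, states: list[DiffusionRequestState]) -> dict[str, float]:
    # Round-robin one denoise step per request, as a step-level scheduler might.
    for s in states:
        pipe.prepare_encode(s)
    active, out = list(states), {}
    while active:
        for s in list(active):
            if not s.done:
                pipe.step_scheduler(s, pipe.denoise_step(s))
            if s.done:
                out[s.request_id] = pipe.post_decode(s)
                active.remove(s)
    return out
```

The equivalence only holds because nothing request-specific is stored on the pipeline instance, which is also why a shared scheduler reference would break it.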

asukaqaq-s · 2026-03-03T10:57:51Z

Rebased onto main, resolved the conflicts, and addressed the review comments.

lishunyang12

Nice rework; previous concerns addressed. One thing inline.

lishunyang12 · 2026-03-09T01:05:08Z

+
+        try:
+            self.pipeline.prepare_encode(state)
+


denoise_step returns None on interrupt but the loop keeps going. Worth breaking early:

Suggested change

for _i, _t in enumerate(state.timesteps):
    noise_pred = self.pipeline.denoise_step(state)
    if noise_pred is None:
        break
    # TODO: continuous batching should step per-request state.
    self.pipeline.step_scheduler(state, noise_pred)

Good catch, thanks. I applied the early break in this PR so we don’t call step_scheduler with None. I’m keeping the change minimal for now; a follow-up with proper abort support will make the interrupt/abort path explicit.
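A minimal sketch of the early-break behavior, with a hypothetical InterruptiblePipeline whose denoise_step returns None once an interrupt is raised (the interrupt_at trigger exists only for the demo). The point is that step_scheduler is never called with None, and the latents from the last completed step are what the loop returns.

```python
from __future__ import annotations


class InterruptiblePipeline:
    def __init__(self, interrupt_at: int | None = None) -> None:
        self.interrupt_at = interrupt_at  # hypothetical trigger for the demo
        self.interrupt = False
        self.scheduler_calls = 0

    def denoise_step(self, step_index: int) -> float | None:
        if self.interrupt_at is not None and step_index >= self.interrupt_at:
            self.interrupt = True
        # Mirrors the described contract: None signals an interrupt.
        return None if self.interrupt else 0.25

    def step_scheduler(self, latents: float, noise_pred: float) -> float:
        self.scheduler_calls += 1
        return latents - noise_pred


def denoise_loop(pipeline: InterruptiblePipeline, num_steps: int, latents: float = 1.0) -> float:
    for i in range(num_steps):
        noise_pred = pipeline.denoise_step(i)
        if noise_pred is None:
            break  # stop early; never hand None to the scheduler
        latents = pipeline.step_scheduler(latents, noise_pred)
    return latents
```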

Bounty-hunter · 2026-03-10T07:18:27Z

+            "attention_kwargs": self.attention_kwargs,
+            "return_dict": False,
+        }
+        if state.do_true_cfg:


Can we extract a common function to avoid this massive duplicate code?

Makes sense. I’ll refactor this by adding a private helper like _build_denoise_kwargs(...) inside qwen-image to build positive_kwargs, negative_kwargs, and output_slice, rather than pushing it into the generic CFGParallelMixin.

Not only the CFG part: prepare_encode also has duplicate code.
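A sketch of the kind of helper proposed above: build the keys shared by the positive and negative branches once, then add only the branch-specific text-conditioning keys. The keys hidden_states, timestep, guidance, encoder_hidden_states_mask, attention_kwargs and return_dict appear in this PR's diff hunks; encoder_hidden_states, img_shapes and txt_seq_lens are assumptions, and build_denoise_kwargs is a hypothetical stand-in for the planned _build_denoise_kwargs(...).

```python
from __future__ import annotations

from typing import Any


def build_denoise_kwargs(
    *,
    latents: Any,
    t: float,
    guidance: Any,
    img_shapes: list,
    attention_kwargs: dict[str, Any] | None,
    prompt_embeds: Any,
    prompt_embeds_mask: Any,
    txt_seq_lens: list[int] | None,
    do_true_cfg: bool,
    negative_prompt_embeds: Any = None,
    negative_prompt_embeds_mask: Any = None,
    negative_txt_seq_lens: list[int] | None = None,
) -> tuple[dict[str, Any], dict[str, Any] | None]:
    # Keys shared by both CFG branches are built once.
    common = {
        "hidden_states": latents,
        "timestep": t / 1000,
        "guidance": guidance,
        "img_shapes": img_shapes,
        "attention_kwargs": attention_kwargs,
        "return_dict": False,
    }
    positive = {
        **common,
        "encoder_hidden_states": prompt_embeds,
        "encoder_hidden_states_mask": prompt_embeds_mask,
        "txt_seq_lens": txt_seq_lens,
    }
    if not do_true_cfg:
        return positive, None
    negative = {
        **common,
        "encoder_hidden_states": negative_prompt_embeds,
        "encoder_hidden_states_mask": negative_prompt_embeds_mask,
        "txt_seq_lens": negative_txt_seq_lens,
    }
    return positive, negative
```

Spreading one common dict into both branches also settles the earlier question about whether attention_kwargs and return_dict match between diffuse() and denoise_step: they are the same objects by construction.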

Bounty-hunter · 2026-03-10T07:55:31Z

+        noise_pred: torch.Tensor,
+        t: torch.Tensor,
+        latents: torch.Tensor,
+        scheduler: Any | None = None,


Why do we need to add this param? prepare_encode already sets state.scheduler = self.scheduler.

We need this because with step-level switching, batches may be at different progress, and a request may be switched between different execution states. So we cache the scheduler in RequestStateCache to make sure the request continues with the correct local scheduling state.

I know we need to keep the scheduling state (e.g., timesteps). However, state.scheduler = self.scheduler is just a reference assignment: when self.scheduler changes, request_state.scheduler will also change. Have you tested this?

state.scheduler is used in the later scheduler step here:

state.latents = self.scheduler_step_maybe_with_cfg(
    noise_pred,
    t,
    state.latents,
    state.do_true_cfg,
    scheduler=state.scheduler,
)

state.scheduler is meant to keep per-request scheduler state, not just timesteps. In stepwise execution and future continuous batching, different requests may be resumed at different denoise progress, so they should not share the same pipeline scheduler instance. self.scheduler is the base template, while state.scheduler is the request-local scheduler used later in scheduler_step_maybe_with_cfg(...).

You're right. Assigning self.scheduler directly only keeps a shared reference. I will change this to create a request-local scheduler instance, e.g. state.scheduler = FlowMatchEulerDiscreteScheduler.from_config(self.scheduler.config), so that the later scheduler_step_maybe_with_cfg(..., scheduler=state.scheduler) call uses per-request scheduler state.

asukaqaq-s · 2026-03-10T15:24:42Z

Review 1 (Bounty-hunter): "Not only the CFG part: prepare_encode also has duplicate code."

Addressed. Extracted _extract_prompts() and _prepare_generation_context() as shared helpers in pipeline_qwen_image.py. Both forward() and prepare_encode() now delegate to _prepare_generation_context() for input validation, prompt encoding, latent preparation, timestep computation, and guidance setup. Also extracted _build_denoise_kwargs() for the denoise kwargs construction used by denoise_step(), and _decode_latents() shared by forward() and post_decode().

Review 2 (Bounty-hunter): "state.scheduler = self.scheduler is just a reference assignment"

Fixed. prepare_encode() now does copy.deepcopy(self.scheduler) after _prepare_generation_context() (which calls prepare_timesteps() and materializes the timestep state on self.scheduler), so the per-request scheduler carries correct dynamic-shifting state without sharing the pipeline instance.
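A stdlib sketch of the difference between the two assignments discussed in this thread. ToyStatefulScheduler is a hypothetical stand-in for FlowMatchEulerDiscreteScheduler; the point is only that plain assignment shares one mutable object, while a copy.deepcopy taken after the timesteps are set gives each request its own copy that already carries them.

```python
import copy


class ToyStatefulScheduler:
    """Stand-in for a scheduler that keeps per-run state on the instance."""

    def __init__(self) -> None:
        self.timesteps = []
        self.step_index = 0

    def set_timesteps(self, num_steps: int) -> None:
        self.timesteps = list(range(num_steps, 0, -1))
        self.step_index = 0

    def step(self) -> None:
        self.step_index += 1


pipeline_scheduler = ToyStatefulScheduler()
pipeline_scheduler.set_timesteps(4)  # materialize timesteps before copying

shared = pipeline_scheduler                        # reference: same object
request_local = copy.deepcopy(pipeline_scheduler)  # independent copy

shared.step()          # also advances pipeline_scheduler
request_local.step()
request_local.step()   # does not touch pipeline_scheduler
```

A scheduler rebuilt from its config would start without the materialized timesteps, which is why copying after prepare_timesteps() keeps the dynamic-shifting state intact.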

Refactor from pr/pipeline branch, related to vllm-project#1368.
- Restructure pipeline stage/step API
- Pass scheduler explicitly in stepwise pipeline and CFG mixin

Signed-off-by: asukaqaq-s <1311722138@qq.com>
Signed-off-by: asukaqaq <1311722138@qq.com>

Refactor from pr/pipeline branch, related to vllm-project#1368.
- Restructure pipeline stage/step API
- Pass scheduler explicitly in stepwise pipeline and CFG mixin

Signed-off-by: asukaqaq-s <1311722138@qq.com>

wtomin · 2026-03-13T07:46:02Z

+        noise_pred: torch.Tensor,
+        t: torch.Tensor,
+        latents: torch.Tensor,
+        scheduler: Any | None = None,


A better name for this new argument, for example per_request_scheduler or per_state_scheduler, is needed to differentiate it from self.scheduler. Please also provide a more detailed docstring for this argument. This new argument takes effect only when step-wise execution is enabled, right? Please add a parameter check for it.

Maybe in a future PR: a design document is needed for step-wise execution and continuous batching in diffusion pipelines.

Updated this API for clarity.

wtomin · 2026-03-18T08:04:04Z

+
+
+## List of Supported Models for Step-Execution
+


I find the Step-Execution section a bit out of place here.

I have another docs PR, #1928. Could you check docs/user_guide/diffusion_features.md? I think this content fits better in that doc, as a feature of diffusion models.

Thanks, agreed. I removed the Step-Execution section from docs/models/supported_models.md. For now, the user-facing content is documented in docs/user_guide/diffusion/step_execution.md. After docs PR #1928 is merged, I can rebase and further align it with docs/user_guide/diffusion_features.md if needed.

wtomin · 2026-03-18T08:25:15Z

@asukaqaq-s There is a bug-fix refactor PR (#1908) that will be merged in just a few days. After it's merged, I think your PR will need to be rebased and its functionality verified again.

Signed-off-by: asukaqaq-s <1311722138@qq.com>

Move the step-execution docs into the diffusion feature docs structure, add a user-facing step execution page, and remove the feature-specific section from supported models. Signed-off-by: asukaqaq-s <1311722138@qq.com>

Signed-off-by: asukaqaq-s <1311722138@qq.com>

asukaqaq-s · 2026-03-19T15:41:43Z

I resolved the merge conflicts introduced by the rebase, and also fixed the earlier CI failure caused by the missing step_execution field in OmniDiffusionConfig.

wtomin

Since all of my comments were addressed, I will approve this PR.

wtomin · 2026-03-20T05:58:21Z

+    if current_omni_platform.get_device_count() < world_size:
+        pytest.skip(f"Test requires {world_size} devices")
+
+    torch.multiprocessing.spawn(


@yenuo26 @congw729 Can you check if we should include this test script in CI?
Currently this test script requires at least 2 GPU devices. It runs simple mocked test cases, so it doesn't download or run large-scale diffusion models.

@asukaqaq-s How long does it take to run this test script on your local machine?

Around 5-10 s for each parallel test on my local machine; the other tests finish almost immediately.

https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/tests_markers/#example-usage-for-markers
The parallel marker means the test is related to a parallel feature. For your test, use:

from tests.utils import hardware_test

@hardware_test(
    res={"cuda": "L4"},
    num_cards=2,
)

congw729 · 2026-03-20T08:02:45Z

May I know the total time cost for your test file? I think adding one extra test step to CI would suit this test @wtomin @yenuo26. Maybe place this test in test-ready.yml if it runs very fast?

asukaqaq-s · 2026-03-20T09:52:33Z

Thanks! I've replaced @pytest.mark.parallel with @hardware_test(res={"cuda": "L4"}, num_cards=2) on the three distributed tests.

Test durations with --durations=0 (2x L4):

test_execute_stepwise_with_ulysses_parallel: 10.84s
test_execute_stepwise_with_ring_parallel: 11.97s
test_execute_stepwise_with_cfg_parallel: 11.98s
CPU tests: < 1s each
Total: ~36s for all 10 tests.

Signed-off-by: asukaqaq-s <1311722138@qq.com>

asukaqaq-s requested a review from hsliuustc0106 as a code owner February 13, 2026 08:14

asukaqaq-s force-pushed the pr/pipeline branch from 9deb4cf to d961085 February 13, 2026 08:21

asukaqaq-s changed the title from "reafator pipeline stage/step api" to "reafator pipeline stage/step pipeline" Feb 13, 2026

lishunyang12 reviewed Feb 21, 2026

yJader mentioned this pull request Mar 3, 2026

[Refactor] Refactor Diffusion Scheduler/Executor Boundaries and Request State Flow #1625

Merged

asukaqaq-s force-pushed the pr/pipeline branch 3 times, most recently from 868e40a to fcef0bb March 3, 2026 10:03

Bounty-hunter mentioned this pull request Mar 5, 2026

[Feature]: Abort request when http disconnects JiusiServe/LM-service#75

Closed

lishunyang12 reviewed Mar 9, 2026

asukaqaq-s force-pushed the pr/pipeline branch from fcef0bb to c182dc9 March 9, 2026 03:04

asukaqaq-s mentioned this pull request Mar 10, 2026

[Feat] Support step-boundary abort in diffusion #1769

Merged

Bounty-hunter reviewed Mar 10, 2026

david6666666 mentioned this pull request Mar 11, 2026

[RFC]: Qwen-Image、Qwen-Image-Layered、Qwen-Image-Edit-Plus、Wan2.2 Production-grade Feature Monitoring JiusiServe/vllm-omni#167

Closed

wtomin reviewed Mar 11, 2026

Comment thread vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py

wtomin mentioned this pull request Mar 12, 2026

[RFC]: Diffusion Models Features Supports Plan #814

Open

wtomin reviewed Mar 13, 2026

Comment thread vllm_omni/diffusion/models/interface.py

wtomin reviewed Mar 13, 2026

Comment thread vllm_omni/diffusion/worker/utils.py

wtomin reviewed Mar 13, 2026

Comment thread vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py

asukaqaq-s force-pushed the pr/pipeline branch 2 times, most recently from ccb79fd to ff49bde on March 15, 2026 16:54

wtomin reviewed Mar 18, 2026

Comment thread docs/design/feature/diffusion_step_execution.md

wtomin reviewed Mar 18, 2026

asukaqaq-s force-pushed the pr/pipeline branch 3 times, most recently from 164f6dd to 7a16a57 on March 18, 2026 16:57

Gaohan123 added the `ready` label (used to trigger Buildkite CI) on Mar 19, 2026

asukaqaq-s added 3 commits March 19, 2026 15:18

refactor(diffusion): add stepwise pipeline execution support

35d64f4

Signed-off-by: asukaqaq-s <1311722138@qq.com>

test(diffusion): cover stepwise pipeline execution

c44e40b

Signed-off-by: asukaqaq-s <1311722138@qq.com>

docs(diffusion): document step execution support

a582efb

Move the step-execution docs into the diffusion feature docs structure, add a user-facing step execution page, and remove the feature-specific section from supported models.

Signed-off-by: asukaqaq-s <1311722138@qq.com>

asukaqaq-s force-pushed the pr/pipeline branch from 7a16a57 to 854e9e9 on March 19, 2026 15:30

fix(diffusion): restore step_execution config compatibility

87049c2

Signed-off-by: asukaqaq-s <1311722138@qq.com>

asukaqaq-s force-pushed the pr/pipeline branch from 854e9e9 to 87049c2 on March 19, 2026 15:39

wtomin approved these changes Mar 20, 2026

wtomin reviewed Mar 20, 2026

asukaqaq-s force-pushed the pr/pipeline branch from 73634df to d6f0a84 on March 20, 2026 09:55

test(diffusion): use hardware_test marker for distributed tests

0ca327d

Signed-off-by: asukaqaq-s <1311722138@qq.com>

asukaqaq-s force-pushed the pr/pipeline branch from d6f0a84 to 0ca327d on March 20, 2026 10:01

wtomin merged commit da7a8f8 into vllm-project:main Mar 20, 2026
8 checks passed

TKONIY mentioned this pull request Mar 22, 2026

[Diffusion] Refactor LTX2 to use unified CFG parallel framework and enable per_request_scheduler #2078

Closed

asukaqaq-s mentioned this pull request Mar 23, 2026

[Feature]: Support VAE as a Separate Stage to Reduce GPU Memory Pressure in Diffusion Pipelines #2089

Open

alex-jw-brooks mentioned this pull request Mar 25, 2026

[RFC]: Add Diffusion Pipeline Protocol / Base Class #2189

Open

wtomin mentioned this pull request Mar 26, 2026

[RFC]: vLLM-Omni Diffusion Module — Q2 2026 Roadmap #2226

Open

TKONIY mentioned this pull request Mar 26, 2026

[Diffusion] Refactor LTX2 to use unified CFG parallel framework #2160

Merged

zzhang-fr mentioned this pull request Mar 27, 2026

[RFC]: Pipeline Parallelism & Stream Batch for Real-Time Video Generation #2280

Open

asukaqaq-s mentioned this pull request May 7, 2026

[RFC]: Refactor engine/runner/pipeline to support step-wise and continuos batching #874

Open

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

reafator pipeline stage/step pipeline (vllm-project#1368)

2055683

Signed-off-by: asukaqaq-s <1311722138@qq.com>

```python
from vllm_omni.platforms import current_omni_platform

logger = init_logger(__name__)

supports_step_execution: ClassVar[bool] = True

def prepare_encode(self, req: "OmniDiffusionRequest", **kwargs: Any) -> Any:
```
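The `supports_step_execution` ClassVar reads like an opt-in capability flag that a runner can check before choosing the stepwise path, so pipelines that have not been refactored stay on their existing code path. A minimal sketch of that dispatch pattern, assuming a hypothetical `dispatch()` helper and toy classes (`LegacyPipeline`, `SteppablePipeline`, `run_steps`) that are illustrations only, not vllm-omni code:

```python
from typing import ClassVar


class LegacyPipeline:
    # Not refactored: only the monolithic forward() path exists.
    supports_step_execution: ClassVar[bool] = False

    def forward(self, req: str) -> str:
        return f"forward:{req}"


class SteppablePipeline:
    # Opts in to step-level execution via a class-level flag.
    supports_step_execution: ClassVar[bool] = True

    def forward(self, req: str) -> str:
        return f"forward:{req}"

    def run_steps(self, req: str) -> str:
        # Hypothetical stand-in for prepare_encode -> denoise/scheduler loop -> post_decode.
        return f"stepwise:{req}"


def dispatch(pipeline: object, req: str) -> str:
    # Default to False so pipelines that never declare the flag keep the old path.
    if getattr(pipeline, "supports_step_execution", False):
        return pipeline.run_steps(req)  # type: ignore[attr-defined]
    return pipeline.forward(req)  # type: ignore[attr-defined]
```

Reading the flag with `getattr(..., False)` is what keeps this sketch backward compatible: a pipeline class that predates the flag is routed to `forward()` unchanged.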

```diff
+            for _i, _t in enumerate(state.timesteps):
+                noise_pred = self.pipeline.denoise_step(state)
+                if noise_pred is None:
+                    break
+                # TODO: continuous batching should step per-request state.
+                self.pipeline.step_scheduler(state, noise_pred)
```
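The runner-side loop above drives the pipeline one timestep at a time: `denoise_step` returns a noise prediction (or `None` to stop early), and `step_scheduler` applies it to the shared state. A minimal, self-contained sketch of that contract, assuming scalar "latents" in place of tensors; `StepState`, `ToyPipeline`, `execute_stepwise`, `abort_at`, and the update arithmetic are hypothetical illustrations, not vllm-omni code:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepState:
    # Hypothetical per-request state carried between steps.
    timesteps: List[int]
    latents: float
    step_index: int = 0


class ToyPipeline:
    def __init__(self, num_steps: int = 4, abort_at: Optional[int] = None) -> None:
        self.num_steps = num_steps
        self.abort_at = abort_at  # hypothetical: simulate an early stop at this step

    def prepare_encode(self, req: float) -> StepState:
        # Stand-in for prompt encoding plus initial latents and timestep schedule.
        return StepState(timesteps=list(range(self.num_steps, 0, -1)), latents=req)

    def denoise_step(self, state: StepState) -> Optional[float]:
        # Returning None tells the caller to stop the loop early.
        if self.abort_at is not None and state.step_index == self.abort_at:
            return None
        t = state.timesteps[state.step_index]
        return state.latents * 0.1 * t  # stand-in for the denoiser forward pass

    def step_scheduler(self, state: StepState, noise_pred: float) -> None:
        # Stand-in for a scheduler step: update latents, advance the step index.
        state.latents -= noise_pred
        state.step_index += 1

    def post_decode(self, state: StepState) -> float:
        return round(state.latents, 6)  # stand-in for VAE decode

    def forward(self, req: float) -> float:
        # Monolithic path built from the same step methods.
        state = self.prepare_encode(req)
        for _ in state.timesteps:
            noise_pred = self.denoise_step(state)
            if noise_pred is None:
                break
            self.step_scheduler(state, noise_pred)
        return self.post_decode(state)


def execute_stepwise(pipeline: ToyPipeline, req: float) -> float:
    """Runner-owned loop mirroring the diff above."""
    state = pipeline.prepare_encode(req)
    for _i, _t in enumerate(state.timesteps):
        noise_pred = pipeline.denoise_step(state)
        if noise_pred is None:
            break
        pipeline.step_scheduler(state, noise_pred)
    return pipeline.post_decode(state)
```

Because `forward()` in this sketch reuses the same step methods, the monolithic and stepwise paths perform the same arithmetic in the same order, which is one simple way to keep outputs identical between the two modes while letting the runner own the loop.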

lishunyang12 left review comments on Feb 21, 2026.

hsliuustc0106 commented Feb 24, 2026

github-actions Bot commented Feb 24, 2026

🤖 VLLM-Omni PR Review

Code Review: Refactor Pipeline Stage/Step Pipeline

1. Overview
2. Code Quality: Strengths; Issues
3. Architecture & Design: Strengths; Issues
4. Security & Safety: Issues
5. Testing & Documentation: Issues
6. Specific Suggestions: `vllm_omni/diffusion/models/interface.py`, `vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py`, `vllm_omni/diffusion/worker/diffusion_model_runner.py`
7. Approval Status: Changes Requested

asukaqaq-s commented Mar 1, 2026