[Chore]: refactor out unused/redundant params in diffusion pipelines #1235

Open

fhfuih wants to merge 1 commit into vllm-project:main from fhfuih:refactor-pipeline-forward

Conversation

@fhfuih
Contributor

@fhfuih fhfuih commented Feb 6, 2026

Purpose

As discussed previously in #797, many diffusion pipelines define several extra parameters in their forward function. These have never been used: the forward function has always been called with only one OmniDiffusionRequest object (even before PR #797).

This PR performs that refactor. In particular, it also aims to address the discussion in #1196 and ships in companion with #1196, so that the "how to add a new model" documentation teaches developers the correct paradigm.

Test Plan

No new features are added and no logic is changed. I will just run the existing tests.

Test Result

To be updated

Additional notes

In this refactor (a minimal sketch of these rules follows after the list):

  • If a param is NOT present in OmniDiffusionRequest, I keep it in the function signature. It is never changed by the user (because when calling forward, only req is passed), so only its default value is used.
  • Otherwise, if the param's default value in the function signature is NOT None, I copy that default value into the function body when reading it. By using or instead of if ... is not None, the "default" values are applied when the user-passed values are None or 0. Please see my argument below for why this is acceptable.
  • If the param's default value in the function signature is None, I simply read the value from OmniDiffusionRequest. No fallback default value is used.
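A minimal sketch of these three rules in a hypothetical pipeline (the fallback values 28 and 3.5 are taken from the flux snippets quoted later in this thread; output_type is only an illustrative rule-1 parameter, not necessarily present in any given pipeline):

def forward(
    self,
    req: OmniDiffusionRequest,
    output_type: str = "pil",  # rule 1: not in OmniDiffusionRequest, so it stays in the signature;
                               # callers only pass req, so only this default is ever used
) -> DiffusionOutput:
    # rule 2: signature default was not None -> copy it into the body via `or`,
    # so None or 0 from the request falls back to the pipeline-specific default
    num_inference_steps = req.sampling_params.num_inference_steps or 28
    guidance_scale = req.sampling_params.guidance_scale or 3.5

    # rule 3: signature default was None -> read the value directly, no fallback
    latents = req.sampling_params.latents
    ...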

Why it is acceptable to apply alternative default values when the user passes 0:

  1. Some params defined in OmniDiffusionRequest can never be None anyway, so the old fallback logic was problematic from the start: its condition was never satisfied.
  2. 0 or 0.0 is meaningless for these values (num steps, guidance, etc.). Their default values in OmniDiffusionRequest are not 0 either, so a 0 is only possible when the user explicitly sets it. In that case, it makes sense to inject the other default values: that's the user's intention.
  3. I notice that every pipeline has different preferred default values, so I decided to keep them here for documentation purposes.
  4. I add a default value if and only if it was present in the forward function signature. That means the pipeline authors intended to adopt these parameters (but could not in the current codebase), so it makes sense to use these values when the user-passed value is invalid on purpose (i.e., an explicit 0).


@fhfuih fhfuih requested a review from hsliuustc0106 as a code owner February 6, 2026 01:32
Copilot AI review requested due to automatic review settings February 6, 2026 01:32

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_prompt_embeds,

P1: Initialize prompt/negative embeddings before use

When req.prompts contains normal strings and no embeddings (the typical case), prompt_embeds and negative_prompt_embeds are only assigned inside the if any(...) blocks above, so they remain unbound and the subsequent check_inputs/encode_prompt usage raises UnboundLocalError before any generation. Previously these were defaulted to None via the function signature, so this is a regression. Initialize both variables to None before the conditional (the same pattern appears in longcat_image, ovis_image, qwen_image, sd3, stable_audio, and z_image pipelines).
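A minimal sketch of the suggested fix, reusing the extraction snippet quoted later in this review (abbreviated; the same pattern would apply to negative_prompt_embeds and the other listed pipelines):

# Default both to None so plain-string prompts (no embeddings) do not hit UnboundLocalError.
prompt_embeds = None
negative_prompt_embeds = None

req_prompt_embeds = [p.get("prompt_embeds") if not isinstance(p, str) else None for p in req.prompts]
if any(p is not None for p in req_prompt_embeds):
    prompt_embeds = torch.stack(req_prompt_embeds)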


Comment thread vllm_omni/diffusion/models/flux/pipeline_flux.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the forward methods in diffusion pipeline implementations to remove unused and redundant parameters. As discussed in PR #797 and #1196, these parameters were never used in practice since the forward function is always called with only an OmniDiffusionRequest object. This cleanup makes the API clearer and teaches developers the correct paradigm for adding new models.

Changes:

  • Removed unused function parameters from forward methods (prompt, height, width, num_inference_steps, guidance_scale, generator, latents, prompt_embeds, negative_prompt_embeds, etc.)
  • Consolidated parameter extraction to use only req.sampling_params and req.prompts
  • Added explicit extraction logic for prompt_embeds and negative_prompt_embeds from request prompts
  • Standardized default value fallback patterns across pipelines

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
vllm_omni/diffusion/models/z_image/pipeline_z_image.py Removed 11 unused parameters; consolidated to extract all values from req object
vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_ti2v.py Removed 9 unused parameters; added explicit prompt_embeds extraction
vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_i2v.py Removed 9 unused parameters; added explicit prompt_embeds extraction
vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2.py Removed 9 unused parameters; added explicit prompt_embeds extraction
vllm_omni/diffusion/models/stable_audio/pipeline_stable_audio.py Removed 10 unused parameters; added prompt_embeds extraction logic
vllm_omni/diffusion/models/sd3/pipeline_sd3.py Removed 10 unused parameters; consolidated parameter extraction from req
vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_layered.py Removed 13 unused parameters; added image extraction from multi_modal_data
vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit_plus.py Removed 13 unused parameters; added image extraction from multi_modal_data
vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py Removed 13 unused parameters; added image extraction from multi_modal_data
vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py Removed 12 unused parameters; consolidated all extraction from req
vllm_omni/diffusion/models/ovis_image/pipeline_ovis_image.py Removed 12 unused parameters; added prompt_embeds extraction logic
vllm_omni/diffusion/models/longcat_image/pipeline_longcat_image_edit.py Removed 9 unused parameters; added image extraction from multi_modal_data
vllm_omni/diffusion/models/longcat_image/pipeline_longcat_image.py Removed 11 unused parameters; reorganized parameter extraction
vllm_omni/diffusion/models/flux2_klein/pipeline_flux2_klein.py Removed 11 unused parameters; simplified prompt and image extraction
vllm_omni/diffusion/models/flux/pipeline_flux.py Removed 11 unused parameters; added detailed prompt_embeds extraction


image = [PIL.Image.open(im) if isinstance(im, str) else cast(PIL.Image.Image, im) for im in raw_image]
else:
image = PIL.Image.open(raw_image) if isinstance(raw_image, str) else cast(PIL.Image.Image, raw_image)


Copilot AI Feb 6, 2026


If raw_image is None (line 637), then image will be set to None (line 638). However, on line 644, the code attempts to access image[0].size or image.size, which will raise an AttributeError if image is None. This code path should either handle the None case or ensure that image is always set to a valid value before reaching line 644.

Suggested change
if image is None:
raise ValueError(
"No image was provided in 'multi_modal_data' for fallback preprocessing; "
"an image is required to compute target dimensions."
)

Contributor Author


Yeah, many pipelines have strange type annotations that mismatch later type checks. They are confusing and were conflicting to begin with. I am not trying to fix everything in this PR.

Comment thread vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py
)
sigmas = req.sampling_params.sigmas
max_sequence_length = req.sampling_params.max_sequence_length or 512
guidance_scale = req.sampling_params.guidance_scale if req.sampling_params.guidance_rescale is not None else 5.0

Copilot AI Feb 6, 2026


The condition should check guidance_scale_provided instead of guidance_rescale. This is inconsistent with all other pipelines which use guidance_scale_provided to determine if the user explicitly provided a guidance scale. The current condition checks guidance_rescale (a different parameter), which will likely always evaluate to not None since it has a default value of 0.0, causing the guidance_scale logic to behave incorrectly.

Suggested change
guidance_scale = req.sampling_params.guidance_scale if req.sampling_params.guidance_rescale is not None else 5.0
guidance_scale = (
req.sampling_params.guidance_scale
if req.sampling_params.guidance_scale_provided
else 5.0
)

Contributor Author


This logic is already there. I don't want to break things.

Comment thread vllm_omni/diffusion/models/z_image/pipeline_z_image.py Outdated
Comment thread vllm_omni/diffusion/models/sd3/pipeline_sd3.py Outdated
Comment thread vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py Outdated
Comment thread vllm_omni/diffusion/models/stable_audio/pipeline_stable_audio.py Outdated
Comment thread vllm_omni/diffusion/models/longcat_image/pipeline_longcat_image.py Outdated
Comment thread vllm_omni/diffusion/models/flux/pipeline_flux.py Outdated
Comment thread vllm_omni/diffusion/models/flux2_klein/pipeline_flux2_klein.py
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.



Comment thread vllm_omni/diffusion/models/z_image/pipeline_z_image.py Outdated
@hsliuustc0106
Collaborator

@wtomin @SamitHuang PTAL

@hsliuustc0106
Collaborator

do we also need to change the relevant api in docs?

self,
req: OmniDiffusionRequest,
prompt: str | list[str] | None = None,
prompt_2: str | list[str] | None = None,
Collaborator

@SamitHuang SamitHuang Feb 9, 2026


it's confusing to keep only a part of them. For clarity, we should either parse all arguments for the pipeline forward via req, or keep all of them for consistency with diffusers

Contributor Author


Updated. Now all the extra params are read from req.sampling_params.extra_args (same as vllm behavior and already used by SD audio). Default values are preserved. See if you feel this new version is better 😸
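For illustration, the resulting read pattern is roughly the one visible in the prompt_embeds_mask snippet quoted later in this thread; the second line below is a hypothetical key and default, not code from the PR:

# Less-common parameters are read from extra_args; .get() preserves the former signature default.
prompt_embeds_mask: torch.Tensor | None = req.sampling_params.extra_args.get("prompt_embeds_mask", None)
# hypothetical example of a pipeline-specific knob with a preserved default value:
my_extra_param = req.sampling_params.extra_args.get("my_extra_param", 0.5)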

@fhfuih
Contributor Author

fhfuih commented Feb 16, 2026

do we also need to change the relevant api in docs?

For breaking changes: There are no user-level breaking changes.
I have previously discussed with @wtomin to ensure that this is consistent with the recent doc update in #1193. It only concerns the developer docs (for adding a new diffusion model); it is user-agnostic and does not change the external API.

  • There is a recently added doc page "custom_pipelines". I have updated it and posted a comment below

For feature addition: Indeed, these parameters were previously never used by upstream callers, so only their default values were used. Now that they are moved to sampling_params.extra_args, users are able to override their values. But I am not sure whether these parameters are intended to be used by users.


If you think user doc is desired, I can add it as well. @SamitHuang maybe also asking your thoughts.

@fhfuih fhfuih force-pushed the refactor-pipeline-forward branch from de49863 to 2b1b00d Compare February 16, 2026 03:51
Contributor Author


@knlnguyen1802 I have updated this file and custom_pipeline.md to match the usage of forward function after #797 and the cleanup refactoring in this PR. See if that looks good to you

Contributor


@fhfuih It is fine for me if the example can run successfully

@fhfuih
Contributor Author

fhfuih commented Feb 16, 2026

@hsliuustc0106 @SamitHuang Could you help mark this PR as ready? I rebased onto the main branch and updated the relevant docs and the new Hunyuan model.

@SamitHuang SamitHuang added the ready label to trigger buildkite CI label Feb 23, 2026
attention_kwargs: dict | None = None,
**kwargs,
) -> DiffusionOutput:
def forward(self, req: OmniDiffusionRequest) -> DiffusionOutput:
Collaborator


Can you also check that the generated videos are unchanged after the refactor? It's possible that the offline / online inference examples may use one of these removed hyper-parameters.

Contributor Author


Confirmed that the same video is generated using wan t2v, and the intermediate latents at all timesteps are identical. I used the offline text_to_video example script with custom sampling parameters like this:

python examples/offline_inference/text_to_video/text_to_video.py \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
  --negative-prompt "orange cat" \
  --height 480 \
  --width 480 \
  --num-frames 41 \
  --guidance-scale 7.0 \
  --guidance-scale-high 3.0 \
  --flow-shift 12.0 \
  --num-inference-steps 40 \
  --fps 12 \
  --output t2v_out.mp4 --model /home/models/Wan-AI/Wan2.2-T2V-A14B-Diffusers --seed 42
t2v_out_3.mp4
t2v_out_1.mp4

@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@github-actions

🤖 VLLM-Omni PR Review

Code Review: Refactor Unused/Redundant Params in Diffusion Pipelines

1. Overview

This PR refactors the forward method signatures across multiple diffusion pipelines by removing unused/redundant parameters. Since forward is always called with only an OmniDiffusionRequest object, the previous function signatures with many individual parameters were misleading and never actually used.

Overall Assessment: Positive - This is a well-structured cleanup PR that improves code clarity and reduces confusion for developers adding new models.

2. Code Quality

Strengths

  • Consistent refactoring pattern applied across all pipelines
  • Documentation and examples updated to match the new paradigm
  • Clear PR description explaining the rationale and approach

Concerns

2.1 Inconsistent guidance_scale handling

Some pipelines use or fallback while others use guidance_scale_provided:

pipeline_flux.py:609:

guidance_scale = req.sampling_params.guidance_scale or 3.5

pipeline_qwen_image.py:556-559:

if req.sampling_params.guidance_scale_provided:
    guidance_scale = req.sampling_params.guidance_scale
else:
    guidance_scale = 1.0

The guidance_scale_provided approach is more robust because it explicitly tracks whether the user provided a value. Using or would incorrectly apply the default if the user explicitly sets guidance_scale=0.0.

Recommendation: Standardize on guidance_scale_provided across all pipelines, or document why the or approach is acceptable for specific pipelines.

2.2 Potential issue with or fallback for integer parameters

pipeline_flux.py:607:

num_inference_steps = req.sampling_params.num_inference_steps or 28

While the PR description argues that 0 is meaningless for these parameters, this could cause subtle bugs if someone intentionally passes 0 for testing or edge cases. Consider using explicit is not None checks for clarity:

num_inference_steps = req.sampling_params.num_inference_steps if req.sampling_params.num_inference_steps is not None else 28

2.3 Duplicated image extraction logic

The image extraction from multi_modal_data is duplicated across multiple pipelines (pipeline_qwen_image_edit.py, pipeline_qwen_image_edit_plus.py, pipeline_qwen_image_layered.py, pipeline_longcat_image_edit.py):

if (
    raw_image := None
    if isinstance(first_prompt, str)
    else first_prompt.get("multi_modal_data", {}).get("image")
) is None:
    image = None
elif isinstance(raw_image, list):
    image = [PIL.Image.open(im) if isinstance(im, str) else cast(PIL.Image.Image, im) for im in raw_image]
else:
    image = PIL.Image.open(raw_image) if isinstance(raw_image, str) else cast(PIL.Image.Image, raw_image)

Recommendation: Consider extracting this into a utility function to reduce duplication and ensure consistent behavior.

3. Architecture & Design

Strengths

  • Clean separation between parameters that belong in OmniDiffusionRequest vs extra_args
  • Good documentation of extra_args in docstrings (e.g., pipeline_flux2_klein.py:688-718)

Concerns

3.1 Mixed prompt embedding handling

pipeline_flux.py:598-620:

req_prompt_embeds = [p.get("prompt_embeds") if not isinstance(p, str) else None for p in req.prompts]
if any(p is not None for p in req_prompt_embeds):
    prompt_embeds = torch.stack(req_prompt_embeds)  # type: ignore # intentionally expect TypeError

The comment says "intentionally expect TypeError" but this provides a poor user experience when mixed input formats are provided. Consider adding a more descriptive error message:

if any(p is not None for p in req_prompt_embeds):
    if not all(p is not None for p in req_prompt_embeds):
        raise ValueError("Mixed prompt formats detected. Either all prompts must be embeddings or all must be strings.")
    prompt_embeds = torch.stack(req_prompt_embeds)

3.2 Inconsistent prompt handling between pipelines

Some pipelines handle batch prompts (e.g., pipeline_flux.py) while others only handle single prompts (e.g., pipeline_flux2_klein.py). This inconsistency could confuse developers. Consider documenting this difference or standardizing the approach.

4. Security & Safety

No significant security concerns identified. The changes are primarily refactoring existing logic without introducing new attack vectors.

Minor observation

The PIL.Image.open() calls on user-provided paths could potentially be a concern if untrusted input is allowed, but this appears to be existing behavior, not introduced by this PR.

5. Testing & Documentation

Documentation

  • ✅ Updated cfg_parallel.md and custom_pipeline.md
  • ✅ Updated example code in examples/offline_inference/custom_pipeline/
  • ✅ Added docstrings for extra_args parameters

Testing

  • ⚠️ Test results marked as "To be updated" in PR description
  • ⚠️ Diff is truncated at pipeline_z_image.py, cannot verify complete changes

Recommendation: Ensure test results are provided before merging, and verify the truncated file changes are complete.

6. Specific Suggestions

pipeline_flux.py:609

Consider using guidance_scale_provided for consistency:

if req.sampling_params.guidance_scale_provided:
    guidance_scale = req.sampling_params.guidance_scale
else:
    guidance_scale = 3.5

pipeline_flux.py:598-614

Add validation for mixed prompt formats:

req_prompt_embeds = [p.get("prompt_embeds") if not isinstance(p, str) else None for p in req.prompts]
if any(p is not None for p in req_prompt_embeds):
    if not all(p is not None for p in req_prompt_embeds):
        raise ValueError(
            "Mixed prompt formats detected. When using prompt embeddings, "
            "all prompts in the batch must provide embeddings."
        )
    prompt_embeds = torch.stack(req_prompt_embeds)

pipeline_longcat_image.py:476

Typo fix in comment:

# If at least one prompt is provided as an embedding,  # Changed from "at list"

pipeline_qwen_image_edit.py:631-642

The image extraction logic added here is good but duplicated. Consider creating a utility function:

def extract_image_from_prompt(prompt: dict | str) -> PIL.Image.Image | list[PIL.Image.Image] | None:
    """Extract image from prompt's multi_modal_data."""
    # Implementation here
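A possible body for that helper, adapted from the duplicated logic quoted in section 2.3 above (a sketch under that assumption, not code from the PR):

import PIL.Image
from typing import cast

def extract_image_from_prompt(prompt: dict | str) -> PIL.Image.Image | list[PIL.Image.Image] | None:
    """Extract image(s) from a prompt's multi_modal_data, opening path strings with PIL."""
    raw_image = None if isinstance(prompt, str) else prompt.get("multi_modal_data", {}).get("image")
    if raw_image is None:
        return None
    if isinstance(raw_image, list):
        return [PIL.Image.open(im) if isinstance(im, str) else cast(PIL.Image.Image, im) for im in raw_image]
    return PIL.Image.open(raw_image) if isinstance(raw_image, str) else cast(PIL.Image.Image, raw_image)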

pipeline_hunyuan_image_3.py:998-1001

Good pattern for guidance_scale handling:

if req.sampling_params.guidance_scale_provided:
    guidance_scale = req.sampling_params.guidance_scale
else:
    guidance_scale = 5.0

This should be the standard pattern across all pipelines.

7. Approval Status

LGTM with suggestions

The PR is well-structured and achieves its stated goal of simplifying pipeline signatures. The main concerns are:

  1. Inconsistent guidance_scale handling - Should be standardized (minor)
  2. Duplicated image extraction logic - Could be refactored (minor, can be addressed in follow-up)
  3. Test results pending - Should be verified before merge

The refactoring is clean and the documentation updates are comprehensive. I recommend merging after:

  1. Providing test results
  2. Addressing the guidance_scale consistency issue (or documenting why different approaches are used)
  3. Verifying the truncated pipeline_z_image.py changes are complete

This review was generated automatically by the VLLM-Omni PR Reviewer Bot
using glm-5.

Comment thread vllm_omni/diffusion/models/flux/pipeline_flux.py Outdated
)
latents = req.sampling_params.latents

prompt_embeds_mask: torch.Tensor | None = req.sampling_params.extra_args.get("prompt_embeds_mask", None)
Collaborator


A suggestion to reduce model migration cost from diffusers:

  • Update the docstring to illustrate req and how to parse it into the different arguments of the diffusers pipeline
  • Some arguments like guidance_scale are in req.sampling_params, but some arguments like prompt_embeds_mask are in req.sampling_params.extra_args. Model developers need to track the request handling code to figure this out. It would be good to explain why in the doc

Contributor Author


I have added a new commit to document sampling_params and extra_args in adding_diffusion_model.md. Please have a look.

Meanwhile, I think a docstring for req is hard to maintain: each model has its own docstring, and one can easily forget to sync all models' docstrings in future updates. Then it is uncertain whether a model developer will see the most up-to-date version of the docstring.

Therefore, I also extended the description of req in adding_diffusion_model.md. A model developer is sure to read through this page. Do you think the current version is clear enough?

@fhfuih
Contributor Author

fhfuih commented Feb 24, 2026

After discussing with @SamitHuang, the commit above makes all diffusion models check guidance_scale_provided to decide whether to take the guidance scale from OmniDiffusionRequest or to use the pipeline-specific default value. The default values differ across models.

Two special cases:

  • Stable diffusion 3: no default value was specified previously. I set it to 7.0 to stay consistent with the diffusers implementation
  • z-image: it originally checked the guidance_rescale value to decide whether to take guidance_scale from the request. This seems like a typo to me; I don't think that is what guidance_rescale is intended for.
    This also resolves item 2.1 from the bot

@fhfuih
Contributor Author

fhfuih commented Feb 24, 2026

Response to other comments from the bot:

  • 2.1. Resolved above.
  • 2.2: 0 is meaningless for those variables, and some of them are never None anyway, so it is safe to use or here
  • 2.3. Agree and can refactor in future PR
  • 3.1 & 3.2. Agree but irrelevant to this PR. Can optimize in future PR
  • the typo: false positive. Already fixed

@hsliuustc0106
Collaborator

@vllm-omni-reviewer

Collaborator

@SamitHuang SamitHuang left a comment


LGTM, final question


2. **`sampling_params`**: a collection of common sampling parameters. Check the definition of [`OmniDiffusionSamplingParams`](https://docs.vllm.ai/projects/vllm-omni/en/latest/api/vllm_omni/inputs/data/#vllm_omni.inputs.data.OmniDiffusionSamplingParams) dataclass for their default values.
- If your model requires a less-common sampling parameter, you can read it from the `["extra_args"]` field of the dataclass. To ensure user experience, you may want to document the list of extra args that your pipeline honors.
- If you believe a sampling parameter is common enough to be included in the `OmniDiffusionSamplingParams` dataclass, feel free to open an issue or clarify it in your PR that adds your model.
Collaborator

@SamitHuang SamitHuang Feb 25, 2026


It makes things complicated (e.g., there is no definition of what counts as a common parameter for diffusion).

BTW, is it really necessary to distinguish parameters into common and less-common (extra_args, like image_embeds)? Why not parse them all via sampling_params?

Contributor Author

@fhfuih fhfuih Feb 25, 2026


According to our internal discussion, there indeed isn't a clear boundary between common and less-common parameters. After my further investigation, I think maybe:

  • Those parameters that have been inside extra_args stay there so that things don't break: cfg_text_scale, cfg_img_scale, audio_start_in_s. Note that many of them do make sense as sampling params
  • For those parameters that I newly move to extra_args in this PR, categorize them into the scenarios below:

  • Move to extra_args only if (1) they are related to the inference runtime or the encoding/embedding stage, but not the input data itself
    • cfg_normalization, cfg_truncation (only used in z-image)
    • enable_cfg_renorm, cfg_renorm_min, enable_prompt_rewrite (only in longcat)
    • num_waveforms_per_prompt (only in stable audio)
    • text_encoder_out_layers (only in flux klein)
    • Note: vLLM also shows example usage of extra_args for tokenizing and embedding routines
  • Those parameters that are used in many models and are related to runtime, promote them as an OmniDiffusionRequest property:
    • output_type (used in stable audio, several Alibaba models, etc.). Make it default to None in OmniDiffusionRequest. Then, in each pipeline implementation, read it with a model-specific fallback value (np or pil or others) that is consistent with the current implementation.
    • joint_attention_kwargs, callback_on_step_end, callback_on_step_end_tensor_inputs (in flux, longcat, z image, ovis). Make them default to empty values; they are also always empty in the current implementations
  • Those parameters that are intended to be part of the input, it depends:
    • image_embeds, last_image (only used in wan i2v)---we can move them to req.prompt["multi_modal_data"]["image_embeds"] and req.prompt["multi_modal_data"]["last_image"] for clarity, but req.prompt is a list of single prompts, which results in a list of batch-1 embeds. For efficient data IO, we can also keep it in extra_args.
    • prompt_2, prompt_3, negative_prompt_2, negative_prompt_3 (used by both SD3 and flux)---promote them as OmniTextPrompt fields
    • pooled_prompt_embeds, negative_pooled_prompt_embeds (used by both SD3 and flux)---promote them as OmniTextPrompt fields, but same as the argument above about image_embeds.
    • prompt_embeds_mask,negative_prompt_embeds_mask (only in qwen image series)---same as above
    • Why keeping embeddings in extra_args also makes some sense:
      1. For image_embeds, some models always calculate it within the pipeline, but WAN allows user-input override. This is an unusual and not-unified pattern.
      2. For (neg)_prompt_embeds_mask and (neg_)pooled_prompt_embeds, embeddings are technically not expected in OmniTextPrompt. Plus these input data fields are not widely used by many models.

What do you think?

@hsliuustc0106 hsliuustc0106 removed the ready label to trigger buildkite CI label Feb 26, 2026
@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 17, 2026
@Gaohan123
Collaborator

@fhfuih Please resolve conflicts

@fhfuih fhfuih force-pushed the refactor-pipeline-forward branch from 9d50e24 to 8957074 Compare March 18, 2026 02:57
…ne.forward (squash til Mar 18)

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
@fhfuih fhfuih force-pushed the refactor-pipeline-forward branch from 8957074 to 8201b73 Compare March 18, 2026 02:58
@fhfuih
Contributor Author

fhfuih commented Mar 18, 2026

@fhfuih Please resolve conflicts

Finished, but this PR may still require some discussion. @SamitHuang could you have a look at my last discussion? @wtomin do you also have any suggestions since you have written relevant documentation? Thanks.

@Gaohan123 Gaohan123 modified the milestones: v0.18.0, v0.20.0 Apr 14, 2026
@hsliuustc0106
Collaborator

do we still need this PR after #2383 ?

@fhfuih
Contributor Author

fhfuih commented Apr 24, 2026

do we still need this PR after #2383 ?

I think it's different. The redundant pipeline.forward(...) parameters are still there. They are primarily copied from the original diffusers definitions but are actually not (or seldom) used and not documented. I think I can rework this refactor based on my last proposal at #1235 (comment) and signal you when I finish.

@Gaohan123
Collaborator

@fhfuih Please resolve conflicts. Thanks
