[Chore]: refactor out unused/redundant params in diffusion pipelines #1235

fhfuih wants to merge 1 commit into
Conversation
💡 Codex Review
vllm-omni/vllm_omni/diffusion/models/flux/pipeline_flux.py
Lines 632 to 633 in 80edb73
When req.prompts contains normal strings and no embeddings (the typical case), prompt_embeds and negative_prompt_embeds are only assigned inside the if any(...) blocks above, so they remain unbound and the subsequent check_inputs/encode_prompt usage raises UnboundLocalError before any generation. Previously these were defaulted to None via the function signature, so this is a regression. Initialize both variables to None before the conditional (the same pattern appears in longcat_image, ovis_image, qwen_image, sd3, stable_audio, and z_image pipelines).
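The initialize-to-`None` fix the review describes can be sketched minimally as follows. The function and condition are stand-ins for the real pipeline code, not the actual vllm-omni implementation:

```python
# Minimal reproduction of the regression and its fix. The variable names mirror
# the pipeline code described above; the extraction branch is a placeholder.
def encode(prompts):
    # Fix: bind both variables before the conditional so the typical
    # plain-string path never hits UnboundLocalError, matching the old
    # `None` defaults in the removed function signature.
    prompt_embeds = None
    negative_prompt_embeds = None
    if any(not isinstance(p, str) for p in prompts):
        prompt_embeds = "extracted"  # placeholder for real embed extraction
    return prompt_embeds, negative_prompt_embeds

# Typical case: plain strings, no embeddings -> returns (None, None)
print(encode(["a photo of a cat"]))
```

Without the two leading assignments, the plain-string call would raise `UnboundLocalError` as soon as a later check reads `prompt_embeds`.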
Pull request overview
This PR refactors the forward methods in diffusion pipeline implementations to remove unused and redundant parameters. As discussed in PR #797 and #1196, these parameters were never used in practice since the forward function is always called with only an OmniDiffusionRequest object. This cleanup makes the API clearer and teaches developers the correct paradigm for adding new models.
Changes:
- Removed unused function parameters from forward methods (prompt, height, width, num_inference_steps, guidance_scale, generator, latents, prompt_embeds, negative_prompt_embeds, etc.)
- Consolidated parameter extraction to use only `req.sampling_params` and `req.prompts`
- Added explicit extraction logic for `prompt_embeds` and `negative_prompt_embeds` from request prompts
- Standardized default value fallback patterns across pipelines
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| vllm_omni/diffusion/models/z_image/pipeline_z_image.py | Removed 11 unused parameters; consolidated to extract all values from req object |
| vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_ti2v.py | Removed 9 unused parameters; added explicit prompt_embeds extraction |
| vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2_i2v.py | Removed 9 unused parameters; added explicit prompt_embeds extraction |
| vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2.py | Removed 9 unused parameters; added explicit prompt_embeds extraction |
| vllm_omni/diffusion/models/stable_audio/pipeline_stable_audio.py | Removed 10 unused parameters; added prompt_embeds extraction logic |
| vllm_omni/diffusion/models/sd3/pipeline_sd3.py | Removed 10 unused parameters; consolidated parameter extraction from req |
| vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_layered.py | Removed 13 unused parameters; added image extraction from multi_modal_data |
| vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit_plus.py | Removed 13 unused parameters; added image extraction from multi_modal_data |
| vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py | Removed 13 unused parameters; added image extraction from multi_modal_data |
| vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py | Removed 12 unused parameters; consolidated all extraction from req |
| vllm_omni/diffusion/models/ovis_image/pipeline_ovis_image.py | Removed 12 unused parameters; added prompt_embeds extraction logic |
| vllm_omni/diffusion/models/longcat_image/pipeline_longcat_image_edit.py | Removed 9 unused parameters; added image extraction from multi_modal_data |
| vllm_omni/diffusion/models/longcat_image/pipeline_longcat_image.py | Removed 11 unused parameters; reorganized parameter extraction |
| vllm_omni/diffusion/models/flux2_klein/pipeline_flux2_klein.py | Removed 11 unused parameters; simplified prompt and image extraction |
| vllm_omni/diffusion/models/flux/pipeline_flux.py | Removed 11 unused parameters; added detailed prompt_embeds extraction |
```python
    image = [PIL.Image.open(im) if isinstance(im, str) else cast(PIL.Image.Image, im) for im in raw_image]
else:
    image = PIL.Image.open(raw_image) if isinstance(raw_image, str) else cast(PIL.Image.Image, raw_image)
```
If raw_image is None (line 637), then image will be set to None (line 638). However, on line 644, the code attempts to access image[0].size or image.size, which will raise an AttributeError if image is None. This code path should either handle the None case or ensure that image is always set to a valid value before reaching line 644.
```python
if image is None:
    raise ValueError(
        "No image was provided in 'multi_modal_data' for fallback preprocessing; "
        "an image is required to compute target dimensions."
    )
```
Yeah, many pipelines have strange type annotations that mismatch later type checks. They are confusing and originally conflicting. I am not going to fix everything in this PR.
```python
)
sigmas = req.sampling_params.sigmas
max_sequence_length = req.sampling_params.max_sequence_length or 512
guidance_scale = req.sampling_params.guidance_scale if req.sampling_params.guidance_rescale is not None else 5.0
```
The condition should check guidance_scale_provided instead of guidance_rescale. This is inconsistent with all other pipelines which use guidance_scale_provided to determine if the user explicitly provided a guidance scale. The current condition checks guidance_rescale (a different parameter), which will likely always evaluate to not None since it has a default value of 0.0, causing the guidance_scale logic to behave incorrectly.
```diff
-guidance_scale = req.sampling_params.guidance_scale if req.sampling_params.guidance_rescale is not None else 5.0
+guidance_scale = (
+    req.sampling_params.guidance_scale
+    if req.sampling_params.guidance_scale_provided
+    else 5.0
+)
```
This logic is already there. I don't want to break things.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
@wtomin @SamitHuang PTAL

Do we also need to change the relevant API in the docs?
```python
self,
req: OmniDiffusionRequest,
prompt: str | list[str] | None = None,
prompt_2: str | list[str] | None = None,
```
It's confusing to keep only a part of them. For clarity, we should either parse all pipeline `forward` arguments via `req`, or keep all of them for consistency with diffusers.
Updated. Now all the extra params are read from `req.sampling_params.extra_args` (same as vLLM behavior and already used by SD audio). Default values are preserved. See if you feel this new version is better 😸

For breaking changes: there are no user-level breaking changes.

For feature addition: indeed, these parameters were previously never used by upstream callers, so only their default values took effect. Now they are moved to `req.sampling_params.extra_args`.

If you think a user doc is desired, I can add it as well. @SamitHuang maybe also share your thoughts.
@knlnguyen1802 I have updated this file and custom_pipeline.md to match the usage of forward function after #797 and the cleanup refactoring in this PR. See if that looks good to you
@fhfuih It is fine for me if the example can run successfully
@hsliuustc0106 @SamitHuang Could you help mark this PR as ready? I rebased onto the main branch and updated the relevant docs and the new Hunyuan model.
```diff
-    attention_kwargs: dict | None = None,
-    **kwargs,
-) -> DiffusionOutput:
+def forward(self, req: OmniDiffusionRequest) -> DiffusionOutput:
```
Can you also check that the generated videos are unchanged after the refactor? It's possible that the offline/online inference examples use one of these removed hyper-parameters.
Confirmed that the same video is generated using wan t2v, and the intermediate latents at all timesteps are identical. I used the offline text_to_video example script with custom sampling parameters like this:

```shell
python examples/offline_inference/text_to_video/text_to_video.py \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
  --negative-prompt "orange cat" \
  --height 480 \
  --width 480 \
  --num-frames 41 \
  --guidance-scale 7.0 \
  --guidance-scale-high 3.0 \
  --flow-shift 12.0 \
  --num-inference-steps 40 \
  --fps 12 \
  --output t2v_out.mp4 \
  --model /home/models/Wan-AI/Wan2.2-T2V-A14B-Diffusers \
  --seed 42
```

(attached: t2v_out_3.mp4, t2v_out_1.mp4)
@vllm-omni-reviewer
🤖 VLLM-Omni PR Review

Code Review: Refactor Unused/Redundant Params in Diffusion Pipelines

1. Overview: This PR refactors the …

Overall Assessment: Positive. This is a well-structured cleanup PR that improves code clarity and reduces confusion for developers adding new models.

2. Code Quality: Strengths … Concerns: 2.1 Inconsistent …
```python
)
latents = req.sampling_params.latents

prompt_embeds_mask: torch.Tensor | None = req.sampling_params.extra_args.get("prompt_embeds_mask", None)
```
A suggestion to reduce model migration cost from diffusers:

- Update the docstring to illustrate `req` and how to parse it into the different arguments of a `diffusers` pipeline.
- Some arguments like `guidance_scale` are in `req.sampling_params`, but others like `prompt_embeds_mask` are in `req.sampling_params.extra_args`. Model developers need to track the request handling code to figure this out. It would help to explain why in the doc.
I have added a new commit to document `sampling_params` and `extra_args` in `adding_diffusion_model.md`. Plz have a look.

Meanwhile, I think a docstring for `req` is hard to maintain: each model has its own docstring, and one can easily forget to sync all models' docstrings in future updates. Then it is uncertain whether a model developer will see the most up-to-date version of the docstring.

Therefore, I also extended the description of `req` in `adding_diffusion_model.md`. A model developer is sure to read through this page. Do you think the current version is clear enough?
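To make the documented split concrete, here is a hedged sketch of the `extra_args` pattern under discussion: less-common parameters are read from a plain dict with an explicit, documented default. The dict contents are illustrative assumptions, not the actual vllm-omni defaults:

```python
# Sketch of the extra_args access pattern: common parameters are dataclass
# fields, less-common ones live in a dict and must be read with a documented
# fallback. The keys/values below are assumptions for illustration.
extra_args = {"prompt_embeds_mask": None, "cfg_truncation": 0.9}

# `.get` keeps the pipeline working when callers omit the key entirely
cfg_truncation = extra_args.get("cfg_truncation", 1.0)
num_waveforms = extra_args.get("num_waveforms_per_prompt", 1)

print(cfg_truncation, num_waveforms)  # 0.9 1
```

This is exactly why documenting the honored keys matters: nothing in the type system tells a caller which keys a given pipeline reads.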
After discussing with @SamitHuang, the commit above makes all diffusion models check …

Two special cases: …
Response to other comments from the bot:
@vllm-omni-reviewer
SamitHuang left a comment:

LGTM, final question
> 2. **`sampling_params`**: a collection of common sampling parameters. Check the definition of the [`OmniDiffusionSamplingParams`](https://docs.vllm.ai/projects/vllm-omni/en/latest/api/vllm_omni/inputs/data/#vllm_omni.inputs.data.OmniDiffusionSamplingParams) dataclass for their default values.
>    - If your model requires a less-common sampling parameter, you can read it from the `["extra_args"]` field of the dataclass. To ensure user experience, you may want to document the list of extra args that your pipeline honors.
>    - If you believe a sampling parameter is common enough to be included in the `OmniDiffusionSamplingParams` dataclass, feel free to open an issue or clarify it in your PR that adds your model.
It makes things complicated (e.g., there is no definition of what counts as a common parameter for diffusion).

BTW, is it really necessary to distinguish parameters between common and less-common (extra_args, like `image_embeds`)? Why not parse them all via `sampling_params`?
According to our internal discussion, there indeed isn't a clear boundary between common and less common parameters. After my further investigation, I think maybe:

- Those parameters that have been inside `extra_args` stay there so that things don't break: `cfg_text_scale`, `cfg_img_scale`, `audio_start_in_s`. Note that many of them do make sense as sampling params.
- Those parameters that I newly move to `extra_args` in this PR are categorized into the scenarios below:
  - Move to `extra_args` only if they are related to the inference runtime or the encoding/embedding stage, but not the input data itself:
    - `cfg_normalization`, `cfg_truncation` (only used in z-image)
    - `enable_cfg_renorm`, `cfg_renorm_min`, `enable_prompt_rewrite` (only in longcat)
    - `num_waveforms_per_prompt` (only in stable audio)
    - `text_encoder_out_layers` (only in flux klein)
    - Note: vLLM also shows example usage of `extra_args` for tokenizing and embedding routines.
  - Those parameters that are used in many models and are related to the runtime: promote them to an `OmniDiffusionRequest` property:
    - `output_type` (used in stable audio, several Alibaba models, etc.): make it default to None in `OmniDiffusionRequest`; then in each pipeline implementation, read it with model-specific fallback values (np or pil or others) that are consistent with the current implementation.
    - `joint_attention_kwargs`, `callback_on_step_end`, `callback_on_step_end_tensor_inputs` (in flux, longcat, z-image, ovis): make them default to empty values; they are also always empty in the current implementations.
  - Those parameters that are intended to be part of the input: it depends:
    - `image_embeds`, `last_image` (only used in wan i2v): we could move them to `req.prompt["multi_modal_data"]["image_embeds"]` and `req.prompt["multi_modal_data"]["last_image"]` for clarity, but `req.prompt` is a list of single prompts, which results in a list of batch-1 embeds. For efficient data IO, we can also keep them in `extra_args`.
    - `prompt_2`, `prompt_3`, `negative_prompt_2`, `negative_prompt_3` (used by both SD3 and flux): promote them to `OmniTextPrompt` fields.
    - `pooled_prompt_embeds`, `negative_pooled_prompt_embeds` (used by both SD3 and flux): promote them to `OmniTextPrompt` fields, but the same argument as above about `image_embeds` applies.
    - `prompt_embeds_mask`, `negative_prompt_embeds_mask` (only in the qwen image series): same as above.
  - Why keeping embeddings in `extra_args` also makes some sense:
    - For `image_embeds`, some models always calculate it within the pipeline, but WAN allows a user-input override. This is an unusual, non-unified pattern.
    - For `(neg_)prompt_embeds_mask` and `(neg_)pooled_prompt_embeds`, embeddings are technically not expected in `OmniTextPrompt`. Plus these input data fields are not widely used by many models.

What do you think?
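To visualize the `prompt_2` promotion proposed above, here is a purely hypothetical sketch (this is NOT the actual `OmniTextPrompt` definition in vllm-omni; the class and field names are illustrative assumptions):

```python
from typing import Any, TypedDict

# Hypothetical sketch: what promoting prompt_2 / negative_prompt_2 to
# optional text-prompt fields could look like, with total=False so that
# existing single-prompt callers keep working unchanged.
class TextPromptSketch(TypedDict, total=False):
    prompt: str
    prompt_2: str
    negative_prompt_2: str
    multi_modal_data: dict[str, Any]

p: TextPromptSketch = {"prompt": "a cat", "prompt_2": "a fluffy cat"}
# A pipeline could fall back to the primary prompt when prompt_2 is absent
prompt_2 = p.get("prompt_2", p["prompt"])
print(prompt_2)  # a fluffy cat
```

The `total=False` choice mirrors the backward-compatibility concern in the proposal: older requests that only set `prompt` remain valid.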
@fhfuih Please resolve conflicts
…ne.forward (squash til Mar 18) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Finished, but this PR may still require some discussion. @SamitHuang could you have a look at my last discussion? @wtomin do you also have any suggestions, since you have written the relevant documentation? Thanks.
Do we still need this PR after #2383?
I think it's different. The redundant `pipeline.forward(...)` parameters are still there. They were primarily copied from the original diffusers definitions but are actually not (or seldom) used, and they are not documented. I think I can rework this refactor based on my last proposal at #1235 (comment); I'll signal you when I finish.
@fhfuih Please resolve conflicts. Thanks
Purpose
As discussed before in #797, many diffusion pipelines have several extra parameters defined in the `forward` function. They have never been used: the forward function has always been called with only one `OmniDiffusionRequest` object (even before PR #797).

This PR does this refactor. In particular, it also aims to address the discussion in #1196, and ships in companion with #1196, so that the "how to add a new model" documentation teaches developers to follow the correct paradigm.
Test Plan
No new features are added and no logic is changed; I will just run the existing tests.
Test Result
To be updated
Additional notes
In this refactor:

- The removed parameters were never used (in every call to `forward`, only `req` is passed), so only their default values were ever in effect.
- Some pipelines now use `or` instead of `if .. is not None`, so the "default" values are applied when the user-passed values are None or 0. Please see my argument below for why this is acceptable.

Why it is acceptable to apply alternative default values when the user passes 0:
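The behavioral difference between the two fallback styles can be checked directly; this is a generic sketch, not pipeline code:

```python
# `or` replaces any falsy value (0, 0.0, "", None) with the fallback,
# while the `is not None` style only replaces a missing value.
user_value = 0.0  # user explicitly passes 0

via_or = user_value or 5.0
via_none_check = user_value if user_value is not None else 5.0

print(via_or, via_none_check)  # 5.0 0.0
```

So switching to `or` is only safe when an explicit 0 is never a meaningful value for the parameter, which is the claim argued below.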
- These default values appear in the original `forward` function signature, meaning the pipeline authors intended to adopt these parameters (but could not in the current codebase). So it makes sense to use these values when the user-passed value is invalid on purpose (i.e., an explicit 0).

Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)