-
Notifications
You must be signed in to change notification settings - Fork 1k
[Chore]: refactor out unused/redundant params in diffusion pipelines #1235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @knlnguyen1802 I have updated this file and
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @fhfuih It is fine for me if the example can run successfully |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes thing complicated (e.g. no definition on what is a common parameter for diffusion).
BTW, is it really necessary to distinguish parameters from common and less-common (extra-args, like
image_embeds)? why not parse them all viasampling_params?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to our internal discussion, there indeed isn't a clear boundary between common and less common parameters. After my further investigation, I think maybe
extra_argsstays there so that things don't break:cfg_text_scale,cfg_img_scale,audio_start_in_s. Note that many of them do make sense as sampling paramsextra_argsin this PR, categorize them in four scenarios below:cfg_normalization,cfg_truncation(only used in z-image)enable_cfg_renorm,cfg_renorm_min,enable_prompt_rewrite(only in longcat)num_waveforms_per_prompt(only in stable audio)text_encoder_out_layers(only in flux klein)extra_argsfor tokenizing and embedding routinesoutput_type(used in stable audio, several Alibaba models, stable audio, etc.) Make it default to None in OmniDiffusionRequest. Then in pipeline implementation, read it with model-specific fallback values (np or pil or others) that are consistent with their current implementation.joint_attention_kwargs,callback_on_step_end,callback_on_step_end_tensor_inputs(in flux, longcat, z image, ovis), Make it default to empty values, and they are also always empty in the current implementationsimage_embeds,last_image(only used in wan i2v)---we can move them toreq.prompt["multi_modal_data"]["image_embeds"]andreq.prompt["multi_modal_data"]["last_image"]for clarity, butreq.promptis a list of single prompts, which results in a list of batch-1 embeds. For efficient data IO, we can also keep it in extra_args.prompt_2,prompt_3,negative_prompt_2,negative_prompt_3(used by both SD3 and flux)---promote them asOmniTextPromptfieldspooled_prompt_embeds,negative_pooled_prompt_embeds(used by both SD3 and flux)---promote them asOmniTextPromptfields, but same as the argument above aboutimage_embeds.prompt_embeds_mask,negative_prompt_embeds_mask(only in qwen image series)---same as aboveimage_embeds, some models always calculate it within the pipeline, but WAN allows user-input override. This is an unusual and not-unified pattern.(neg)_prompt_embeds_maskand(neg_)pooled_prompt_embeds, embeddings are technically not expected in OmniTextPrompt. Plus these input data fields are not widely used by many models.What do you think?