Stable Diffusion XL for Gaudi #619
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Updated schedulers to support cases like image-to-image generation with different initial timesteps.
Also fixes a bug when the number of generated images is not divisible by the batch size.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@regisss could you also please provide your insights on this PR?
regisss left a comment:
Very clean PR! I left a few minor comments.
Have you benchmarked it a bit and got some throughput numbers?
model_cpu_offload_seq = "text_encoder->text_encoder_2->unet->vae"
_optional_components = ["tokenizer", "tokenizer_2", "text_encoder", "text_encoder_2"]
_callback_tensor_inputs = [
"latents_batch",
"text_embeddings_batch",
"add_text_embeddings_batch",
"add_time_ids_batch",
]
We can remove these lines, as the class already inherits them from StableDiffusionXLPipeline.
Fixed. I also tried to handle the other callback inputs more correctly. Maybe the SD pipeline needs a similar fix? Some pairing takes place there that the original Gaudi SD pipeline does not address.
Hmm not sure I got it, it doesn't recognize the tensors to callback properly?
And so you modified the default value in __call__ right?
I am not 100% confident how this should be handled. Initially, I tried to define _callback_tensor_inputs directly for the current derived class:
_callback_tensor_inputs = [
"latents_batch",
"text_embeddings_batch",
"add_text_embeddings_batch",
"add_time_ids_batch",
]
That way, these inputs can be popped off the stack when a callback occurs.
In the current PR version, instead of that approach, I pass the callback input tensors from the base class and then adjust them by pairing them (via torch.cat) so that they align with the batched structures created with _split_inputs_into_batches inside the Gaudi class.
Callback input tensors from the base class:
[optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py]
Line 310:
callback_on_step_end_tensor_inputs: List[str] = [
"latents",
"prompt_embeds",
"negative_prompt_embeds",
"add_text_embeds",
"add_time_ids",
"negative_pooled_prompt_embeds",
"negative_add_time_ids",
],
and then pairing them manually when they are popped from the callback stack:
[optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py]
Lines 722-734
I am not sure what the best/correct implementation for this part would be.
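The pairing described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: plain Python lists stand in for tensors, list concatenation stands in for torch.cat along the batch dimension, and the helper name pair_callback_outputs is hypothetical. It assumes classifier-free guidance, where the batched tensor holds the negative rows followed by the positive rows.

```python
def pair_callback_outputs(callback_outputs, current, do_cfg=True):
    """Map callback outputs (base-class names) onto batched names.

    Hypothetical helper for illustration only. Lists stand in for
    tensors, and `neg + pos` stands in for torch.cat([neg, pos], dim=0).
    """
    updated = dict(current)

    # Latents are not paired, so they map one-to-one.
    updated["latents_batch"] = callback_outputs.pop(
        "latents", current["latents_batch"]
    )

    # (batched name, positive key, negative key)
    pairs = [
        ("text_embeddings_batch", "prompt_embeds", "negative_prompt_embeds"),
        ("add_text_embeddings_batch", "add_text_embeds", "negative_pooled_prompt_embeds"),
        ("add_time_ids_batch", "add_time_ids", "negative_add_time_ids"),
    ]
    for batched, pos_key, neg_key in pairs:
        pos = callback_outputs.pop(pos_key, None)
        neg = callback_outputs.pop(neg_key, None)
        if pos is None or (do_cfg and neg is None):
            # The callback did not return a complete pair: keep the current batch.
            continue
        # Negative rows first, then positive rows, when guidance is enabled.
        updated[batched] = neg + pos if do_cfg else pos
    return updated
```

A callback that returns only some of the tensors leaves the remaining batched tensors untouched, which keeps the shapes consistent with what the denoising loop expects.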
I am also confused about the original Stable Diffusion pipeline for Gaudi. If you look at how it's implemented, there are also inherited callback input tensors:
https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L222
callback_on_step_end_tensor_inputs: List[str] = ["latents"],
However, lines 474-476 pop both latents and prompt_embeds. Here, prompt_embeds is popped into text_embeddings_batch, which expects a concatenated version (see https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L193)
https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L474
That looks kind of fishy, unless I am completely missing something obvious :)
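To make the concern concrete, here is a small sketch of the behavior in question. It assumes the loop uses dict.pop with a default, as the upstream diffusers pipelines do; the function name apply_callback_outputs is hypothetical and lists stand in for tensors.

```python
def apply_callback_outputs(callback_outputs, latents_batch, text_embeddings_batch):
    # dict.pop(key, default) returns the default when the callback
    # did not return that key, so missing keys are harmless.
    latents_batch = callback_outputs.pop("latents", latents_batch)
    text_embeddings_batch = callback_outputs.pop("prompt_embeds", text_embeddings_batch)
    return latents_batch, text_embeddings_batch


# The batched embeddings hold negative rows followed by positive rows.
text_batch = [["neg"], ["pos"]]

# Case 1: the callback returns only latents; the embeddings are unchanged.
_, kept = apply_callback_outputs({"latents": [[1.0]]}, [[0.0]], text_batch)

# Case 2: the callback returns unpaired prompt_embeds; the batched
# embeddings are replaced by a structure with half as many rows.
_, replaced = apply_callback_outputs({"prompt_embeds": [["pos2"]]}, [[0.0]], text_batch)
```

With the default of ["latents"], only the first case occurs, so the mismatch would surface only if a caller explicitly requests prompt_embeds in the callback.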
Yes, indeed, we benchmarked it on a single Gaudi2 HPU. Here is a snapshot:
[Image: benchmark snapshot]
Add negative_prompt_embeds and negative_pooled_prompt_embeds checks for SDXL Turbo.
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
What does this PR do?