Skip to content

Stable Diffusion XL for Gaudi#619

Merged
regisss merged 19 commits into
huggingface:mainfrom
dsocek:sdxl
Jan 9, 2024
Merged

Stable Diffusion XL for Gaudi#619
regisss merged 19 commits into
huggingface:mainfrom
dsocek:sdxl

Conversation

@dsocek
Copy link
Copy Markdown
Contributor

@dsocek dsocek commented Dec 29, 2023

What does this PR do?

  • Implements Stable Diffusion XL (SDXL) pipeline for Gaudi
  • Implements Euler Discrete and Euler Ancestral Discrete schedulers for Gaudi
  • Adds examples (documentation and script) to for running SDXL base and SDXL turbo inference on Guadi
  • Adds SDXL related tests for CI
  • Adds SDXL to mdx

@dsocek dsocek requested a review from regisss as a code owner December 29, 2023 19:38
Copy link
Copy Markdown
Collaborator

@libinta libinta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, @dsocek, can you start to check the comments? thanks

@dsocek dsocek requested a review from mandy-li as a code owner January 4, 2024 00:58
@dsocek dsocek requested a review from a user January 4, 2024 00:58
dsocek and others added 12 commits January 4, 2024 01:47
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Updated schedulers to support cases like image-to-image
generation with different initial timesteps.
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Also fixes bug when generated images not divisible by batch size.
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@dsocek
Copy link
Copy Markdown
Contributor Author

dsocek commented Jan 4, 2024

hi, @dsocek, can you start to check the comments? thanks

@libinta we addressed your concerns, can you help check the update?

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dsocek
Copy link
Copy Markdown
Contributor Author

dsocek commented Jan 5, 2024

@regisss could you also please provide your insights on this PR?

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@regisss regisss added the run-test Run CI for PRs from external contributors label Jan 5, 2024
Copy link
Copy Markdown
Collaborator

@regisss regisss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean PR! I left a few minor comments.

Have you benchmarked it a bit and got some throughput numbers?

Comment thread optimum/habana/diffusers/schedulers/scheduling_ddim.py Outdated
Comment on lines +120 to +126
model_cpu_offload_seq = "text_encoder->text_encoder_2->unet->vae"
_optional_components = ["tokenizer", "tokenizer_2", "text_encoder", "text_encoder_2"]
_callback_tensor_inputs = [
"latents_batch",
"text_embeddings_batch",
"add_text_embeddings_batch",
"add_time_ids_batch",
]
Copy link
Copy Markdown
Collaborator

@regisss regisss Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove these lines as the class already inherits them from StableDiffusionXLPipeline

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I also tried to be more correct in handling other callback input (maybe similar fix to SD pipeline? there is some pairing that takes place which is not addressed in original gaudi sd pipe).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure I got it, it doesn't recognize the tensors to callback properly?
And so you modified the default value in __call__ right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% confident how this should be handled. Initially what I tried is to define _callback_tensor_inputs directly for current derived class:

_callback_tensor_inputs = [
        "latents_batch",
        "text_embeddings_batch",
        "add_text_embeddings_batch",
        "add_time_ids_batch",
    ]

So on callbacks these inputs can be popped out of stack when callback occurs.

Now, in the current PR version, instead of that approach I tried to pass callback input tensor from the base class and then adjust them properly by doing the pairing (via torch.cat) to align with batched structures created with _split_inputs_into_batches inside Gaudi class:

callback input tensor from the base class:
[optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py]
Line 310:

callback_on_step_end_tensor_inputs: List[str] = [
            "latents",
            "prompt_embeds",
            "negative_prompt_embeds",
            "add_text_embeds",
            "add_time_ids",
            "negative_pooled_prompt_embeds",
            "negative_add_time_ids",
        ],

and then pairing then manually when popped from callback stack:
[optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py]
Lines 722-734

I am not sure what would be the best/correct implementation for this part..

I am also confused about the original stable diffusion pipeline for Gaudi. If you look at the how its implemented, there are also inherited callback input tenors:

https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L222
callback_on_step_end_tensor_inputs: List[str] = ["latents"],

However in in lines 474-476 it pops both latents and prompt_embeds. Here, prompt_embeds are popped into text_embeddings_batch which expects a catenated version (see https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L193)
https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L474

That looks kind of fishy, unless I am completely missing something obvious :)

Comment thread tests/test_diffusers.py Outdated
Comment thread tests/test_diffusers.py Outdated
@dsocek
Copy link
Copy Markdown
Contributor Author

dsocek commented Jan 5, 2024

Very clean PR! I left a few minor comments.

Have you benchmarked it a bit and got some throughput numbers?

Yes indeed we benchmarked it on single Gaudi2 HPU. Here is a snapshot:

1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
Speed metrics: {'generation_runtime': 180.7683, 'generation_samples_per_second': 0.205, 'generation_steps_per_second': 0.086}

dsocek and others added 2 commits January 5, 2024 19:55
dsocek and others added 2 commits January 5, 2024 23:07
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Add negative_prompt_embeds and negative_pooled_prompt_embeds check for
sdxl turbo.
@dsocek dsocek requested review from regisss and skavulya January 5, 2024 23:34
@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Jan 8, 2024
Comment thread docs/source/package_reference/stable_diffusion_pipeline.mdx Outdated
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
@dsocek dsocek requested a review from regisss January 8, 2024 15:50
@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Jan 9, 2024
Copy link
Copy Markdown
Collaborator

@regisss regisss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@regisss regisss merged commit db3f0a0 into huggingface:main Jan 9, 2024
@regisss regisss mentioned this pull request Jan 23, 2024
3 tasks
jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-test Run CI for PRs from external contributors

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants