Stable Diffusion XL for Gaudi by dsocek · Pull Request #619 · huggingface/optimum-habana

dsocek · 2023-12-29T19:38:27Z

What does this PR do?

Implements Stable Diffusion XL (SDXL) pipeline for Gaudi
Implements Euler Discrete and Euler Ancestral Discrete schedulers for Gaudi
Adds examples (documentation and script) for running SDXL base and SDXL turbo inference on Gaudi
Adds SDXL related tests for CI
Adds SDXL to mdx
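
For readers unfamiliar with the Euler schedulers listed above: in the Euler discrete formulation used by diffusers, the model input at each step is divided by sqrt(sigma^2 + 1), where sigma is that step's noise level. Below is a minimal sketch of pre-generating those factors once, so that a step only needs an index lookup. It is an illustration under stated assumptions, not the code in this PR: the function names and the sigma values are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def precompute_input_scales(sigmas):
    """Return 1 / sqrt(sigma^2 + 1) for every sigma in the schedule."""
    return [1.0 / math.sqrt(sigma * sigma + 1.0) for sigma in sigmas]


def scale_model_input(sample, scales, step_index):
    """Multiply each element of `sample` by the factor stored for this step."""
    factor = scales[step_index]
    return [x * factor for x in sample]


# Hypothetical noise levels in decreasing order (made up for illustration).
sigmas = [14.6, 7.0, 3.0, 1.0, 0.0]
scales = precompute_input_scales(sigmas)

# At sigma = 1 the factor is 1 / sqrt(2); at sigma = 0 it is exactly 1.
scaled = scale_model_input([2.0, -4.0], scales, step_index=3)
```

Keeping the factors in a table indexed by a step counter avoids searching for the current timestep inside the denoising loop, which is one plausible reason to pre-generate them.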

libinta

hi, @dsocek, can you start to check the comments? thanks

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Updated schedulers to support cases like image-to-image generation with different initial timesteps.

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Also fixes bug when generated images not divisible by batch size.

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

dsocek · 2024-01-04T02:20:34Z

> hi, @dsocek, can you start to check the comments? thanks

@libinta we addressed your concerns, can you help check the update?

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

HuggingFaceDocBuilderDev · 2024-01-05T14:40:10Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

dsocek · 2024-01-05T15:46:56Z

@regisss could you also please provide your insights on this PR?

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

regisss

Very clean PR! I left a few minor comments.

Have you benchmarked it a bit and got some throughput numbers?

regisss · 2024-01-05T17:04:58Z

+    model_cpu_offload_seq = "text_encoder->text_encoder_2->unet->vae"
+    _optional_components = ["tokenizer", "tokenizer_2", "text_encoder", "text_encoder_2"]
+    _callback_tensor_inputs = [
+        "latents_batch",
+        "text_embeddings_batch",
+        "add_text_embeddings_batch",
+        "add_time_ids_batch",
+    ]


We can remove these lines as the class already inherits them from StableDiffusionXLPipeline

Fixed. I also tried to handle the other callback inputs more correctly (maybe the SD pipeline needs a similar fix? Some pairing takes place there that the original Gaudi SD pipeline does not address).

Hmm not sure I got it, it doesn't recognize the tensors to callback properly?
And so you modified the default value in __call__ right?

I am not 100% confident how this should be handled. Initially, I tried defining _callback_tensor_inputs directly on the current derived class:

_callback_tensor_inputs = [ "latents_batch", "text_embeddings_batch", "add_text_embeddings_batch", "add_time_ids_batch", ]

That way, these inputs can be popped when the callback occurs.

Now, in the current PR version, instead of that approach, I pass the callback input tensors from the base class and then adjust them by pairing them (via torch.cat) so that they align with the batched structures created by _split_inputs_into_batches inside the Gaudi class:

Callback input tensors from the base class:
[optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py]
Line 310:

callback_on_step_end_tensor_inputs: List[str] = [ "latents", "prompt_embeds", "negative_prompt_embeds", "add_text_embeds", "add_time_ids", "negative_pooled_prompt_embeds", "negative_add_time_ids", ],

and then pairing them manually when they are popped from the callback outputs:
[optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py]
Lines 722-734

I am not sure what would be the best/correct implementation for this part.

I am also confused about the original Stable Diffusion pipeline for Gaudi. If you look at how it is implemented, there are also inherited callback input tensors:

https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L222
callback_on_step_end_tensor_inputs: List[str] = ["latents"],

However, lines 474-476 pop both latents and prompt_embeds. Here, prompt_embeds is popped into text_embeddings_batch, which expects a concatenated version (see https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L193)
https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L474

That looks kind of fishy, unless I am completely missing something obvious :)
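
To make the pairing concern above concrete, here is a minimal sketch of folding step-end callback outputs back into batched state. It is not the code from this PR: `apply_callback_outputs` and `cat` are hypothetical helpers, lists of rows stand in for tensors, and `cat` stands in for torch.cat along the batch dimension. The negative-then-positive order is the one diffusers uses for classifier-free guidance; that the Gaudi pipeline uses the same order is an assumption.

```python
def cat(tensors):
    """Stand-in for torch.cat(tensors) along dim 0, using lists of rows."""
    out = []
    for tensor in tensors:
        out.extend(tensor)
    return out


def apply_callback_outputs(callback_outputs, latents_batch, text_embeddings_batch):
    """Fold tensors returned by a step-end callback back into batched state.

    Keys absent from `callback_outputs` leave the current value untouched.
    """
    latents_batch = callback_outputs.pop("latents", latents_batch)
    prompt_embeds = callback_outputs.pop("prompt_embeds", None)
    negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", None)

    # The batched embeddings hold [negative, positive] stacked together, so a
    # lone prompt_embeds cannot simply replace them: both halves are needed.
    if prompt_embeds is not None and negative_prompt_embeds is not None:
        text_embeddings_batch = cat([negative_prompt_embeds, prompt_embeds])

    return latents_batch, text_embeddings_batch


# Usage: the callback returned new latents and both embedding halves.
new_latents, new_embeds = apply_callback_outputs(
    {
        "latents": [[9]],
        "prompt_embeds": [[1, 1]],
        "negative_prompt_embeds": [[0, 0]],
    },
    latents_batch=[[5]],
    text_embeddings_batch=[[7, 7], [8, 8]],
)
```

Popping prompt_embeds straight into a variable that expects the concatenated form, as questioned above, would halve the batch dimension whenever guidance is active; the sketch sidesteps that by rebuilding the pair explicitly.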

dsocek · 2024-01-05T18:01:51Z

> Very clean PR! I left a few minor comments.
>
> Have you benchmarked it a bit and got some throughput numbers?

Yes, we benchmarked it on a single Gaudi2 HPU. Here is a snapshot:

1 prompt(s) received, 20 generation(s) per prompt, 4 sample(s) per batch, 5 total batch(es).
Speed metrics: {'generation_runtime': 180.7683, 'generation_samples_per_second': 0.205, 'generation_steps_per_second': 0.086}
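
The batch count in the log above is consistent with ceiling division: 20 generations at 4 samples per batch gives 5 batches. The sketch below shows that arithmetic, plus one way to handle a count that is not divisible by the batch size. It assumes the last batch is padded with dummy entries that get discarded afterwards; `split_into_batches` and `pad_value` are hypothetical names, and this is not the PR's `_split_inputs_into_batches`.

```python
import math


def split_into_batches(items, batch_size, pad_value=None):
    """Split `items` into equally sized batches, padding the last one.

    Returns the batches and the number of dummy entries that were appended.
    """
    num_batches = math.ceil(len(items) / batch_size)
    num_dummy = num_batches * batch_size - len(items)
    padded = list(items) + [pad_value] * num_dummy
    batches = [
        padded[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)
    ]
    return batches, num_dummy


# 20 generations, 4 per batch: 5 full batches and no padding.
batches, num_dummy = split_into_batches(list(range(20)), 4)

# 10 generations, 4 per batch: 3 batches, the last padded with 2 dummies.
uneven, uneven_dummy = split_into_batches(list(range(10)), 4)
```

Keeping every batch the same size means tensor shapes stay constant across batches; the caller then drops the last `num_dummy` outputs.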

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Add negative_prompt_embeds and negative_pooled_prompt_embeds check for sdxl turbo.

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

regisss

LGTM!

dsocek requested a review from regisss as a code owner December 29, 2023 19:38

libinta reviewed Jan 3, 2024

dsocek force-pushed the sdxl branch from cf1c916 to 4517fb0 Compare January 4, 2024 00:58

dsocek requested a review from mandy-li as a code owner January 4, 2024 00:58

dsocek requested a review from a user January 4, 2024 00:58

dsocek and others added 12 commits January 4, 2024 01:47

Add SDXL pipeline

723c538

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Add euler ancestral discrete scheduler for Gaudi.

c3fe022

Updated schedulers to support cases like image-to-image generation with different initial timesteps.

Scale model inputs using pre-generated parameters in schedulers.

a468c07

Fix timestep for HPU graphs in StableDiffusionXL

cae4356

Add Gaudi Euler Discrete scheduler and set as default for SDXL

a997749

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Add Euler schedulers to Gaudi utils

8aea91f

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Add stable diffusion XL tests

2bd6f99

Also fixes bug when generated images not divisible by batch size.

Add instructions for SDXL-Turbo to readme

dc14dee

Update scheduler in SDXL example

c1c7348

Revert change to stable diffusion pipeline

d8ac59f

Add SDXL and schedulers to mdx

4d55347

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Add inheritance in SDXL pipe and refactor SDXL examples

dd156a8

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

dsocek force-pushed the sdxl branch from f92ef74 to dd156a8 Compare January 4, 2024 02:15

dsocek requested a review from libinta January 4, 2024 02:17

Fix local model based pipeline selection in runner

ef4efdf

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

dsocek force-pushed the sdxl branch from 0428a8d to c5018f9 Compare January 5, 2024 17:06

Improve code quality via ruff

c5018f9

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

regisss added the run-test label (Run CI for PRs from external contributors) Jan 5, 2024

regisss reviewed Jan 5, 2024

dsocek and others added 2 commits January 5, 2024 19:55

Remove redundancies and reorganize SDXL pipe code

38143af

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Remove redundant code from sdxl tests

e258433

dsocek and others added 2 commits January 5, 2024 23:07

Fix callbacks in SDXL pipeline

afb3836

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

enable sdxl turbo

c99cf14

Add negative_prompt_embeds and negative_pooled_prompt_embeds check for sdxl turbo.

dsocek requested review from regisss and skavulya January 5, 2024 23:34

regisss added and removed the run-test label (Run CI for PRs from external contributors) Jan 8, 2024

regisss reviewed Jan 8, 2024

Comment thread docs/source/package_reference/stable_diffusion_pipeline.mdx Outdated

Update mdx doc

a00140f

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

dsocek requested a review from regisss January 8, 2024 15:50

regisss added and removed the run-test label (Run CI for PRs from external contributors) Jan 9, 2024

regisss approved these changes Jan 9, 2024

regisss merged commit db3f0a0 into huggingface:main Jan 9, 2024

regisss mentioned this pull request Jan 23, 2024

Add ControlNet Pipeline #585

Merged

3 tasks

jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024

Stable Diffusion XL for Gaudi (huggingface#619)

90eea82

6 participants
