Fix textual inversion SDXL and add support for 2nd text encoder by dsocek · Pull Request #9010 · huggingface/diffusers

dsocek · 2024-07-29T23:29:06Z

Fix Textual Inversion SDXL fine-tuning and add support for training 2nd text encoder

The textual inversion fine-tuning script for SDXL is not working: the generated images show no guidance from the new token.

Training Set `./cat` (6 images):

Results Before Fix:

Training:

accelerate launch textual_inversion_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --train_data_dir="./cat" \
  --learnable_property="object" \
  --placeholder_token="<cat-toy>" \
  --initializer_token="toy" \
  --mixed_precision="bf16" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=500 \
  --learning_rate=5.0e-04 \
  --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --save_as_full_pipeline \
  --output_dir="./textual_inversion_cat_sdxl"
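For readers tuning these flags, here is a rough sketch of how the batch and learning-rate settings in the command above combine. It assumes `--scale_lr` multiplies the base learning rate by gradient accumulation steps, per-device batch size, and process count, which is the convention in other diffusers example scripts. It also assumes a single process and that `--max_train_steps` counts optimizer updates. None of these assumptions is stated in this PR, and `scaled_lr` is a hypothetical helper, not a function from the script.

```python
# Values copied from the training command above.
learning_rate = 5.0e-04
train_batch_size = 1
gradient_accumulation_steps = 4
max_train_steps = 500

# Assumption: one process (a single GPU); the PR does not say.
num_processes = 1


def scaled_lr(base_lr, grad_accum, batch_size, processes):
    """Hypothetical helper for the assumed --scale_lr rule:
    base LR multiplied by the effective batch size."""
    return base_lr * grad_accum * batch_size * processes


# Number of samples that contribute to each optimizer update.
effective_batch = train_batch_size * gradient_accumulation_steps * num_processes

effective_lr = scaled_lr(
    learning_rate, gradient_accumulation_steps, train_batch_size, num_processes
)

# Assumption: each of the max_train_steps is one optimizer update.
samples_seen = max_train_steps * effective_batch

print(f"effective batch size: {effective_batch}")
print(f"effective learning rate: {effective_lr:.4g}")
print(f"training samples processed: {samples_seen}")
```

Under these assumptions, the 6-image training set is cycled through many times over the 500 steps.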

Inference:

from diffusers import StableDiffusionXLPipeline
import torch

model_id = "./textual_inversion_cat_sdxl"
pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat-backpack.png")

Output (from 4 inferences):

These results show no or very poor guidance from the object token.
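A generic way to put a number on "no guidance" is to check how far the learned placeholder embedding has moved from its initializer. The sketch below is framework-free and uses made-up 4-dimensional vectors. `token_moved` and its threshold are hypothetical, and this check is not something the PR itself describes. In practice the two vectors would be the `<cat-toy>` row of a text encoder's input embedding matrix before and after training.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def token_moved(initial, learned, threshold=0.999):
    """Hypothetical check: True if the learned embedding differs
    noticeably in direction from the embedding it started from."""
    return cosine_similarity(initial, learned) < threshold


# Illustrative vectors only; real embeddings have hundreds of dimensions or more.
init_vec = [0.2, -0.1, 0.4, 0.3]       # copy of the initializer token ("toy")
unchanged_vec = list(init_vec)          # what a token that never trained would look like
trained_vec = [0.5, 0.1, -0.2, 0.3]     # what a token that did train might look like

print("unchanged token moved:", token_moved(init_vec, unchanged_vec))
print("trained token moved:", token_moved(init_vec, trained_vec))
```

If the placeholder row is essentially identical to the initializer row after training, the token cannot steer generation any differently from the initializer word.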

Results After Fix:

Same training command.

Same inference script.

Output (4 inferences):

After the fix, the images show good guidance from the token, even after only 500 fine-tuning steps.

Inference can now also use the 2nd text encoder:

from diffusers import StableDiffusionXLPipeline
import torch

model_id = "./textual_inversion_cat_sdxl"
pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

image = pipe(prompt="", prompt_2=prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat-backpack-prompt_2.png")

Output with 2nd text encoder from 4 inferences:

This PR also updates documentation with inference examples (see README_sdxl.md)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

dsocek · 2024-08-05T14:31:47Z

@patrickvonplaten could you kindly help assign appropriate reviewers if you have the bandwidth?

dsocek · 2024-08-08T18:24:32Z

cc: @sayakpaul @yiyixuxu

sayakpaul · 2024-08-09T02:00:19Z

Thanks very much for the fix. Will merge as soon as the CI is green.

HuggingFaceDocBuilderDev · 2024-08-09T02:05:47Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


dsocek · 2024-08-09T14:08:25Z

@sayakpaul Thanks, I seem to have forgotten to run make style/quality; I just added a style fix commit.

sayakpaul · 2024-08-09T14:53:14Z

Thanks for your awesome contributions!

dsocek · 2024-08-09T14:55:07Z

@sayakpaul Thank you very much for taking time to review!

* Fix textual inversion SDXL and add support for 2nd text encoder

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Fix style/quality of text inv for sdxl

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

---------

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

dsocek mentioned this pull request Jul 29, 2024

Add textual inversion XL for Gaudi huggingface/optimum-habana#868

Merged

Fix textual inversion SDXL and add support for 2nd text encoder

72bcdf0

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

dsocek force-pushed the textual_inv_sdxl_fix branch from 35eea48 to 72bcdf0 on July 30, 2024 13:29

Merge branch 'main' into textual_inv_sdxl_fix

d680501

Fix style/quality of text inv for sdxl

31a660d

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Merge branch 'main' into textual_inv_sdxl_fix

74f2af0

sayakpaul merged commit c1079f0 into huggingface:main Aug 9, 2024

sayakpaul merged 4 commits into huggingface:main from dsocek:textual_inv_sdxl_fix
