
Fix textual inversion SDXL and add support for 2nd text encoder #9010

Merged
sayakpaul merged 4 commits into huggingface:main from dsocek:textual_inv_sdxl_fix
Aug 9, 2024


@dsocek
Contributor

@dsocek dsocek commented Jul 29, 2024

Fix Textual Inversion SDXL fine-tuning and add support for training 2nd text encoder

The textual inversion fine-tuning script for SDXL is not working: the generated images show no guidance from the new token.

Training Set ./cat (6 images):

image

Results Before Fix:

Training:

accelerate launch textual_inversion_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --train_data_dir="./cat" \
  --learnable_property="object" \
  --placeholder_token="<cat-toy>" \
  --initializer_token="toy" \
  --mixed_precision="bf16" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=500 \
  --learning_rate=5.0e-04 \
  --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --save_as_full_pipeline \
  --output_dir="./textual_inversion_cat_sdxl"
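As background for the `--placeholder_token` and `--initializer_token` flags above, here is a minimal sketch of the usual textual inversion bookkeeping, applied once per text encoder. It is not the code from this PR: the embedding tables are plain Python lists, and the helper names `add_placeholder` and `restore_frozen_rows` are hypothetical. The idea is that the new token starts as a copy of the initializer token's embedding and is the only row allowed to change during training.

```python
# Illustrative sketch only: plain-Python stand-ins for embedding tables.
# Helper names are hypothetical and do not come from the training script.

def add_placeholder(vocab, embeddings, placeholder, initializer):
    """Append a new token whose embedding starts as a copy of the initializer's."""
    if placeholder in vocab:
        raise ValueError(f"{placeholder} already exists in the vocabulary")
    vocab[placeholder] = len(embeddings)
    embeddings.append(list(embeddings[vocab[initializer]]))
    return vocab[placeholder]


def restore_frozen_rows(embeddings, original, trainable_ids):
    """Undo any update on rows that are not trainable placeholder rows."""
    for idx, row in enumerate(original):
        if idx not in trainable_ids:
            embeddings[idx] = list(row)


# Two toy "encoders" with different embedding widths, mirroring the fact
# that SDXL has two text encoders.
vocab_1 = {"a": 0, "toy": 1}
emb_1 = [[0.1, 0.2], [0.3, 0.4]]
vocab_2 = {"a": 0, "toy": 1}
emb_2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

id_1 = add_placeholder(vocab_1, emb_1, "<cat-toy>", "toy")
id_2 = add_placeholder(vocab_2, emb_2, "<cat-toy>", "toy")

# Snapshot the first table, pretend an optimizer step nudged every row,
# then restore everything except the placeholder row.
snapshot_1 = [list(row) for row in emb_1]
emb_1 = [[value + 0.5 for value in row] for row in emb_1]
restore_frozen_rows(emb_1, snapshot_1, {id_1})
```

After the restore step, only the `<cat-toy>` row of the first table differs from its snapshot; the second table would need the same treatment if it is trained as well.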

Inference:

from diffusers import StableDiffusionXLPipeline
import torch

model_id = "./textual_inversion_cat_sdxl"
pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat-backpack.png")

Output (from 4 inferences):
image

These results show no or very poor guidance from the object token.

Results After Fix:

Same Training command

Same Inference script

Output (4 inferences):
image

Good guidance is shown after the fix, and this is after only 500 fine-tuning steps.

We can now also run inference with the 2nd text encoder:

from diffusers import StableDiffusionXLPipeline
import torch

model_id = "./textual_inversion_cat_sdxl"
pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

image = pipe(prompt="", prompt_2=prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat-backpack-prompt_2.png")

Output with 2nd text encoder from 4 inferences:
image
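The two inference scripts above differ only in how the prompt is routed: the first passes `prompt` alone, which the SDXL pipeline sends to both text encoders when `prompt_2` is not given, while the second passes an empty `prompt` and puts the text in `prompt_2`. A small sketch of that choice follows; the helper `route_prompt` and its `encoder` labels are hypothetical, and only the `prompt` and `prompt_2` argument names come from the scripts above.

```python
def route_prompt(prompt, encoder="both"):
    """Build keyword arguments for a StableDiffusionXLPipeline call.

    encoder is a label made up for this sketch: "both" or "second".
    """
    if encoder == "both":
        # With prompt_2 omitted, the pipeline uses prompt for both encoders.
        return {"prompt": prompt}
    if encoder == "second":
        # Mirrors the second script: empty prompt, text only in prompt_2.
        return {"prompt": "", "prompt_2": prompt}
    raise ValueError(f"unknown encoder label: {encoder!r}")


kwargs_both = route_prompt("A <cat-toy> backpack")
kwargs_second = route_prompt("A <cat-toy> backpack", encoder="second")
```

The resulting dictionaries can be unpacked into the call, for example `pipe(**kwargs_second, num_inference_steps=50, guidance_scale=7.5)`.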

This PR also updates documentation with inference examples (see README_sdxl.md)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@dsocek dsocek force-pushed the textual_inv_sdxl_fix branch from 35eea48 to 72bcdf0 Compare July 30, 2024 13:29
@dsocek
Contributor Author

dsocek commented Aug 5, 2024

@patrickvonplaten could you kindly help assign appropriate reviewers if bandwidth is available?

@dsocek
Contributor Author

dsocek commented Aug 8, 2024

cc: @sayakpaul @yiyixuxu

@sayakpaul
Member

Thanks very much for the fix. Will merge as soon as the CI is green.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@dsocek
Contributor Author

dsocek commented Aug 9, 2024

@sayakpaul Thanks, I seem to have forgotten to run make style/quality; I just added a style fix commit.

@sayakpaul sayakpaul merged commit c1079f0 into huggingface:main Aug 9, 2024
@sayakpaul
Member

Thanks for your awesome contributions!

@dsocek
Contributor Author

dsocek commented Aug 9, 2024

@sayakpaul Thank you very much for taking the time to review!

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* Fix textual inversion SDXL and add support for 2nd text encoder

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Fix style/quality of text inv for sdxl

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

---------

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>