SDXL fine tuning#667

Merged
regisss merged 1 commit into huggingface:main from dsocek:sd-sdxl-fine-tuning
Feb 27, 2024

Conversation

@dsocek (Contributor) commented Jan 26, 2024

What does this PR do?

This PR adds fine-tuning for SDXL on Gaudi

@libinta (Collaborator) left a comment

@dsocek I will provide you with a patch for train_text_to_image_sdxl.py in the gs system

Comment thread examples/stable-diffusion/train_text_to_image.py Outdated
Comment thread examples/stable-diffusion/train_text_to_image_sdxl.py
Comment thread examples/stable-diffusion/train_text_to_image_sdxl.py Outdated
Comment thread examples/stable-diffusion/train_text_to_image_sdxl.py
Comment thread examples/stable-diffusion/README.md Outdated
Comment thread examples/stable-diffusion/README.md Outdated
libinta added a commit that referenced this pull request Jan 29, 2024
Also fixed the dataset issue by moving the bf16-related code later,
and fixed the buffer overflow issue by disabling autocast.
libinta added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jan 30, 2024
Also made 3 changes:
1. Removed autocast to avoid the buffer overflow
2. Changed the order of mixed-precision model setup and dataset processing to avoid the dataset issue
3. Fixed a validation issue where the input and weight dtypes did not match
@dsocek dsocek force-pushed the sd-sdxl-fine-tuning branch 3 times, most recently from 8b6df81 to 841b776 Compare January 30, 2024 17:41
@dsocek (Contributor, Author) commented Jan 30, 2024

Additional updates:

  1. Included additional fixes from @libinta (casting in the original Gaudi SDXL pipeline)
  2. Rebased the branch onto the latest OH master
  3. Added a new working SDXL LoRA script for Gaudi
  4. Updated the README with a LoRA training example and inference with the obtained LoRA weights

@dsocek dsocek requested a review from libinta January 30, 2024 22:18
Comment thread examples/stable-diffusion/README.md Outdated
Comment thread examples/stable-diffusion/README.md Outdated
Comment thread examples/stable-diffusion/training/train_text_to_image_lora.py
Comment thread examples/stable-diffusion/README.md Outdated
@dsocek (Contributor, Author) commented Feb 2, 2024

Updates:

  • Added CI tests
  • Refactored the README (split it into inference and training to make it cleaner)
  • Merged SDXL fine-tuning and LoRA into ONE script
  • Updated the README with the changed command lines

@dsocek dsocek requested a review from libinta February 2, 2024 20:01
@regisss (Collaborator) left a comment

@libinta The new training folder for training examples looks good to me 👍

Comment on lines +19 to +20
https://github.com/huggingface/diffusers/blob/v0.23.1/examples/text_to_image/train_text_to_image_sdxl.py
https://github.com/huggingface/diffusers/blob/v0.23.1/examples/text_to_image/train_text_to_image_lora_sdxl.py
Collaborator:

Maybe this script should be renamed to train_text_to_image_sdxl.py, as I see that Diffusers also has an example called train_text_to_image.py.

Collaborator:

@regisss should we keep one script and use a parameter to train both Stable Diffusion and Stable Diffusion XL?

Collaborator:

Ideally yes, if that doesn't make the script too complicated

@imangohari1 (Contributor):

@libinta any updates on this?

@libinta (Collaborator) commented Feb 9, 2024 via email

@libinta (Collaborator) commented Feb 12, 2024

@dsocek can you update to the latest? Thx

@libinta (Collaborator) left a comment

Can you update the patch to the latest?

@dsocek dsocek force-pushed the sd-sdxl-fine-tuning branch from 6541186 to 56b06a5 Compare February 12, 2024 23:34
@dsocek (Contributor, Author) commented Feb 12, 2024

@libinta I have just rebased to latest OH code

@libinta (Collaborator) commented Feb 13, 2024 via email

@dsocek (Contributor, Author) commented Feb 13, 2024

Daniel, can you check one more time to see if it's easier to keep Stable Diffusion XL and Stable Diffusion as separate training scripts, or to combine them? Thanks, Libin

@libinta I think it would be better to have separate training scripts for SDXL and SD. This would be easier to implement and also easier to maintain IMO. It is also closer to how Diffusers is arranged. So we would then have 2 scripts, but each could cover multiple fine-tuning approaches.
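As a rough illustration of the alternative being weighed, one script could cover both full fine-tuning and LoRA behind a flag. The sketch below is hypothetical: the argument names (`--use_lora`, `--lora_rank`) are illustrative and not the actual interface of the merged script.

```python
# Hypothetical sketch of one SDXL training script gating full fine-tuning
# vs. LoRA behind a flag; argument names are illustrative only.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="SDXL text-to-image fine-tuning (full or LoRA)"
    )
    parser.add_argument("--pretrained_model_name_or_path", required=True)
    parser.add_argument(
        "--use_lora",
        action="store_true",
        help="train low-rank adapters instead of the full UNet",
    )
    parser.add_argument("--lora_rank", type=int, default=4)
    return parser


# Example invocation: LoRA fine-tuning with the default rank.
args = build_parser().parse_args(
    ["--pretrained_model_name_or_path", "stabilityai/stable-diffusion-xl-base-1.0",
     "--use_lora"]
)
```

In the end the PR went the other way for the SD/SDXL split (separate scripts), but the merged SDXL script does fold full fine-tuning and LoRA into one file, which is the shape this sketch gestures at.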

pipe = GaudiStableDiffusionXLPipeline.from_pretrained(
    model_id,
    scheduler=GaudiEulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler"),
    torch_dtype=torch.bfloat16,
Collaborator:

Can you change GaudiEulerDiscreteScheduler to DDPMScheduler like in the original script? We observed noisy images with EulerDiscreteScheduler.
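For context, swapping schedulers in diffusers-style pipelines is typically done by rebuilding the new scheduler from the old one's config. Below is a minimal stand-in sketch of that `from_config` pattern using toy classes; these are not the real diffusers or optimum-habana APIs.

```python
# Toy stand-in for the diffusers-style scheduler swap under discussion;
# these classes only mimic the from_config pattern, not the real API.
class SchedulerBase:
    def __init__(self, **config):
        self.config = config

    @classmethod
    def from_config(cls, config):
        # Rebuild a scheduler of this class from another scheduler's config dict;
        # the noise schedule carries over, only the sampling algorithm changes.
        return cls(**config)


class EulerDiscreteScheduler(SchedulerBase):
    pass


class DDPMScheduler(SchedulerBase):
    pass


euler = EulerDiscreteScheduler(num_train_timesteps=1000, beta_schedule="scaled_linear")
ddpm = DDPMScheduler.from_config(euler.config)  # same config, different sampler
```

The design point is that schedulers are interchangeable as long as they are constructed from a compatible config, which is why the review could suggest the swap without touching the rest of the pipeline.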

@dsocek (Contributor, Author) Feb 14, 2024

For this change, do we need to implement GaudiDDPMScheduler?

Collaborator:

I guess yes. Maybe it works well with GaudiDDIMScheduler?

Contributor (Author):

@regisss @libinta We tested now with the updated diffusers: the Euler Discrete scheduler works well (no noisy images observed anymore).

@libinta (Collaborator) left a comment

@dsocek As we discussed, can you rename the training script to train_text_to_image_sdxl.py ?

@atakaha (Contributor) commented Feb 13, 2024

@libinta and @dsocek
I faced a diffusers version dependency issue between drivers 1.13 and 1.14.
Driver 1.13 uses diffusers 0.23.1 and driver 1.14 uses diffusers 0.26.3. Version 0.23.1 has "text_encoder_lora_state_dict" but 0.26.3 does not. Do we support only 1.13, only 1.14, or both?
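One common way to bridge such an API removal is an import guard. The sketch below is a hedged illustration: the module path of the helper is an assumption (it moved between diffusers versions), and on diffusers >= 0.26, or with no diffusers installed at all, the import simply fails and the fallback branch runs.

```python
# Hedged compatibility shim for the removed helper described above.
# The import path is an assumption; it varied across diffusers versions.
try:
    from diffusers.models.lora import text_encoder_lora_state_dict  # assumed 0.23.x path
except ImportError:
    # Newer diffusers removed this helper; callers would fall back to a
    # PEFT-based state-dict extraction instead.
    text_encoder_lora_state_dict = None

HAS_LEGACY_LORA_HELPER = text_encoder_lora_state_dict is not None
```

The PR ultimately resolved this by moving the script to PEFT (see the Feb 15 update below the force-push), which sidesteps the removed helper entirely.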

@libinta (Collaborator) commented Feb 14, 2024 via email

@dsocek (Contributor, Author) commented Feb 14, 2024

@dsocek As we discussed, can you rename the training script to train_text_to_image_sdxl.py ?

@libinta renamed

@dsocek dsocek force-pushed the sd-sdxl-fine-tuning branch from b6226b6 to 204210d Compare February 15, 2024 07:34
@dsocek (Contributor, Author) commented Feb 15, 2024

@libinta

  • We validated on 1.14 and the latest diffusers
  • We updated to support PEFT as needed
  • We also updated the tests
  • I squashed all commits into a single commit with 4 co-authors for easier merging
  • I moved this PR from draft to an actual PR
  • The scheduler can be updated in a future commit (if needed)

@dsocek dsocek marked this pull request as ready for review February 15, 2024 07:45
@dsocek dsocek requested review from libinta and regisss February 15, 2024 07:45
@dsocek dsocek changed the title Sd sdxl fine tuning SDXL fine tuning Feb 15, 2024
@imangohari1 (Contributor):

@libinta @regisss
Hi team,
Any update on this? We would appreciate your review. Thanks.

@libinta libinta added the run-test Run CI for PRs from external contributors label Feb 16, 2024
@HuggingFaceDocBuilderDev:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread examples/stable-diffusion/training/requirements.txt
Comment thread examples/stable-diffusion/training/README.md Outdated

Comment thread examples/stable-diffusion/training/train_text_to_image_sdxl.py Outdated
@dsocek dsocek force-pushed the sd-sdxl-fine-tuning branch from 204210d to d376a16 Compare February 22, 2024 21:57
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: Libin Tang <litang@habana.ai>
@vidyasiv vidyasiv mentioned this pull request Feb 23, 2024
@dsocek dsocek requested a review from regisss February 26, 2024 16:00
@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Feb 27, 2024
@regisss (Collaborator) left a comment

LGTM!

@regisss regisss merged commit f3919e1 into huggingface:main Feb 27, 2024
yeonsily added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Feb 27, 2024
libinta pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Feb 28, 2024
schoi-habana pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 1, 2024
puneeshkhanna pushed a commit to puneeshkhanna/optimum-habana-fork that referenced this pull request Mar 11, 2024
HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 11, 2024
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
…face#667)

Signed-off-by: Urszula <urszula.golowicz@intel.com>
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

Labels

run-test Run CI for PRs from external contributors

6 participants