SDXL fine tuning#667

Merged
regisss merged 1 commit into huggingface:main from dsocek:sd-sdxl-fine-tuning
Feb 27, 2024

Conversation

@dsocek (Contributor) commented Jan 26, 2024

What does this PR do?

This PR adds fine-tuning for SDXL on Gaudi

@libinta (Collaborator) left a comment

@dsocek I will provide you with a patch for train_text_to_image_sdxl.py in the gs system

Comment thread examples/stable-diffusion/train_text_to_image.py Outdated
Comment thread examples/stable-diffusion/train_text_to_image_sdxl.py
Comment thread examples/stable-diffusion/train_text_to_image_sdxl.py Outdated
Comment thread examples/stable-diffusion/train_text_to_image_sdxl.py
Comment thread examples/stable-diffusion/README.md Outdated
Comment thread examples/stable-diffusion/README.md Outdated
libinta added a commit that referenced this pull request Jan 29, 2024
Also fixed the dataset issue by moving the bf16-related code later,
and fixed the buffer overflow issue by disabling autocast.
libinta added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jan 30, 2024
Also made 3 changes:
1. Removed autocast to avoid the buffer overflow
2. Changed the order of mixed-precision model setup and dataset processing to avoid the dataset issue
3. Fixed a validation issue where the input and weight dtypes did not match
@dsocek dsocek force-pushed the sd-sdxl-fine-tuning branch 3 times, most recently from 8b6df81 to 841b776 Compare January 30, 2024 17:41
@dsocek (Contributor, Author) commented Jan 30, 2024

Additional updates:

  1. Included additional fixes from @libinta (casting in the original Gaudi SDXL pipeline)
  2. Rebased the branch onto the latest OH master
  3. Added a new working SDXL LoRA script for Gaudi
  4. Updated the README with a LoRA training example and inference with the obtained LoRA weights

@dsocek dsocek requested a review from libinta January 30, 2024 22:18
Comment thread examples/stable-diffusion/README.md Outdated
Comment thread examples/stable-diffusion/README.md Outdated
Comment thread examples/stable-diffusion/training/train_text_to_image_lora.py
Comment thread examples/stable-diffusion/README.md Outdated
@dsocek (Contributor, Author) commented Feb 2, 2024

Updates:

  • Added CI tests
  • Refactored the README (split it into inference and training to make it cleaner)
  • Merged SDXL fine-tuning and LoRA into ONE script
  • Updated the README with the changed command lines

@dsocek dsocek requested a review from libinta February 2, 2024 20:01
@regisss (Collaborator) left a comment

@libinta The new training folder for training examples looks good to me 👍

Comment on lines +19 to +20
https://github.com/huggingface/diffusers/blob/v0.23.1/examples/text_to_image/train_text_to_image_sdxl.py
https://github.com/huggingface/diffusers/blob/v0.23.1/examples/text_to_image/train_text_to_image_lora_sdxl.py
Collaborator:

Maybe this script should be renamed to train_text_to_image_sdxl.py, as I see that Diffusers also has an example called train_text_to_image.py.

Collaborator:

@regisss should we keep one script and use a parameter to train both Stable Diffusion and Stable Diffusion XL?

Collaborator:

Ideally yes, if that doesn't make the script too complicated

@imangohari1 (Contributor):

@libinta any updates on this?

@libinta (Collaborator) commented Feb 9, 2024 via email

@libinta (Collaborator) commented Feb 12, 2024

@dsocek can you update to the latest? Thx

@libinta (Collaborator) left a comment

Can you update the patch to the latest?

@dsocek dsocek force-pushed the sd-sdxl-fine-tuning branch from 6541186 to 56b06a5 Compare February 12, 2024 23:34
@dsocek (Contributor, Author) commented Feb 12, 2024

@libinta I have just rebased to latest OH code

@libinta (Collaborator) commented Feb 13, 2024 via email

@dsocek (Contributor, Author) commented Feb 13, 2024

Daniel, can you check one more time to see if it's easier to keep Stable Diffusion XL and Stable Diffusion as separate training scripts, or to combine them? Thanks, Libin

@libinta I think it would be better to have separate training scripts for SDXL and SD. This would be easier to implement and also easier to maintain IMO. It is also closer to how Diffusers is arranged. So we would then have 2 scripts, but each could cover multiple fine-tuning approaches.
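As a rough illustration of the alternative being weighed, one script could cover both full fine-tuning and LoRA behind a flag. The sketch below is hypothetical: the argument names (`--use_lora`, `--lora_rank`) are illustrative and not the actual interface of the merged script.

```python
# Hypothetical sketch of one SDXL training script gating full fine-tuning
# vs. LoRA behind a flag; argument names are illustrative only.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="SDXL text-to-image fine-tuning (full or LoRA)"
    )
    parser.add_argument("--pretrained_model_name_or_path", required=True)
    parser.add_argument(
        "--use_lora",
        action="store_true",
        help="train low-rank adapters instead of the full UNet",
    )
    parser.add_argument("--lora_rank", type=int, default=4)
    return parser


# Example invocation: LoRA fine-tuning with the default rank.
args = build_parser().parse_args(
    ["--pretrained_model_name_or_path", "stabilityai/stable-diffusion-xl-base-1.0",
     "--use_lora"]
)
```

In the end the PR went the other way for the SD/SDXL split (separate scripts), but the merged SDXL script does fold full fine-tuning and LoRA into one file, which is the shape this sketch gestures at.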

pipe = GaudiStableDiffusionXLPipeline.from_pretrained(
    model_id,
    scheduler=GaudiEulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler"),
    torch_dtype=torch.bfloat16,
Collaborator:

Can you change GaudiEulerDiscreteScheduler to DDPMScheduler like in the original script? We observed noisy images with EulerDiscreteScheduler.
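For context, swapping schedulers in diffusers-style pipelines is typically done by rebuilding the new scheduler from the old one's config. Below is a minimal stand-in sketch of that `from_config` pattern using toy classes; these are not the real diffusers or optimum-habana APIs.

```python
# Toy stand-in for the diffusers-style scheduler swap under discussion;
# these classes only mimic the from_config pattern, not the real API.
class SchedulerBase:
    def __init__(self, **config):
        self.config = config

    @classmethod
    def from_config(cls, config):
        # Rebuild a scheduler of this class from another scheduler's config dict;
        # the noise schedule carries over, only the sampling algorithm changes.
        return cls(**config)


class EulerDiscreteScheduler(SchedulerBase):
    pass


class DDPMScheduler(SchedulerBase):
    pass


euler = EulerDiscreteScheduler(num_train_timesteps=1000, beta_schedule="scaled_linear")
ddpm = DDPMScheduler.from_config(euler.config)  # same config, different sampler
```

The design point is that schedulers are interchangeable as long as they are constructed from a compatible config, which is why the review could suggest the swap without touching the rest of the pipeline.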

@dsocek (Contributor, Author) Feb 14, 2024

For this change, do we need to implement GaudiDDPMScheduler?

Collaborator:

I guess yes. Maybe it works well with GaudiDDIMScheduler?

Contributor (Author):

@regisss @libinta We tested now with the updated diffusers: the Euler Discrete scheduler works well (no noisy images observed anymore).

@libinta (Collaborator) left a comment

@dsocek As we discussed, can you rename the training script to train_text_to_image_sdxl.py ?

@atakaha (Contributor) commented Feb 13, 2024

@libinta and @dsocek
I faced a diffusers version dependency issue between drivers 1.13 and 1.14.
Driver 1.13 uses diffusers 0.23.1 and driver 1.14 uses diffusers 0.26.3. Version 0.23.1 has "text_encoder_lora_state_dict" but 0.26.3 does not. Do we support only 1.13, only 1.14, or both?
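One common way to bridge such an API removal is an import guard. The sketch below is a hedged illustration: the module path of the helper is an assumption (it moved between diffusers versions), and on diffusers >= 0.26, or with no diffusers installed at all, the import simply fails and the fallback branch runs.

```python
# Hedged compatibility shim for the removed helper described above.
# The import path is an assumption; it varied across diffusers versions.
try:
    from diffusers.models.lora import text_encoder_lora_state_dict  # assumed 0.23.x path
except ImportError:
    # Newer diffusers removed this helper; callers would fall back to a
    # PEFT-based state-dict extraction instead.
    text_encoder_lora_state_dict = None

HAS_LEGACY_LORA_HELPER = text_encoder_lora_state_dict is not None
```

The PR ultimately resolved this by moving the script to PEFT (see the Feb 15 update below the force-push), which sidesteps the removed helper entirely.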

@libinta (Collaborator) commented Feb 14, 2024 via email

@dsocek (Contributor, Author) commented Feb 14, 2024

@dsocek As we discussed, can you rename the training script to train_text_to_image_sdxl.py ?

@libinta renamed

@dsocek dsocek force-pushed the sd-sdxl-fine-tuning branch from b6226b6 to 204210d Compare February 15, 2024 07:34
@dsocek (Contributor, Author) commented Feb 15, 2024

@libinta

  • We validated on 1.14 and the latest diffusers
  • We updated to support PEFT as needed
  • We also updated the tests
  • I squashed all commits into a single commit with 4 co-authors for easier merging
  • I moved this PR from draft to an actual PR
  • The scheduler can be updated in a future commit (if needed)

@dsocek dsocek marked this pull request as ready for review February 15, 2024 07:45
@dsocek dsocek requested review from libinta and regisss February 15, 2024 07:45
@dsocek dsocek changed the title Sd sdxl fine tuning SDXL fine tuning Feb 15, 2024
@imangohari1 (Contributor):

@libinta @regisss
Hi team,
Any update on this? We would appreciate your review. Thanks.

@libinta libinta added the run-test Run CI for PRs from external contributors label Feb 16, 2024
@HuggingFaceDocBuilderDev:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread examples/stable-diffusion/training/requirements.txt
Comment thread examples/stable-diffusion/training/README.md Outdated

Comment thread examples/stable-diffusion/training/train_text_to_image_sdxl.py Outdated
@dsocek dsocek force-pushed the sd-sdxl-fine-tuning branch from 204210d to d376a16 Compare February 22, 2024 21:57
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: Libin Tang <litang@habana.ai>
@vidyasiv vidyasiv mentioned this pull request Feb 23, 2024
@dsocek dsocek requested a review from regisss February 26, 2024 16:00
@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Feb 27, 2024
@regisss (Collaborator) left a comment

LGTM!

@regisss regisss merged commit f3919e1 into huggingface:main Feb 27, 2024
yeonsily added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Feb 27, 2024
libinta pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Feb 28, 2024
schoi-habana pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 1, 2024
puneeshkhanna pushed a commit to puneeshkhanna/optimum-habana-fork that referenced this pull request Mar 11, 2024
HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 11, 2024
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
…face#667)

Signed-off-by: Urszula <urszula.golowicz@intel.com>
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

Labels

run-test Run CI for PRs from external contributors

6 participants