[Sd3 Dreambooth LoRA] Add text encoder training for the clip encoders #8630
Conversation
When no individual prompts are provided, only instance prompts, I got this:
Thanks @r-aristov! I think it should be working now.
```python
if text_encoder_lora_layers:
    state_dict.update(pack_weights(text_encoder_lora_layers, "text_encoder"))

if text_encoder_2_lora_layers:
    state_dict.update(pack_weights(text_encoder_2_lora_layers, "text_encoder_2"))
```
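For reference, `pack_weights` just namespaces the LoRA parameters so the transformer and both text encoder LoRAs can share one flat state dict. A minimal sketch of what it does, matching the snippet above (not necessarily the exact helper in the library):

```python
import torch


def pack_weights(layers, prefix):
    # Accept either a module (take its state_dict) or an already-flat dict of tensors.
    layers_weights = layers.state_dict() if isinstance(layers, torch.nn.Module) else layers
    # Prefix every key, e.g. "text_encoder.<module>.lora_A.weight", so all
    # components can be saved to and loaded from a single checkpoint file.
    return {f"{prefix}.{name}": param for name, param in layers_weights.items()}
```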
Can we confirm via experiments that text encoder 3 training doesn't matter too much? This can be done separately and won't block this PR.
Actually, we skipped it not because we think it doesn't matter, but because we thought training the T5 would be a different animal from the already well-known CLIP text encoder training (also on the VRAM consumption side). So indeed we left investigating T5 training to a future PR!
I'm slightly worried about the dynamics of this, so let's make sure we run ample experiments to see whether training two text encoders while keeping the third one fixed works as expected.
sayakpaul left a comment:
Thanks. I have left some comments. My main question is: how much does training the text encoders matter here, given we use three in SD3? Could we see some concrete comparative examples?
Additionally, we need to add tests to https://github.com/huggingface/diffusers/blob/main/tests/lora/test_lora_layers_sd3.py and add a note about --train_text_encoder to the README.
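For such a comparison, a minimal inference sketch along these lines should do; the model ID is the public SD3 medium checkpoint, and the LoRA path is a placeholder for the training output directory:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of sks dog in a bucket"
baseline = pipe(prompt, generator=torch.manual_seed(0)).images[0]

# With --train_text_encoder, the saved checkpoint also carries LoRA layers for the
# two CLIP encoders; load_lora_weights applies them alongside the transformer LoRA.
pipe.load_lora_weights("path/to/trained-sd3-lora")  # placeholder path
with_lora = pipe(prompt, generator=torch.manual_seed(0)).images[0]
```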
Cool, the results are stunning. So, the TODOs are:
sayakpaul left a comment:
Excellent work!
…#8630)

* add clip text-encoder training
* no dora
* text encoder training fixes
* text encoder training fixes
* text encoder training fixes
* text encoder training fixes
* text encoder training fixes
* text encoder training fixes
* add text_encoder layers to save_lora
* style
* fix imports
* style
* fix text encoder
* review changes
* review changes
* review changes
* minor change
* add lora tag
* style
* add readme notes
* add tests for clip encoders
* style
* typo
* fixes
* style
* Update tests/lora/test_lora_layers_sd3.py (Co-authored-by: Sayak Paul <[email protected]>)
* Update examples/dreambooth/README_sd3.md (Co-authored-by: Sayak Paul <[email protected]>)
* minor readme change

Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>

Add text encoder training support for the CLIP encoders to the DreamBooth LoRA training script for SD3.
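In outline, when `--train_text_encoder` is passed, the script attaches LoRA adapters to the two CLIP encoders via PEFT. A minimal sketch under assumed defaults (rank 4 standing in for the script's `--rank`, the public SD3 medium checkpoint, and illustrative variable names):

```python
from peft import LoraConfig
from transformers import CLIPTextModelWithProjection

# Load SD3's two CLIP text encoders (subfolder names per the official repo).
model_id = "stabilityai/stable-diffusion-3-medium-diffusers"
text_encoder_one = CLIPTextModelWithProjection.from_pretrained(model_id, subfolder="text_encoder")
text_encoder_two = CLIPTextModelWithProjection.from_pretrained(model_id, subfolder="text_encoder_2")

# Add LoRA to the attention projections of both encoders.
text_lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
text_encoder_one.add_adapter(text_lora_config)
text_encoder_two.add_adapter(text_lora_config)
```

The T5 encoder (`text_encoder_3`) is intentionally left frozen, as discussed above.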