Support SD3 ControlNet and Multi-ControlNet. #8566

wangqixun · 2024-06-15T03:25:14Z

What does this PR do?

Support SD3 ControlNet.
Support SD3 Multi-ControlNet
A pipeline that supports SD3 Multi-ControlNet has been implemented.

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

wangqixun · 2024-06-15T03:28:47Z

@haofanwang Handing this over to Haofan ❤️❤️ I'm off to prepare the weights and the demo.

wangqixun · 2024-06-15T04:45:13Z

The demo and weights are here:

https://huggingface.co/InstantX/SD3-Controlnet-Canny_alpha_512

import sys

import torch
from diffusers import StableDiffusion3Pipeline
from diffusers.models.controlnet_sd3 import ControlNetSD3Model
from diffusers.utils import load_image  # needed for the control image below

# make the community pipeline importable (adjust to your local diffusers checkout)
sys.path.append('/path/diffusers/examples/community')
from pipeline_stable_diffusion_3_controlnet import StableDiffusion3CommonPipeline

# load pipeline
base_model = 'stabilityai/stable-diffusion-3-medium-diffusers'
pipe = StableDiffusion3CommonPipeline.from_pretrained(
    base_model, 
    controlnet_list=['InstantX/SD3-Controlnet-Canny_alpha_512']
)
pipe.to('cuda:0', torch.float16)

prompt = 'Anime style illustration of a girl wearing a suit. In the background we see a big rain approaching.'
n_prompt = 'NSFW, nude, naked, porn, ugly'

# controlnet config
controlnet_conditioning = [
    dict(
        control_index=0,
        control_image=load_image('https://huggingface.co/InstantX/SD3-Controlnet-Canny_alpha_512/resolve/main/canny.jpg'),
        control_weight=0.5,
        control_pooled_projections='zeros'
    )
]

# infer
image = pipe(
    prompt=prompt,
    negative_prompt=n_prompt,
    controlnet_conditioning=controlnet_conditioning,
    num_inference_steps=28,
    guidance_scale=7.0,
    height=512,
    width=512,
).images[0]

haofanwang · 2024-06-15T04:55:07Z

Our teammate has implemented ControlNet for SD3 and trained a canny model for testing. Could you review this PR? @sayakpaul @yiyixuxu

wangqixun · 2024-06-15T11:03:31Z

beta 1024-pixel canny model

sayakpaul · 2024-06-15T11:10:05Z

I would be in favor of supporting this through the core codebase actually.

So, I would like to seek opinions from @yiyixuxu first before reviewing.

In any case, I truly appreciate your hard work here! Solid!

HuggingFaceDocBuilderDev · 2024-06-15T11:16:00Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yiyixuxu

thanks a ton for the PR! 🔥🔥🔥🔥🔥🔥🔥🔥
agree with @sayakpaul here: we want to integrate this into core directly

src/diffusers/models/controlnet_sd3.py

src/diffusers/models/transformers/transformer_sd3.py

src/diffusers/models/controlnet_sd3.py

examples/community/pipeline_stable_diffusion_3_controlnet.py

s9anus98a · 2024-06-16T00:25:29Z

Please add support for a ControlNet image-to-image pipeline, something like the StableDiffusionControlNetImg2ImgPipeline example from SD 1.5:


pipe = StableDiffusionImg2ImgPipeline.from_pipe(
    pipe,
    custom_pipeline="jyoung105/sd15_perturbed_attention_guidance_i2i",
    torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pipe(
    pipe,
    controlnet=controlnet,
    torch_dtype=torch.float16
).to('cuda')
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

sayakpaul · 2024-06-16T18:29:15Z

@yiyixuxu let me know if you'd like me to review as well.

haofanwang · 2024-06-16T18:45:14Z

@sayakpaul We will update soon based on comments above. Then you can review again.

yiyixuxu

looks good to me!!
Left a few comments, I think we can merge very soon!

will also need tests and doc

src/diffusers/models/controlnet_sd3.py

src/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py

yiyixuxu · 2024-06-16T21:36:27Z

src/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py

+            pooled_prompt_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds], dim=0)
+
+        # 3. Prepare control image
+        if isinstance(self.controlnet, SD3ControlNetModel):


So I liked that in the initial implementation you made sure the controlnet is always a MultiControlNetModel, regardless of how many controlnets are passed or in which format. You did this in __init__: controlnet_list = MultiControlNetSD3Model(controlnet_list)

it is different from our code but an improvement I think, because now we only need to deal with MultiControlNetModel. I think we can keep it as it is and refactor all our controlnet together this way in a follow-up PR:)

I suggest keeping consistency with the other ControlNet pipelines for now; we can refactor them together later.
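The normalization idea discussed above can be sketched in plain Python. This is only an illustration under stated assumptions: `ControlNetStub`, `MultiControlNetStub`, and `as_multi_controlnet` are hypothetical stand-ins invented for this sketch, not diffusers classes, and the merged pipeline may handle the single/multi cases differently.

```python
from typing import List, Sequence, Union


class ControlNetStub:
    """Hypothetical stand-in for a single ControlNet model (not a diffusers class)."""

    def __init__(self, name: str) -> None:
        self.name = name


class MultiControlNetStub:
    """Hypothetical stand-in for a wrapper that holds several ControlNets."""

    def __init__(self, nets: Sequence[ControlNetStub]) -> None:
        self.nets: List[ControlNetStub] = list(nets)


def as_multi_controlnet(
    controlnet: Union[ControlNetStub, Sequence[ControlNetStub], MultiControlNetStub],
) -> MultiControlNetStub:
    """Wrap whatever the caller passed so downstream code only sees one type."""
    if isinstance(controlnet, MultiControlNetStub):
        return controlnet  # already wrapped, reuse as-is
    if isinstance(controlnet, ControlNetStub):
        return MultiControlNetStub([controlnet])  # single model -> list of one
    return MultiControlNetStub(list(controlnet))  # list or tuple of models


single = as_multi_controlnet(ControlNetStub("canny"))
several = as_multi_controlnet([ControlNetStub("canny"), ControlNetStub("pose")])
print(len(single.nets), len(several.nets))  # prints "1 2"
```

With this shape, the pipeline body never needs an `isinstance` branch on the single-model case, which is the simplification the comment above points at.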

appleyang123 · 2024-06-17T03:42:02Z

Great. This is nice, hard work~ Can you also share the training code?

src/diffusers/models/controlnet_sd3.py

DN6 · 2024-06-17T11:59:08Z

src/diffusers/models/controlnet_sd3.py

+            in_channels=in_channels,
+            embed_dim=self.inner_dim,
+            pos_embed_type=None,
+            # pos_embed_max_size=pos_embed_max_size,  # hard-code for now.


Is this not needed?

I also have a question regarding the use of two PatchEmbed layers. I would love to understand this more, and why one has pos_embed_max_size defined while the other has it set to None.

they did not add a positional embedding here, so it does not need pos_embed_max_size, but yeah, +1 on @sayakpaul 's questions - is there any reason we skip this for control input?

Yes. In the forward method of ControlNet, the positional embedding only needs to be added once, so only one of the two PatchEmbed layers includes positional information.
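A toy sketch of that point, in plain Python rather than the PR's actual modules. Everything here is an assumption made for illustration: `embed` and `controlnet_input` are hypothetical helpers, and a single scalar weight stands in for a real patch-embedding layer. It only shows the arithmetic: the latent embedding carries the positional term, the control embedding does not, and a zero-initialized control embedding leaves the starting output unchanged.

```python
from typing import List

Vector = List[float]


def embed(patches: Vector, weight: float) -> Vector:
    """Toy patch embedding: scale every patch value by one weight."""
    return [weight * p for p in patches]


def controlnet_input(
    latent: Vector, control: Vector, pos: Vector, w_latent: float, w_control: float
) -> Vector:
    """Combine both embeddings; the positional term enters exactly once."""
    # Latent path: embedding plus positional information.
    latent_tokens = [e + p for e, p in zip(embed(latent, w_latent), pos)]
    # Control path: embedding only, no positional information.
    control_tokens = embed(control, w_control)
    return [a + b for a, b in zip(latent_tokens, control_tokens)]


latent = [1.0, 2.0, 3.0]
control = [5.0, 5.0, 5.0]
pos = [0.1, 0.2, 0.3]

# Zero-initialized control embedding: the control image contributes nothing yet.
start = controlnet_input(latent, control, pos, w_latent=1.0, w_control=0.0)
baseline = [x + p for x, p in zip(latent, pos)]
print(start == baseline)  # prints "True"
```

If the control path also added `pos`, the positional term would be counted twice once its weights move away from zero, which is the situation the reply above describes avoiding.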

src/diffusers/models/transformers/transformer_sd3.py

DN6

Looks good 👍🏽 Could we please add fast tests for the ControlNet model and ControlNetPipeline?

sayakpaul · 2024-06-17T19:54:57Z

src/diffusers/dependency_versions_table.py

@@ -40,7 +40,7 @@
    "tensorboard": "tensorboard",
    "torch": "torch>=1.4",
    "torchvision": "torchvision",
-    "transformers": "transformers>=4.25.1",
+    "transformers": "transformers>=4.41.2",


Is this still needed?

This is reformatted automatically after make style

No I mean this is not just a formatting change, it’s changing the version. We have had this change merged recently from another PR actually.

@haofanwang I think you need to merge main (these changes are already in main)

https://github.com/huggingface/diffusers/blob/main/src/diffusers/dependency_versions_table.py

src/diffusers/models/controlnet_sd3.py

sayakpaul · 2024-06-17T19:59:07Z

src/diffusers/models/controlnet_sd3.py

+            controlnet.pos_embed.load_state_dict(transformer.pos_embed.state_dict(), strict=False)
+            controlnet.pos_embed_input.load_state_dict(transformer.pos_embed.state_dict(), strict=False)
+            controlnet.time_text_embed.load_state_dict(transformer.time_text_embed.state_dict(), strict=False)
+            controlnet.context_embedder.load_state_dict(transformer.context_embedder.state_dict(), strict=False)
+            controlnet.transformer_blocks.load_state_dict(transformer.transformer_blocks.state_dict(), strict=False)
+
+            controlnet.pos_embed_input = zero_module(controlnet.pos_embed_input)


Could we perhaps just do controlnet.load_state_dict(transformer.state_dict(), strict=False) or is it too risky?

we wouldn't prefer that

actually, we only need strict=False for transformer_blocks, no? @haofanwang, the other layers should be identical and should load without strict=False. this way:

when we read the code, we know which layers are identical, which layers are not

get a better error message when the checkpoints are wrong

I don't feel too strongly about this, so ok if you just want to keep it as it is

Yeah that is better indeed. Thanks for explaining.

Among all the modules involved, only the initialization of transformer_blocks requires strict=False, and pos_embed_input should be initialized with all zeros. So controlnet.pos_embed_input.load_state_dict(transformer.pos_embed.state_dict(), strict=False) can be deleted.
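The loading policy agreed on in this thread can be illustrated with plain dictionaries. This is a sketch under assumptions, not the PR's code: `load_subset` is a hypothetical stand-in for a strict or non-strict state-dict load, the toy checkpoints are invented, and the example assumes the ControlNet keeps fewer blocks than the transformer, which is one plausible reason a non-strict load is needed (the thread does not spell out the reason).

```python
from typing import Dict, List, Tuple

StateDict = Dict[str, float]


def load_subset(
    target: StateDict, source: StateDict, strict: bool = True
) -> Tuple[List[str], List[str]]:
    """Toy stand-in for a strict/non-strict state-dict load on plain dicts."""
    missing = sorted(k for k in target if k not in source)
    unexpected = sorted(k for k in source if k not in target)
    if strict and (missing or unexpected):
        # A strict load fails loudly and names the offending keys.
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    for key in target:
        if key in source:
            target[key] = source[key]
    return missing, unexpected


# Hypothetical toy checkpoints, keyed by submodule name.
transformer = {
    "context_embedder": {"weight": 0.7},
    "transformer_blocks": {"0.attn": 0.1, "1.attn": 0.2, "2.attn": 0.3},
}
controlnet = {
    "context_embedder": {"weight": 0.0},
    "transformer_blocks": {"0.attn": 0.0, "1.attn": 0.0},
    "pos_embed_input": {"proj": 0.9},
}

# Identical layers: strict load, so a wrong checkpoint raises immediately.
load_subset(controlnet["context_embedder"], transformer["context_embedder"], strict=True)
# Only the blocks differ in layout, so only they use the non-strict load.
_, skipped = load_subset(
    controlnet["transformer_blocks"], transformer["transformer_blocks"], strict=False
)
# The control-input embedding is not copied at all; it starts from zeros.
controlnet["pos_embed_input"] = {k: 0.0 for k in controlnet["pos_embed_input"]}

print(skipped)  # prints "['2.attn']"
```

Keeping the strict loads explicit per submodule documents which layers are expected to match exactly, which is the readability and error-message benefit raised in the review.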

sayakpaul

This looks very good to me!

TODOs:

Docs (please ping @stevhliu once done)
Tests

haofanwang · 2024-06-18T17:05:33Z

@DN6 @sayakpaul @yiyixuxu @stevhliu

Added tests and docs. make quality and make style passed locally.

stevhliu

Looks good so far! Remember to add controlnet_sd3.md to the toctree in the models and pipelines sections so their docs get built 🙂

stevhliu · 2024-06-18T17:51:04Z

docs/source/en/api/models/controlnet_sd3.md

+from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
+
+controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny")
+pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers",controlnet=controlnet)


Suggested change

pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers",controlnet=controlnet)

pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet)

stevhliu · 2024-06-18T17:52:17Z

docs/source/en/api/models/controlnet_sd3.md

+
+*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
+
+SD3ControlNetModel is an implementation of ControlNet for Stable Diffusion 3.


I think you can move this to the beginning so users reading it immediately know what this is. For example:

SD3ControlNetModel is an implementation of ControlNet for Stable Diffusion 3. The ControlNet model was introduced in ...

stevhliu · 2024-06-18T17:53:38Z

docs/source/en/api/pipelines/controlnet_sd3.md

+
+# ControlNet with Stable Diffusion 3
+
+ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.


Maybe also add something similar here:

StableDiffusion3ControlNetPipeline is an implementation of ControlNet for Stable Diffusion 3. ControlNet was introduced ...

stevhliu · 2024-06-18T17:55:43Z

src/diffusers/models/controlnet_sd3.py

+
+class SD3MultiControlNetModel(ModelMixin):
+    r"""
+    Multiple `SD3ControlNetModel` wrapper class for Multi-SD3ControlNet


Suggested change

Multiple `SD3ControlNetModel` wrapper class for Multi-SD3ControlNet

`SD3ControlNetModel` wrapper class for Multi-SD3ControlNet.

yiyixuxu · 2024-06-18T17:59:37Z

docs/source/en/api/models/controlnet_sd3.md

+specific language governing permissions and limitations under the License.
+-->
+
+# SD3ControlNetModel


also need to add the new doc pages to https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml

yiyixuxu · 2024-06-18T21:29:34Z

I modified the fast tests here #8627
all relevant tests passed there - feel free to cherry-pick the last commit! 16e27b9

haofanwang · 2024-06-19T00:09:37Z

Done. Should be ready to merge:D

yiyixuxu · 2024-06-19T00:19:56Z

@haofanwang have to run fix-copies again since we updated the encode_prompt on sd3 pipeline 😬 sorry! hope this is the last time!

yiyixuxu · 2024-06-19T00:59:31Z

thank you!

* sd3 controlnet --------- Co-authored-by: haofanwang <[email protected]>

wangqixun added 2 commits June 15, 2024 11:16

sd3 controlnet

2692d7f

Copyright InstantX

8a13f7e

haofanwang mentioned this pull request Jun 15, 2024

how to add controlnet in sd3! #8527

Closed

Fannovel16 mentioned this pull request Jun 15, 2024

[Feature] Support for SD3 ControlNet comfyanonymous/ComfyUI#3734

Closed

yiyixuxu reviewed Jun 15, 2024

src/diffusers/models/controlnet_sd3.py (outdated, resolved)

yiyixuxu reviewed Jun 15, 2024

refactor

dbe967d

yiyixuxu approved these changes Jun 16, 2024

yiyixuxu requested review from DN6 and sayakpaul June 16, 2024 21:39

make style

c686041

fix a typo

bfa6f24

DN6 reviewed Jun 17, 2024

src/diffusers/models/controlnet_sd3.py (outdated, resolved)

DN6 reviewed Jun 17, 2024

src/diffusers/models/transformers/transformer_sd3.py (outdated, resolved)

DN6 reviewed Jun 17, 2024

src/diffusers/models/transformers/transformer_sd3.py (resolved)

DN6 reviewed Jun 17, 2024

sayakpaul reviewed Jun 17, 2024

src/diffusers/models/controlnet_sd3.py (outdated, resolved)

sayakpaul reviewed Jun 17, 2024

src/diffusers/models/controlnet_sd3.py (resolved)

sayakpaul reviewed Jun 17, 2024

haofanwang added 5 commits June 18, 2024 16:17

update

500945e

add test&doc

410bb2c

version

3ab786a

doc

445dba2

Merge branch 'huggingface:main' into sd3_control

f55f3f8

stevhliu reviewed Jun 18, 2024

yiyixuxu reviewed Jun 18, 2024

haofanwang added 3 commits June 19, 2024 02:08

check

7b98442

fix

806f952

check dummies

f1d1da6

yiyixuxu mentioned this pull request Jun 18, 2024

[do not merge] sd3 control - make fast tests smaller #8627

Closed

haofanwang added 3 commits June 19, 2024 07:47

fast test

0903d82

Merge branch 'huggingface:main' into sd3_control

f5a1e4d

format

c2aa954

fix-copies

be3633d

yiyixuxu merged commit e5564d4 into huggingface:main Jun 19, 2024
14 of 15 checks passed

geroldmeisinger mentioned this pull request Jun 20, 2024

Support for Stable Diffusion 3 Fannovel16/comfyui_controlnet_aux#380

Closed

yiyixuxu pushed a commit that referenced this pull request Jun 20, 2024

Support SD3 ControlNet and Multi-ControlNet. (#8566)

2eafde7

* sd3 controlnet --------- Co-authored-by: haofanwang <[email protected]>

This was referenced Nov 20, 2024

addressing issue #5, SD3 training sign-language-processing/signwriting-illustration#11

Draft

controlnet_sd3 parameters updated #9977

Closed

add expected parameters to controlnet_sd3 #9974

Closed
