[IP Adapters] introduce `ip_adapter_image_embeds` in the SD pipeline call #6868

sayakpaul · 2024-02-06T08:11:09Z

What does this PR do?

As per the discussion of #6830.

Testing script:

from diffusers import AutoPipelineForText2Image
from diffusers.models import ImageProjection
import torch
from diffusers.utils import load_image


def encode_image(image_encoder, feature_extractor, image, device, num_images_per_prompt, output_hidden_states=None):
    dtype = next(image_encoder.parameters()).dtype

    if not isinstance(image, torch.Tensor):
        image = feature_extractor(image, return_tensors="pt").pixel_values

    image = image.to(device=device, dtype=dtype)
    if output_hidden_states:
        image_enc_hidden_states = image_encoder(image, output_hidden_states=True).hidden_states[-2]
        image_enc_hidden_states = image_enc_hidden_states.repeat_interleave(num_images_per_prompt, dim=0)
        uncond_image_enc_hidden_states = image_encoder(
            torch.zeros_like(image), output_hidden_states=True
        ).hidden_states[-2]
        uncond_image_enc_hidden_states = uncond_image_enc_hidden_states.repeat_interleave(num_images_per_prompt, dim=0)
        return image_enc_hidden_states, uncond_image_enc_hidden_states
    else:
        image_embeds = image_encoder(image).image_embeds
        image_embeds = image_embeds.repeat_interleave(num_images_per_prompt, dim=0)
        uncond_image_embeds = torch.zeros_like(image_embeds)

        return image_embeds, uncond_image_embeds


@torch.no_grad()
def prepare_ip_adapter_image_embeds(
    unet,
    image_encoder,
    feature_extractor,
    ip_adapter_image,
    do_classifier_free_guidance,
    device,
    num_images_per_prompt,
):
    if not isinstance(ip_adapter_image, list):
        ip_adapter_image = [ip_adapter_image]

    if len(ip_adapter_image) != len(unet.encoder_hid_proj.image_projection_layers):
        raise ValueError(
            f"`ip_adapter_image` must have same length as the number of IP Adapters. Got {len(ip_adapter_image)} images and {len(unet.encoder_hid_proj.image_projection_layers)} IP Adapters."
        )

    image_embeds = []
    for single_ip_adapter_image, image_proj_layer in zip(
        ip_adapter_image, unet.encoder_hid_proj.image_projection_layers
    ):
        output_hidden_state = not isinstance(image_proj_layer, ImageProjection)
        single_image_embeds, single_negative_image_embeds = encode_image(
            image_encoder, feature_extractor, single_ip_adapter_image, device, 1, output_hidden_state
        )
        single_image_embeds = torch.stack([single_image_embeds] * num_images_per_prompt, dim=0)
        single_negative_image_embeds = torch.stack([single_negative_image_embeds] * num_images_per_prompt, dim=0)

        if do_classifier_free_guidance:
            single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
            single_image_embeds = single_image_embeds.to(device)

        image_embeds.append(single_image_embeds)

    return image_embeds


pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to(
    "cuda"
)
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipeline.set_ip_adapter_scale(0.6)

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png"
)
image_embeds = prepare_ip_adapter_image_embeds(
    unet=pipeline.unet,
    image_encoder=pipeline.image_encoder,
    feature_extractor=pipeline.feature_extractor,
    ip_adapter_image=image,
    do_classifier_free_guidance=True,
    device="cuda",
    num_images_per_prompt=1,
)
pipeline.unload_ip_adapter()


generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt="best quality, high quality, wearing sunglasses",
    ip_adapter_image_embeds=image_embeds,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("embeds_out.png")
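
The shapes produced by prepare_ip_adapter_image_embeds() in the script above can be traced without loading any model. The following is a minimal sketch and not part of the PR: `prepared_shape` is a hypothetical helper, and the `(1, 1024)` encoder output shape is an assumption chosen for illustration, so check the real `image_embeds` shape of whichever image encoder you load.

```python
def prepared_shape(encoder_out_shape, num_images_per_prompt, do_classifier_free_guidance):
    """Shape of one list entry returned by the script's prepare_ip_adapter_image_embeds().

    `encoder_out_shape` is the shape returned by encode_image() when it is
    called with num_images_per_prompt=1, as the script does.
    """
    # torch.stack([x] * n, dim=0) adds a new leading axis of size n.
    shape = (num_images_per_prompt, *encoder_out_shape)
    if do_classifier_free_guidance:
        # torch.cat([negative, positive]) concatenates along dim 0,
        # so the leading axis doubles and the negative half comes first.
        shape = (2 * shape[0], *shape[1:])
    return shape


# Assumed encoder output: one image, 1024-dimensional embedding.
print(prepared_shape((1, 1024), 1, True))   # (2, 1, 1024)
print(prepared_shape((1, 1024), 4, False))  # (4, 1, 1024)
```

The list passed as `ip_adapter_image_embeds` holds one such tensor per loaded IP Adapter, which is why the script raises a ValueError when the number of images and the number of image projection layers differ.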

Here's our cute bear:

We could introduce static methods, namely _encode_ip_adapter_image() and _prepare_ip_adapter_image_embeds(), and delegate the current calls of encode_image() and prepare_ip_adapter_image_embeds() to them, respectively. This way, users would not have to write encode_image() and prepare_ip_adapter_image_embeds() themselves as shown above.

So the flow would be like:

from diffusers import StableDiffusionPipeline
from diffusers.models import ImageProjection
import torch
from diffusers.utils import load_image


pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to(
    "cuda"
)
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipeline.set_ip_adapter_scale(0.6)

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png"
)
image_embeds = pipeline._prepare_ip_adapter_image_embeds(
    unet=pipeline.unet,
    image_encoder=pipeline.image_encoder,
    feature_extractor=pipeline.feature_extractor,
    ip_adapter_image=image,
    do_classifier_free_guidance=True,
    device="cuda",
    num_images_per_prompt=1,
)
pipeline.unload_ip_adapter()


generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt="best quality, high quality, wearing sunglasses",
    ip_adapter_image_embeds=image_embeds,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("embeds_out.png")
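
The delegation idea proposed above can be sketched in plain Python. This is only an illustration of the pattern under stated assumptions: `ToyPipeline` and `toy_encoder` are hypothetical stand-ins, not diffusers classes, and the static method shown here is not the signature that was merged.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline; it only demonstrates the delegation pattern."""

    def __init__(self, image_encoder):
        self.image_encoder = image_encoder

    @staticmethod
    def _encode_ip_adapter_image(image_encoder, image):
        # Depends only on its arguments, so it can be called without a pipeline instance.
        image_embeds = image_encoder(image)
        # Zeros serve as the unconditional embedding, as in the script's non-hidden-state branch.
        uncond_image_embeds = [0.0] * len(image_embeds)
        return image_embeds, uncond_image_embeds

    def encode_image(self, image):
        # The existing instance method forwards its own components to the static helper.
        return self._encode_ip_adapter_image(self.image_encoder, image)


def toy_encoder(image):
    # Hypothetical encoder: scales pixel values into [0, 1].
    return [pixel / 255 for pixel in image]


pipe = ToyPipeline(toy_encoder)
via_instance = pipe.encode_image([0, 255])
via_static = ToyPipeline._encode_ip_adapter_image(toy_encoder, [0, 255])
print(via_instance == via_static)  # True
```

A static method can be called on the class or on an instance, which is what lets the flow above call it as pipeline._prepare_ip_adapter_image_embeds(...) while passing the components explicitly.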

HuggingFaceDocBuilderDev · 2024-02-06T08:18:09Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yiyixuxu

thanks so much for adding this so quickly:) it's looking great!
I left a comment for the unload_ip_adapter portion

src/diffusers/loaders/ip_adapter.py

asomoza · 2024-02-07T02:27:33Z

Thank you @sayakpaul. As I understand it, we need to pass the image embeddings for each IP Adapter, which is cool. Being able to mix an image, a list of images, or embeddings for each IP Adapter is exactly what I was looking for.

The only other issue left on my list, which we can maybe discuss later and not in this PR, is that diffusers is the only library/app that does the resampling in the forward of the unet instead of when getting the embeddings for the images. This prevents us from using embeddings from other apps or libraries in diffusers, and vice versa.

ComfyUI
https://github.com/cubiq/Diffusers_IPAdapter/blob/169add95683d5be6696975375fb70c50795a7947/ip_adapter/ip_adapter.py#L108

InvokeAI
https://github.com/invoke-ai/InvokeAI/blob/79ae9c4e64cb4f64d54c25b1c487501752b8fa84/invokeai/backend/ip_adapter/ip_adapter.py#L138

It would be good to do it here but I remember @yiyixuxu telling me that you were thinking of taking the image projection outside of the unet so maybe we can discuss it then.

sayakpaul · 2024-02-07T02:57:24Z

The only other issue left on my list, which we can maybe discuss later and not in this PR, is that diffusers is the only library/app that does the resampling in the forward of the unet instead of when getting the embeddings for the images. This prevents us from using embeddings from other apps or libraries in diffusers, and vice versa.

I don't follow. What does resampling mean in this context? It would help to point to the relevant code in the diffusers codebase.

asomoza · 2024-02-07T03:24:52Z

Oh sorry, I meant the image projection, which is done here in diffusers:

diffusers/src/diffusers/models/unets/unet_2d_condition.py

Line 1077 in 17612de

image_embeds = self.encoder_hid_proj(image_embeds)
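
To make the compatibility concern concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not diffusers code: `project` and `unet_forward` are hypothetical helpers, and the 2-to-3 weight matrix is made up. It only shows that when the projection runs inside the forward pass, embeddings that were already projected elsewhere no longer fit the expected input.

```python
def project(embeds, weight):
    """Toy linear projection: one output value per row of `weight`."""
    for row in weight:
        if len(row) != len(embeds):
            raise ValueError(f"expected {len(row)} input features, got {len(embeds)}")
    return [sum(w * e for w, e in zip(row, embeds)) for row in weight]


def unet_forward(image_embeds, weight):
    # Mirrors the quoted line: the projection is applied inside the forward pass,
    # so the caller is expected to pass unprojected embeddings.
    return project(image_embeds, weight)


weight = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # made-up projection from 2 to 3 features
raw = [3.0, 4.0]

print(unet_forward(raw, weight))  # [3.0, 8.0, 7.0]

# Embeddings exported by a tool that projects at encoding time are already 3-dimensional.
already_projected = project(raw, weight)
try:
    unet_forward(already_projected, weight)
except ValueError as err:
    print("incompatible embeddings:", err)
```

In this toy setup the mismatch surfaces as a dimension error; if the projection happened to preserve the feature size, the second projection would instead run silently and change the values.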

sayakpaul · 2024-02-07T03:53:32Z

Oh in that case, that warrants a separate PR / discussion.

sayakpaul · 2024-02-07T04:39:35Z

@yiyixuxu @DN6 I think this is ready for another review.

yiyixuxu · 2024-02-07T20:17:59Z

@asomoza
yes please open another issue:)

The short answer is:

We will not separate the image projection layer from the unet just for ip-adapter, but this is something we are considering for our unet refactor.
I think we do not need to separate the image_projection layers in order to accommodate what you need:)

yiyixuxu

looks good to me:) thank you
can we look into that failing test and make sure it is unrelated here?

…call (huggingface#6868)

* add: support for passing ip adapter image embeddings
* debugging
* make feature_extractor unloading conditioned on safety_checker
* better condition
* type annotation
* index to look into value slices
* more debugging
* debugging
* serialize embeddings dict
* better conditioning
* remove unnecessary prints.
* Update src/diffusers/loaders/ip_adapter.py
  Co-authored-by: YiYi Xu
* make fix-copies and styling.
* styling and further copy fixing.
* fix: check_inputs call in controlnet sdxl img2img pipeline

---------

Co-authored-by: YiYi Xu

sayakpaul added 10 commits February 6, 2024 10:26

add: support for passing ip adapter image embeddings

f742d61

debugging

4b531c5

make feature_extractor unloading conditioned on safety_checker

9ece513

better condition

2c7a058

type annotation

db4e6e8

index to look into value slices

cfc973d

more debugging

a6b1c05

debugging

ffd2437

serialize embeddings dict

3f4cb30

better conditioning

64bb37d

sayakpaul requested a review from yiyixuxu February 6, 2024 08:11

remove unnecessary prints.

843233e

yiyixuxu mentioned this pull request Feb 6, 2024

IPAdapterTesterMixin #6862

Merged

6 tasks

yiyixuxu reviewed Feb 7, 2024

src/diffusers/loaders/ip_adapter.py (outdated, resolved)

sayakpaul and others added 2 commits February 7, 2024 08:29

Update src/diffusers/loaders/ip_adapter.py

a23b63b

Co-authored-by: YiYi Xu

Merge branch 'main' into feat-ip-image-embeddings

5a5f99e

sayakpaul added 2 commits February 7, 2024 10:00

make fix-copies and styling.

5b10966

styling and further copy fixing.

e194928

yiyixuxu mentioned this pull request Feb 7, 2024

correct unload_ip_adapter: only unload feature_extractor if it's by default None #6822

Closed

yiyixuxu approved these changes Feb 7, 2024

sayakpaul added 2 commits February 8, 2024 10:38

Merge branch 'main' into feat-ip-image-embeddings

bddd490

fix: check_inputs call in controlnet sdxl img2img pipeline

d0be723

sayakpaul merged commit aa82df5 into main Feb 8, 2024

sayakpaul deleted the feat-ip-image-embeddings branch February 8, 2024 05:40

yiyixuxu mentioned this pull request Feb 8, 2024

After call IP - Adapter unload method StableDiffusionImg2ImgPipeline(**pipe.components) error #6818

Closed

asomoza mentioned this pull request Feb 9, 2024

IP Adapter Image Embeds - Compatibility with ComfyUI and other libraries/apps #6925

Closed

This was referenced Feb 11, 2024

IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline #6941

Merged

fix IPAdapter unload_ip_adapter test #6972

Merged

asomoza mentioned this pull request Feb 19, 2024

[ip-adapter] refactor prepare_ip_adapter_image_embeds and skip load image_encoder #7016

Merged

a-r-r-o-w mentioned this pull request Feb 23, 2024

[ip-adapter]fix IP-adapter support for SAG, panorama pipeline #7064

Closed

yiyixuxu mentioned this pull request Mar 9, 2024

.unload_ip_adapter() return error for StableDiffusionControlNetInpaintPipeline #7220

Closed
