
refactor prepare_mask_and_masked_image with VaeImageProcessor #4444

Merged: 24 commits into main, Aug 25, 2023

Conversation

@yiyixuxu (Collaborator) commented Aug 3, 2023

First attempt to refactor the inpainting pipelines with VaeImageProcessor. Lots of tests still need to be added.

This example works as expected after the refactoring:

import PIL.Image
import requests
import torch
from io import BytesIO

from diffusers import StableDiffusionInpaintPipeline


def download_image(url):
    # fetch the image over HTTP and return it as an RGB PIL image
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

# fix the random seed so the result is reproducible
generator = torch.Generator(device="cuda").manual_seed(0)

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
image.save("yellow_cat.png")

@HuggingFaceDocBuilderDev commented Aug 3, 2023

The documentation is not available anymore as the PR was closed or merged.

Comment on lines 875 to 876
mask[mask < 0.5] = 0
mask[mask > 0.5] = 1
Contributor:

Shouldn't we do this in the preprocess function?
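A minimal sketch of what folding the thresholding into a preprocess-style helper could look like; the helper name and the >= convention at exactly 0.5 are assumptions for illustration, not the PR's final API:

import torch

def binarize_mask(mask: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # snap soft mask values to hard 0/1 so downstream blending is unambiguous
    mask = mask.clone()
    mask[mask < threshold] = 0
    mask[mask >= threshold] = 1
    return mask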

@@ -52,8 +52,16 @@ def __init__(
        resample: str = "lanczos",
        do_normalize: bool = True,
        do_convert_rgb: bool = False,
        do_convert_grayscale: bool = False,
Contributor:

Nice idea!
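For context, this is roughly how a mask-specific processor could then be constructed; the parameter names follow the diff above, while `do_binarize` and the values shown are assumptions based on the binarize logic discussed later in this review:

from diffusers.image_processor import VaeImageProcessor

# a processor dedicated to masks: grayscale, binary values, no [-1, 1] normalization
mask_processor = VaeImageProcessor(
    vae_scale_factor=8,
    do_normalize=False,
    do_binarize=True,
    do_convert_grayscale=True,
)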

    ):
        super().__init__()
        if do_convert_rgb and do_convert_grayscale:
            warnings.warn(
Contributor:

Maybe better to throw an error here, actually.

@patrickvonplaten (Contributor):

Very nice first draft! cc @sayakpaul @pcuenca @williamberman here for a review as well.

Comment on lines +60 to +64
raise ValueError(
    "`do_convert_rgb` and `do_convert_grayscale` can not both be set to `True`,"
    " if you intended to convert the image into RGB format, please set `do_convert_grayscale = False`.",
    " if you intended to convert the image into grayscale format, please set `do_convert_rgb = False`",
)
Member:
Very descriptive! Looks good.

Comment on lines 157 to 164
image(`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`):
    the image input, can be a PIL image, numpy array or pytorch tensor. if it is a numpy array, should have
    shape [batch, height, width] or [batch, height, width, channel] if it is a pytorch tensor, should have
    shape [batch, channel, height, width]
height (`int`, *optional*, defaults to `None`):
    The height in preprocessed image. If `None`, will use the height of `image` input
width (`int`, *optional*, defaults to `None`):
    The width in preprocessed. If `None`, will use the width of the `image` input
Member:
Let's maintain casing:

  • "the" -> "The" when it's placed at the beginning of a sentence.
  • Full stops to end the sentences.

Comment on lines +205 to +207
image[image < 0.5] = 0
image[image >= 0.5] = 1
return image
Member:
Should 0 and 1 be registered into the config vars?

Contributor:
I think 0 and 1 are global enough to not have to be added to the config.
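For reference, the thresholding above behaves as follows; note that values at exactly 0.5 map to 1:

import torch

mask = torch.tensor([0.1, 0.5, 0.9])
mask[mask < 0.5] = 0
mask[mask >= 0.5] = 1
print(mask)  # tensor([0., 1., 1.])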

Comment on lines +222 to +237
if isinstance(image, torch.Tensor):
    # if image is a pytorch tensor, it could have 2 possible shapes:
    # 1. batch x height x width: we should insert the channel dimension at position 1
    # 2. channel x height x width: we should insert the batch dimension at position 0;
    #    however, since both the channel and batch dimensions have size 1, it is the same to insert at position 1
    # for simplicity, we insert a dimension of size 1 at position 1 for both cases
    image = image.unsqueeze(1)
else:
    # if it is a numpy array, it could have 2 possible shapes:
    # 1. batch x height x width: insert the channel dimension in the last position
    # 2. height x width x channel: insert the batch dimension in the first position
    if image.shape[-1] == 1:
        image = np.expand_dims(image, axis=0)
    else:
        image = np.expand_dims(image, axis=-1)

Member:
For easier operability, does it make sense to first convert the input tensor to a NumPy array and then operate from there?

@yiyixuxu (Collaborator, Author) commented Aug 4, 2023:

@sayakpaul

The output of preprocess is PyTorch tensors, so why would we want to convert to a NumPy array first?

The reason we want this step here is that the rest of our preprocessing logic assumes a 3D tensor has shape channel x height x width, which doesn't apply to grayscale images.

For example, if we have a tensor with shape (5, 256, 256) and try to process it with the current logic directly, we will do this:
(5, 256, 256) -> put it into a list: [(5, 256, 256)] -> torch.stack the list when not 4D: (1, 5, 256, 256)

The correct result would be (5, 1, 256, 256).

Member:

Why can't we add a dummy channel to represent the grayscale images, then? Just trying to understand it better.

Collaborator (Author):

@sayakpaul
Yeah, and that's what this section of code is doing: adding a dummy channel to represent either the missing channel or batch dimension, to remove the ambiguity.
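To make the shape argument concrete, a minimal reproduction of the tensor case from the discussion above:

import torch

# a batch of 5 single-channel (grayscale) masks, shape (batch, height, width)
masks = torch.rand(5, 256, 256)

# inserting a dummy dimension at position 1 yields the canonical
# (batch, channel, height, width) layout the rest of the pipeline expects
masks = masks.unsqueeze(1)
print(masks.shape)  # torch.Size([5, 1, 256, 256])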


mask = self.mask_processor.preprocess(mask_image, height=height, width=width)

masked_image = init_image * (mask < 0.5)
Member:
Register 0.5 as a config var?

Contributor:
Do you mean in the init of the VaeImageProcessor or the VAE model config? IMO it's general enough not to have to be registered in a config.
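To illustrate what the 0.5 threshold does in the line under discussion (the shapes below are illustrative, not the pipeline's actual values):

import torch

# mask: (1, 1, H, W) in [0, 1]; init_image: (1, 3, H, W) in [-1, 1]
mask = torch.rand(1, 1, 64, 64)
init_image = torch.rand(1, 3, 64, 64) * 2 - 1

# (mask < 0.5) is a boolean tensor that broadcasts over the channel dimension,
# zeroing out the region to be inpainted and keeping the rest of the image
masked_image = init_image * (mask < 0.5)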

@sayakpaul (Member) left a comment:
Looking quite good.

Let's try to think of a non-exhaustive list of the scenarios we need to cover to ensure this refactor is robust, and write test cases for each of them.

@yiyixuxu (Collaborator, Author) commented Aug 5, 2023

@patrickvonplaten
Should we unify the default behavior for when the user does not pass the height and width arguments?

In the SD inpaint pipeline, when height and width are None, it defaults to sample_size * vae_scale_factor; in controlnet inpaint, it resizes to the closest multiple of 8, which I think makes much more sense:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py#L866
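A sketch of the two default behaviors being compared; the function names are illustrative, not the pipelines' actual attributes:

def default_hw_sd_inpaint(sample_size: int, vae_scale_factor: int) -> tuple:
    # SD inpaint: a fixed default, e.g. 64 * 8 = 512, regardless of the input image size
    side = sample_size * vae_scale_factor
    return side, side

def default_hw_controlnet_inpaint(image_height: int, image_width: int) -> tuple:
    # controlnet inpaint: keep the input size, rounded down to the nearest multiple of 8
    return image_height // 8 * 8, image_width // 8 * 8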

)

# expected range [0,1], normalize to [-1,1]
do_normalize = self.config.do_normalize
- if image.min() < 0:
+ if image.min() < 0 and do_normalize:
Collaborator (Author):

If the user configured the image_processor to have do_normalize=False, the expected range is [-1, 1] and we shouldn't send a warning (this is the case for the controlnet image).
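A standalone sketch of the adjusted check; the function name and warning text are assumptions for illustration, not the actual code:

import warnings
import torch

def maybe_normalize(image: torch.Tensor, do_normalize: bool) -> torch.Tensor:
    # only warn about a [-1, 1] input when normalization was actually requested
    if image.min() < 0 and do_normalize:
        warnings.warn(
            "Passing `image` with values in [-1, 1] is deprecated when"
            " `do_normalize=True`; the expected range is [0, 1].",
            FutureWarning,
        )
        do_normalize = False  # treat the input as already normalized
    return 2.0 * image - 1.0 if do_normalize else image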

@@ -133,7 +133,11 @@ def prepare_mask_and_masked_image(image, mask, height, width, return_image=False
    tuple[torch.Tensor]: The pair (mask, masked_image) as ``torch.Tensor`` with 4
    dimensions: ``batch x channels x height x width``.
    """

    warnings.warn(
Contributor:
Let's please use the deprecate function here:

def deprecate(*args, take_from: Optional[Union[Dict, Any]] = None, standard_warn=True, stacklevel=2):
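A sketch of what that could look like here; the removal version and message are assumptions, to be chosen by the maintainers:

from diffusers.utils import deprecate

deprecation_message = (
    "`prepare_mask_and_masked_image` is deprecated; use"
    " `VaeImageProcessor.preprocess` instead."
)
# emits a standardized FutureWarning tied to a target removal version
deprecate("prepare_mask_and_masked_image", "1.0.0", deprecation_message, standard_warn=False)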

Comment on lines 928 to 935
        image: Union[
            torch.FloatTensor,
            PIL.Image.Image,
            np.ndarray,
            List[torch.FloatTensor],
            List[PIL.Image.Image],
            List[np.ndarray],
        ] = None,
Contributor:
Might be worth defining a new type here, actually.

Contributor:
Something like:

PipelineImage = Union[
    torch.FloatTensor,
    PIL.Image.Image,
    np.ndarray,
    List[torch.FloatTensor],
    List[PIL.Image.Image],
    List[np.ndarray],
]

and use this.

@patrickvonplaten (Contributor) left a comment:
Looks cool! Can we also apply the changes right away to SDXL and SDXL ControlNet?

@yiyixuxu (Collaborator, Author) commented Aug 24, 2023

@patrickvonplaten

"Can we apply the changes also right away to SDXL and SDXL controlnet?"

Updated SDXL inpainting.

SDXL ControlNet does not have a mask input, and we already use an image_processor to process the control condition :) We've added an image_processor to all the other SDXL pipelines wherever there is an image input.

@yiyixuxu (Collaborator, Author) commented Aug 24, 2023

I don't know why this test failed; I can't reproduce it on my machine. Also, this PR did not touch the paint_by_example pipeline at all:
https://github.com/huggingface/diffusers/actions/runs/5959195493/job/16165075058?pr=4444#step:6:13213

@patrickvonplaten (Contributor) left a comment:
Very nice! Let's merge it before we run into ugly merge conflicts :-)

@yiyixuxu yiyixuxu merged commit b7b1a30 into main Aug 25, 2023
@yiyixuxu yiyixuxu deleted the inpaint-preprocess branch August 25, 2023 18:18
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request on Dec 25, 2023:

refactor prepare_mask_and_masked_image with VaeImageProcessor (huggingface#4444)

* refactor image processor for mask
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request on Apr 26, 2024:

refactor prepare_mask_and_masked_image with VaeImageProcessor (huggingface#4444)

* refactor image processor for mask
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>