
✨ Instant ID #2580

Merged

merged 13 commits into Mikubill:main on Jan 27, 2024

Conversation


@huchenlei huchenlei commented Jan 25, 2024

Instant ID project

https://github.com/InstantID/InstantID

Instant ID uses a combination of ControlNet and IP-Adapter to control facial features in the diffusion process. One unique design of Instant ID is that it passes the facial embedding from the IP-Adapter projection as the crossattn input to the ControlNet unet. Normally, the crossattn input to the ControlNet unet is the prompt's text embedding.
(architecture diagram)
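
To make the crossattn swap concrete, here is a minimal hypothetical sketch; the class, dimensions, and call signatures are illustrative, not the extension's actual code.

import torch
import torch.nn as nn

# Hypothetical projection: maps a 512-d insightface embedding to a short
# token sequence in crossattn space (all dimensions illustrative).
class FaceProjection(nn.Module):
    def __init__(self, in_dim=512, cross_dim=2048, num_tokens=16):
        super().__init__()
        self.num_tokens = num_tokens
        self.cross_dim = cross_dim
        self.proj = nn.Linear(in_dim, cross_dim * num_tokens)

    def forward(self, face_emb):  # face_emb: [B, 512]
        return self.proj(face_emb).reshape(-1, self.num_tokens, self.cross_dim)

face_emb = torch.randn(1, 512)            # insightface face embedding
face_tokens = FaceProjection()(face_emb)  # [1, 16, 2048]
# Plain ControlNet: controlnet_unet(x, t, context=text_embedding, hint=map)
# InstantID:        controlnet_unet(x, t, context=face_tokens, hint=keypoint_map)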

Download models

You need to download the following models and put them under the {A1111_root}/models/ControlNet directory. You must also rename the models to ip-adapter_instant_id_sdxl and control_instant_id_sdxl so that the extension can recognize them correctly.
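
As a convenience, here is a hedged sketch of fetching and renaming the two models with huggingface_hub. The repo id and filenames are assumptions (the PR text does not spell them out); adjust them to wherever you actually get the models.

import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

REPO = "InstantX/InstantID"      # assumed source repo; verify before use
dst = Path("models/ControlNet")  # i.e. {A1111_root}/models/ControlNet
dst.mkdir(parents=True, exist_ok=True)

ip = hf_hub_download(REPO, "ip-adapter.bin")
cn = hf_hub_download(REPO, "ControlNetModel/diffusion_pytorch_model.safetensors")

# Rename to the names the extension expects.
shutil.copy(ip, dst / "ip-adapter_instant_id_sdxl.bin")
shutil.copy(cn, dst / "control_instant_id_sdxl.safetensors")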

How to use

InstantID takes 2 models in the UI. You should always set the IP-Adapter model as the first model, because the ControlNet model takes the output from the IP-Adapter model (the IP-Adapter model must be hooked first).

Unit 0 Setting

You must place the IP-Adapter unit right before the ControlNet unit. The projected face embedding output of the IP-Adapter unit is used as part of the input to the next ControlNet unit.
(screenshot: Unit 0 settings)

Unit 1 Setting

The ControlNet unit accepts a keypoint map of 5 facial keypoints. You are not restricted to using the facial keypoints of the same person you used in Unit 0; here a different person's facial keypoints are used.
(screenshot: Unit 1 settings)

CFG

It is recommended to set CFG to 4~5 for the best results. The exact number varies with the sampling method and base model, but generally you need a CFG scale somewhat lower than usual.
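
For scripted use, a hedged sketch of the corresponding txt2img API call is below. The unit field layout follows the ControlNet extension's usual alwayson_scripts convention, and the module/model strings are assumptions; check what your install actually lists.

import requests

payload = {
    "prompt": "portrait photo, studio lighting",
    "cfg_scale": 4.5,  # InstantID wants a lower-than-usual CFG (~4-5)
    "alwayson_scripts": {"controlnet": {"args": [
        {   # Unit 0: IP-Adapter part, must come first
            "module": "instant_id_face_embedding",   # assumed name
            "model": "ip-adapter_instant_id_sdxl",
            "image": "<base64-encoded face image>",
        },
        {   # Unit 1: ControlNet part, consumes Unit 0's projected embedding
            "module": "instant_id_face_keypoints",   # assumed name
            "model": "control_instant_id_sdxl",
            "image": "<base64-encoded keypoint source (may be another person)>",
        },
    ]}},
}
requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)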

Output

(sample output image)

Follow-up work

  • Make sd-webui-openpose-editor able to edit the facial keypoints in the preprocessor result preview.
  • Currently, even if you use the same face for both models, the insightface preprocessor runs twice. We need a way to cache the result so the model only runs once (a sketch of the idea follows this list).
  • Support multiple face inputs.
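
For the second bullet, a minimal sketch of the intended caching, assuming the preprocessor can be keyed on a hash of the input image (all names here are illustrative):

import hashlib

_face_cache = {}

def cached_face_info(model, img):
    """Run insightface at most once per distinct input image."""
    key = hashlib.sha1(img.tobytes()).hexdigest()
    if key not in _face_cache:
        _face_cache[key] = model.get(img)
    return _face_cache[key]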

Note

Because insightface's GitHub releases currently do not include the antelopev2 model, we download it from a Hugging Face mirror: https://huggingface.co/DIAMONIK7777/antelopev2. If you are in mainland China and do not have a good connection to Hugging Face, you can manually download the model files from elsewhere and place them under extensions/sd-webui-controlnet/annotator/downloads/insightface/models/antelopev2.
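
The manual fallback can also be scripted. A minimal sketch using the mirror linked above, with local_dir mirroring the path the extension scans:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="DIAMONIK7777/antelopev2",
    local_dir="extensions/sd-webui-controlnet/annotator/downloads/insightface/models/antelopev2",
)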

@huchenlei huchenlei requested a review from sdbds January 26, 2024 03:19
@huchenlei huchenlei marked this pull request as ready for review January 26, 2024 03:19
Comment on lines +390 to +405
@torch.inference_mode()
def get_image_embeds_instantid(self, prompt_image_emb):
    """Get image embeds for instantid."""
    image_proj_model_in_features = 512
    if isinstance(prompt_image_emb, torch.Tensor):
        prompt_image_emb = prompt_image_emb.clone().detach()
    else:
        prompt_image_emb = torch.tensor(prompt_image_emb)

    prompt_image_emb = prompt_image_emb.to(device=self.device, dtype=torch.float32)
    prompt_image_emb = prompt_image_emb.reshape([1, -1, image_proj_model_in_features])
    # Return (cond, uncond): the projected face embedding, and the projection
    # of an all-zero embedding for the unconditional branch.
    return (
        self.image_proj_model(prompt_image_emb),
        self.image_proj_model(torch.zeros_like(prompt_image_emb)),
    )

@huchenlei (Collaborator, Author)

Step2: Calculate projected face embedding with ipadapter weights.
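
The pair of outputs exists for classifier-free guidance. A hedged usage sketch (the ip_adapter handle and field access are illustrative):

# cond drives the positive pass; uncond (projection of a zero embedding)
# replaces it on the unconditional pass.
cond_emb, uncond_emb = ip_adapter.get_image_embeds_instantid(face_info["embedding"])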

Comment on lines 743 to 781
def run_model_instant_id(self, img: np.ndarray, **kwargs):
    """Run the model for instant_id."""
    def draw_kps(img: np.ndarray, kps, color_list=[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255)]):
        stickwidth = 4
        limbSeq = np.array([[0, 2], [1, 2], [3, 2], [4, 2]])
        kps = np.array(kps)

        h, w, _ = img.shape
        out_img = np.zeros([h, w, 3])

        # Draw the 4 "limbs" connecting eyes/mouth corners to the nose.
        for i in range(len(limbSeq)):
            index = limbSeq[i]
            color = color_list[index[0]]

            x = kps[index][:, 0]
            y = kps[index][:, 1]
            length = ((x[0] - x[1]) ** 2 + (y[0] - y[1]) ** 2) ** 0.5
            angle = math.degrees(math.atan2(y[0] - y[1], x[0] - x[1]))
            polygon = cv2.ellipse2Poly((int(np.mean(x)), int(np.mean(y))), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
            out_img = cv2.fillConvexPoly(out_img.copy(), polygon, color)
        out_img = (out_img * 0.6).astype(np.uint8)

        # Draw the 5 keypoints on top.
        for idx_kp, kp in enumerate(kps):
            color = color_list[idx_kp]
            x, y = kp
            out_img = cv2.circle(out_img.copy(), (int(x), int(y)), 10, color, -1)

        return out_img.astype(np.uint8)

    self.load_model()
    face_info = self.model.get(img)
    if not face_info:
        raise Exception("Insightface: No face found in image.")
    if len(face_info) > 1:
        logger.warning("Insightface: More than one face is detected in the image. "
                       "Only the first one will be used.")
    # Use the face with the largest bounding-box area.
    face_info = sorted(face_info, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))[-1]
    return RawInstantIdInput(draw_kps(img, face_info['kps']), face_info['embedding']), False
@huchenlei (Collaborator, Author)

Step1: Accept raw inputs and return faceid-processed results.
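
For reference, the expected kps input is five (x, y) pixel coordinates: left eye, right eye, nose, left mouth corner, right mouth corner. A small illustrative call, treating draw_kps as a free function:

import numpy as np

kps = np.array([[220, 240], [300, 238], [260, 300], [230, 350], [295, 348]])
canvas = np.zeros([512, 512, 3], dtype=np.uint8)
vis = draw_kps(canvas, kps)  # 512x512 keypoint map for the ControlNet unit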

Comment on lines 957 to 962
elif control_model_type == ControlModelType.InstantID:
    assert isinstance(detected_map, tuple)
    raw_input = detected_map
    # Resize the keypoint map to the generation target's height/width;
    # the face embedding passes through unchanged.
    resized_keypoints, detected_map = Script.detectmap_proc(raw_input.keypoints, unit.module, resize_mode, h, w)
    control = ResizedInstantIdInput(resized_keypoints, raw_input.embedding)
    store_detected_map(detected_map, unit.module)
@huchenlei (Collaborator, Author)

Step3: The keypoint map gets resized to the generation target's height/width.

Comment on lines 1100 to 1112
if param.control_model_type == ControlModelType.InstantID:
    # For instant_id we always expect ip-adapter model followed
    # by ControlNet model.
    assert i > 0, "InstantID control model should follow ipadapter model."
    ip_adapter_param = forward_params[i - 1]
    assert ip_adapter_param.control_model_type == ControlModelType.IPAdapter, \
        "InstantID control model should follow ipadapter model."
    control_model = ip_adapter_param.control_model
    assert hasattr(control_model, "image_emb")
    param.hint_cond = InstantIdInput(
        param.hint_cond.resized_keypoints,
        control_model.image_emb,
    )
@huchenlei (Collaborator, Author)

Step4: Pass projected face embedding to ControlNet.

scripts/hook.py Outdated
Comment on lines 566 to 570
# Unpack inputs for InstantID.
if param.control_model_type == ControlModelType.InstantID:
    assert isinstance(hint, InstantIdInput)
    # crossattn context <- projected face embedding; hint <- keypoint map.
    context = hint.projected_embedding.eval(cond_mark).to(x.device, dtype=x.dtype)
    hint = hint.resized_keypoints.to(x.device, dtype=x.dtype)
@huchenlei (Collaborator, Author)

Step5: Set the control image (hint) and crossattn cond (context) for ControlNet.

@erhan-

erhan- commented Jan 27, 2024

Testing this.

First run:

        raise RuntimeError("Failed downloading url %s" % url)
    RuntimeError: Failed downloading url https://github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip

deepinsight/insightface#1896 (comment)

I downloaded it manually and tried putting it under

.\extensions\sd-webui-controlnet\annotator\downloads\insightface\models

(screenshot)

@hablaba

hablaba commented Jan 27, 2024

This is awesome! I've played with InstantID in diffusers and it has a lot of potential.

I'm also testing out your PR. I did notice the examples in the InstantID results use a super low CFG of 3.5; any higher and I get pretty bad results. But yes, something seems a bit off in the implementation, because I don't get a great likeness.

Two other notes: I think there needs to be a way to adjust the ControlNet scale and IP-Adapter scale separately, since oftentimes you need to tweak them independently. Another really cool opportunity would be to allow a secondary pose image; if it's not included, the pose image would default to the face image. That would let you generate with different angles/poses.

Great work! Appreciate you working on this so quickly. Happy to help beta test some more.

@beansfotos

Does this work? Will it be included in a future update, or do we have to install it manually?

@huchenlei (Collaborator, Author)

This is awesome! I've played with InstantID in diffusers and it has a lot of potential.

I'm also testing out your PR. I did notice the examples in the InstantID results use a super low CFG of 3.5; any higher and I get pretty bad results. But yes, something seems a bit off in the implementation, because I don't get a great likeness.

Two other notes: I think there needs to be a way to adjust the ControlNet scale and IP-Adapter scale separately, since oftentimes you need to tweak them independently. Another really cool opportunity would be to allow a secondary pose image; if it's not included, the pose image would default to the face image. That would let you generate with different angles/poses.

Great work! Appreciate you working on this so quickly. Happy to help beta test some more.

Thanks for your testing! I have split insightface into 2 separate units. Now you can adjust the weight for each model, and you can optionally pass a custom facial landmark.

@huchenlei huchenlei merged commit 9473a77 into Mikubill:main Jan 27, 2024
1 check passed
@hablaba

hablaba commented Jan 28, 2024

Thanks for your testing! I have split insightface into 2 separate units. Now you can adjust the weight for each model, and you can optionally pass a custom facial landmark.

I tested your changes and it seems to be working great, including with a different face image and pose image!

Really appreciate your work here and I’m a bit shocked at how fast you got it implemented.

The low-CFG requirement is still an odd quirk, but that seems to really be an InstantID issue. Maybe they'll fix it in a future model.

@aminesoulaymani

Amazing. I spent hours dealing with OOMs in ComfyUI, even at 768×768 px, with poor results (i5, GeForce 3060 6 GB VRAM). After the sd-webui-controlnet update, automatic1111 produces near-perfect 1024×1024 results: no OOMs at all, no "Low VRAM" checked, everything smooth and nice. The results are better than any LoRA I spent days training; this took about 3 minutes with your extension. You MVP!
