
✨ Instant ID #2580

Merged

merged 13 commits into Mikubill:main on Jan 27, 2024

Conversation


@huchenlei huchenlei commented Jan 25, 2024

Instant ID project

https://github.com/InstantID/InstantID

Instant ID uses a combination of ControlNet and IP-Adapter to control facial features in the diffusion process. One unique design of Instant ID is that it passes the facial embedding from the IP-Adapter projection as the crossattn input to the ControlNet unet. Normally, the crossattn input to the ControlNet unet is the prompt's text embedding.
(architecture diagram)
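
To make the crossattn swap concrete, here is a minimal hypothetical sketch; the class, dimensions, and call signatures are illustrative, not the extension's actual code.

import torch
import torch.nn as nn

# Hypothetical projection: maps a 512-d insightface embedding to a short
# token sequence in crossattn space (all dimensions illustrative).
class FaceProjection(nn.Module):
    def __init__(self, in_dim=512, cross_dim=2048, num_tokens=16):
        super().__init__()
        self.num_tokens = num_tokens
        self.cross_dim = cross_dim
        self.proj = nn.Linear(in_dim, cross_dim * num_tokens)

    def forward(self, face_emb):  # face_emb: [B, 512]
        return self.proj(face_emb).reshape(-1, self.num_tokens, self.cross_dim)

face_emb = torch.randn(1, 512)            # insightface face embedding
face_tokens = FaceProjection()(face_emb)  # [1, 16, 2048]
# Plain ControlNet: controlnet_unet(x, t, context=text_embedding, hint=map)
# InstantID:        controlnet_unet(x, t, context=face_tokens, hint=keypoint_map)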

Download models

You need to download the following models and put them under the {A1111_root}/models/ControlNet directory. You must also rename the models to ip-adapter_instant_id_sdxl and control_instant_id_sdxl so that the extension can recognize them correctly.
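
As a convenience, here is a hedged sketch of fetching and renaming the two models with huggingface_hub. The repo id and filenames are assumptions (the PR text does not spell them out); adjust them to wherever you actually get the models.

import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

REPO = "InstantX/InstantID"      # assumed source repo; verify before use
dst = Path("models/ControlNet")  # i.e. {A1111_root}/models/ControlNet
dst.mkdir(parents=True, exist_ok=True)

ip = hf_hub_download(REPO, "ip-adapter.bin")
cn = hf_hub_download(REPO, "ControlNetModel/diffusion_pytorch_model.safetensors")

# Rename to the names the extension expects.
shutil.copy(ip, dst / "ip-adapter_instant_id_sdxl.bin")
shutil.copy(cn, dst / "control_instant_id_sdxl.safetensors")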

How to use

InstantID takes 2 models in the UI. You should always set the IP-Adapter model as the first model, because the ControlNet model takes the output from the IP-Adapter model (the IP-Adapter model must be hooked first).

Unit 0 Setting

You must place the IP-Adapter unit right before the ControlNet unit. The projected face embedding output of the IP-Adapter unit is used as part of the input to the next ControlNet unit.
(screenshot: Unit 0 settings)

Unit 1 Setting

The ControlNet unit accepts a keypoint map of 5 facial keypoints. You are not restricted to using the facial keypoints of the same person you used in Unit 0; here a different person's facial keypoints are used.
(screenshot: Unit 1 settings)

CFG

It is recommended to set CFG to 4~5 for the best results. The exact number varies with the sampling method and base model, but generally you need a CFG scale somewhat lower than usual.
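
For scripted use, a hedged sketch of the corresponding txt2img API call is below. The unit field layout follows the ControlNet extension's usual alwayson_scripts convention, and the module/model strings are assumptions; check what your install actually lists.

import requests

payload = {
    "prompt": "portrait photo, studio lighting",
    "cfg_scale": 4.5,  # InstantID wants a lower-than-usual CFG (~4-5)
    "alwayson_scripts": {"controlnet": {"args": [
        {   # Unit 0: IP-Adapter part, must come first
            "module": "instant_id_face_embedding",   # assumed name
            "model": "ip-adapter_instant_id_sdxl",
            "image": "<base64-encoded face image>",
        },
        {   # Unit 1: ControlNet part, consumes Unit 0's projected embedding
            "module": "instant_id_face_keypoints",   # assumed name
            "model": "control_instant_id_sdxl",
            "image": "<base64-encoded keypoint source (may be another person)>",
        },
    ]}},
}
requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)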

Output

(sample output image)

Follow-up work

  • Make sd-webui-openpose-editor able to edit the facial keypoints in the preprocessor result preview.
  • Currently, even if you use the same face for both models, the insightface preprocessor runs twice. We need a way to cache the result so the model only runs once (a sketch of the idea follows this list).
  • Support multiple face inputs.
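
For the second bullet, a minimal sketch of the intended caching, assuming the preprocessor can be keyed on a hash of the input image (all names here are illustrative):

import hashlib

_face_cache = {}

def cached_face_info(model, img):
    """Run insightface at most once per distinct input image."""
    key = hashlib.sha1(img.tobytes()).hexdigest()
    if key not in _face_cache:
        _face_cache[key] = model.get(img)
    return _face_cache[key]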

Note

Because insightface's GitHub releases currently do not include the antelopev2 model, we download it from a Hugging Face mirror: https://huggingface.co/DIAMONIK7777/antelopev2. If you are in mainland China and do not have a good connection to Hugging Face, you can manually download the model files from elsewhere and place them under extensions/sd-webui-controlnet/annotator/downloads/insightface/models/antelopev2.
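
The manual fallback can also be scripted. A minimal sketch using the mirror linked above, with local_dir mirroring the path the extension scans:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="DIAMONIK7777/antelopev2",
    local_dir="extensions/sd-webui-controlnet/annotator/downloads/insightface/models/antelopev2",
)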

@huchenlei huchenlei requested a review from sdbds January 26, 2024 03:19
@huchenlei huchenlei marked this pull request as ready for review January 26, 2024 03:19
Comment on lines +390 to +405
@torch.inference_mode()
def get_image_embeds_instantid(self, prompt_image_emb):
    """Get image embeds for instantid."""
    image_proj_model_in_features = 512
    if isinstance(prompt_image_emb, torch.Tensor):
        prompt_image_emb = prompt_image_emb.clone().detach()
    else:
        prompt_image_emb = torch.tensor(prompt_image_emb)

    prompt_image_emb = prompt_image_emb.to(device=self.device, dtype=torch.float32)
    prompt_image_emb = prompt_image_emb.reshape([1, -1, image_proj_model_in_features])
    # Return (cond, uncond): the projected face embedding, and the projection
    # of an all-zero embedding for the unconditional branch.
    return (
        self.image_proj_model(prompt_image_emb),
        self.image_proj_model(torch.zeros_like(prompt_image_emb)),
    )

@huchenlei (Collaborator, Author)

Step2: Calculate projected face embedding with ipadapter weights.
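
The pair of outputs exists for classifier-free guidance. A hedged usage sketch (the ip_adapter handle and field access are illustrative):

# cond drives the positive pass; uncond (projection of a zero embedding)
# replaces it on the unconditional pass.
cond_emb, uncond_emb = ip_adapter.get_image_embeds_instantid(face_info["embedding"])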

Comment on lines 743 to 781
def run_model_instant_id(self, img: np.ndarray, **kwargs):
    """Run the model for instant_id."""
    def draw_kps(img: np.ndarray, kps, color_list=[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255)]):
        stickwidth = 4
        limbSeq = np.array([[0, 2], [1, 2], [3, 2], [4, 2]])
        kps = np.array(kps)

        h, w, _ = img.shape
        out_img = np.zeros([h, w, 3])

        # Draw the 4 "limbs" connecting eyes/mouth corners to the nose.
        for i in range(len(limbSeq)):
            index = limbSeq[i]
            color = color_list[index[0]]

            x = kps[index][:, 0]
            y = kps[index][:, 1]
            length = ((x[0] - x[1]) ** 2 + (y[0] - y[1]) ** 2) ** 0.5
            angle = math.degrees(math.atan2(y[0] - y[1], x[0] - x[1]))
            polygon = cv2.ellipse2Poly((int(np.mean(x)), int(np.mean(y))), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
            out_img = cv2.fillConvexPoly(out_img.copy(), polygon, color)
        out_img = (out_img * 0.6).astype(np.uint8)

        # Draw the 5 keypoints on top.
        for idx_kp, kp in enumerate(kps):
            color = color_list[idx_kp]
            x, y = kp
            out_img = cv2.circle(out_img.copy(), (int(x), int(y)), 10, color, -1)

        return out_img.astype(np.uint8)

    self.load_model()
    face_info = self.model.get(img)
    if not face_info:
        raise Exception("Insightface: No face found in image.")
    if len(face_info) > 1:
        logger.warning("Insightface: More than one face is detected in the image. "
                       "Only the first one will be used.")
    # Use the face with the largest bounding-box area.
    face_info = sorted(face_info, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))[-1]
    return RawInstantIdInput(draw_kps(img, face_info['kps']), face_info['embedding']), False
@huchenlei (Collaborator, Author)

Step1: Accept raw inputs and return faceid-processed results.
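
For reference, the expected kps input is five (x, y) pixel coordinates: left eye, right eye, nose, left mouth corner, right mouth corner. A small illustrative call, treating draw_kps as a free function:

import numpy as np

kps = np.array([[220, 240], [300, 238], [260, 300], [230, 350], [295, 348]])
canvas = np.zeros([512, 512, 3], dtype=np.uint8)
vis = draw_kps(canvas, kps)  # 512x512 keypoint map for the ControlNet unit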

Comment on lines 957 to 962
elif control_model_type == ControlModelType.InstantID:
    assert isinstance(detected_map, tuple)
    raw_input = detected_map
    # Resize the keypoint map to the generation target's height/width;
    # the face embedding passes through unchanged.
    resized_keypoints, detected_map = Script.detectmap_proc(raw_input.keypoints, unit.module, resize_mode, h, w)
    control = ResizedInstantIdInput(resized_keypoints, raw_input.embedding)
    store_detected_map(detected_map, unit.module)
@huchenlei (Collaborator, Author)

Step3: The keypoint map gets resized to the generation target's height/width.

Comment on lines 1100 to 1112
if param.control_model_type == ControlModelType.InstantID:
    # For instant_id we always expect ip-adapter model followed
    # by ControlNet model.
    assert i > 0, "InstantID control model should follow ipadapter model."
    ip_adapter_param = forward_params[i - 1]
    assert ip_adapter_param.control_model_type == ControlModelType.IPAdapter, \
        "InstantID control model should follow ipadapter model."
    control_model = ip_adapter_param.control_model
    assert hasattr(control_model, "image_emb")
    param.hint_cond = InstantIdInput(
        param.hint_cond.resized_keypoints,
        control_model.image_emb,
    )
@huchenlei (Collaborator, Author)

Step4: Pass projected face embedding to ControlNet.

scripts/hook.py Outdated
Comment on lines 566 to 570
# Unpack inputs for InstantID.
if param.control_model_type == ControlModelType.InstantID:
    assert isinstance(hint, InstantIdInput)
    # crossattn context <- projected face embedding; hint <- keypoint map.
    context = hint.projected_embedding.eval(cond_mark).to(x.device, dtype=x.dtype)
    hint = hint.resized_keypoints.to(x.device, dtype=x.dtype)
@huchenlei (Collaborator, Author)

Step5: Set the control image (hint) and crossattn cond (context) for ControlNet.

@erhan-

erhan- commented Jan 27, 2024

Testing this.

First run:

        raise RuntimeError("Failed downloading url %s" % url)
    RuntimeError: Failed downloading url https://github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip

deepinsight/insightface#1896 (comment)

I downloaded it manually and tried putting it under

.\extensions\sd-webui-controlnet\annotator\downloads\insightface\models

(screenshot)

@hablaba

hablaba commented Jan 27, 2024

This is awesome! I've played with InstantID in diffusers and it has a lot of potential.

I'm also testing out your PR. I did notice the examples in the InstantID results use a super low CFG of 3.5; any higher and I get pretty bad results. But yes, something seems a bit off in the implementation, because I don't get a great likeness.

Two other notes: I think there needs to be a way to adjust the ControlNet scale and IP-Adapter scale separately, since oftentimes you need to tweak them independently. Another really cool opportunity would be to allow a secondary pose image; if it's not included, the pose image would default to the face image. That would let you generate with different angles/poses.

Great work! Appreciate you working on this so quickly. Happy to help beta test some more.

@beansfotos

Does this work? Will it be included in a future update, or do we have to install it manually?

@huchenlei (Collaborator, Author)

This is awesome! I've played with InstantID in diffusers and it has a lot of potential.

I'm also testing out your PR. I did notice the examples in the InstantID results use a super low CFG of 3.5; any higher and I get pretty bad results. But yes, something seems a bit off in the implementation, because I don't get a great likeness.

Two other notes: I think there needs to be a way to adjust the ControlNet scale and IP-Adapter scale separately, since oftentimes you need to tweak them independently. Another really cool opportunity would be to allow a secondary pose image; if it's not included, the pose image would default to the face image. That would let you generate with different angles/poses.

Great work! Appreciate you working on this so quickly. Happy to help beta test some more.

Thanks for your testing! I have split insightface into 2 separate units. Now you can adjust the weight for each model, and you can optionally pass a custom facial landmark.

@huchenlei huchenlei merged commit 9473a77 into Mikubill:main Jan 27, 2024
1 check passed
@hablaba

hablaba commented Jan 28, 2024

Thanks for your testing! I have split insightface into 2 separate units. Now you can adjust the weight for each model, and you can optionally pass a custom facial landmark.

I tested your changes and it seems to be working great, including with a different face image and pose image!

Really appreciate your work here and I’m a bit shocked at how fast you got it implemented.

The low-CFG requirement is still an odd quirk, but that seems to really be an InstantID issue. Maybe they'll fix it in a future model.

@aminesoulaymani

Amazing. I spent hours dealing with OOMs in ComfyUI, even at 768×768 px, with poor results (i5, GeForce 3060 6 GB VRAM). After the sd-webui-controlnet update, automatic1111 produces near-perfect 1024×1024 results: no OOMs at all, no "Low VRAM" checked, everything smooth and nice. The results are better than any LoRA I spent days training; this took about 3 minutes with your extension. You MVP!
