✨ Instant ID #2580
Conversation
```python
@torch.inference_mode()
def get_image_embeds_instantid(self, prompt_image_emb):
    """Get image embeds for instantid."""
    image_proj_model_in_features = 512
    if isinstance(prompt_image_emb, torch.Tensor):
        prompt_image_emb = prompt_image_emb.clone().detach()
    else:
        prompt_image_emb = torch.tensor(prompt_image_emb)

    prompt_image_emb = prompt_image_emb.to(device=self.device, dtype=torch.float32)
    prompt_image_emb = prompt_image_emb.reshape([1, -1, image_proj_model_in_features])
    return (
        self.image_proj_model(prompt_image_emb),
        self.image_proj_model(torch.zeros_like(prompt_image_emb)),
    )
```
Step 2: Calculate the projected face embedding with the ip-adapter weights.
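For context, here is a minimal sketch of how a (cond, uncond) pair like the one returned above can be consumed downstream. `CondUncondPair` and its `eval` method are illustrative names, not the extension's actual classes; Step 5 below calls `projected_embedding.eval(cond_mark)` on a similar object:

```python
# Illustrative sketch only: a container pairing conditional/unconditional
# projected embeddings, selected per-sample at denoising time.
import torch

class CondUncondPair:
    def __init__(self, cond: torch.Tensor, uncond: torch.Tensor):
        self.cond = cond
        self.uncond = uncond

    def eval(self, cond_mark: torch.Tensor) -> torch.Tensor:
        # cond_mark is 1 for conditional batch entries and 0 for
        # unconditional ones; blend accordingly.
        return self.cond * cond_mark + self.uncond * (1.0 - cond_mark)
```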
scripts/processor.py
```python
def run_model_instant_id(self, img: np.ndarray, **kwargs):
    """Run the model for instant_id."""
    def draw_kps(img: np.ndarray, kps, color_list=[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255)]):
        stickwidth = 4
        limbSeq = np.array([[0, 2], [1, 2], [3, 2], [4, 2]])
        kps = np.array(kps)

        h, w, _ = img.shape
        out_img = np.zeros([h, w, 3])

        for i in range(len(limbSeq)):
            index = limbSeq[i]
            color = color_list[index[0]]

            x = kps[index][:, 0]
            y = kps[index][:, 1]
            length = ((x[0] - x[1]) ** 2 + (y[0] - y[1]) ** 2) ** 0.5
            angle = math.degrees(math.atan2(y[0] - y[1], x[0] - x[1]))
            polygon = cv2.ellipse2Poly((int(np.mean(x)), int(np.mean(y))), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
            out_img = cv2.fillConvexPoly(out_img.copy(), polygon, color)
        out_img = (out_img * 0.6).astype(np.uint8)

        for idx_kp, kp in enumerate(kps):
            color = color_list[idx_kp]
            x, y = kp
            out_img = cv2.circle(out_img.copy(), (int(x), int(y)), 10, color, -1)

        return out_img.astype(np.uint8)

    self.load_model()
    face_info = self.model.get(img)
    if not face_info:
        raise Exception("Insightface: No face found in image.")
    if len(face_info) > 1:
        logger.warn("Insightface: More than one face is detected in the image. "
                    "Only the one with the largest face area will be used.")
    # Only use the face with the largest bounding-box area.
    face_info = sorted(face_info, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))[-1]
    return RawInstantIdInput(draw_kps(img, face_info['kps']), face_info['embedding']), False
```
Step 1: Accept raw inputs and return faceid-processed results.
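For reference, a minimal sketch of the insightface calls this preprocessor builds on, assuming insightface is installed and the antelopev2 models are available locally:

```python
# Minimal, self-contained sketch of the underlying insightface usage.
import cv2
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="antelopev2", providers=["CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

img = cv2.imread("face.jpg")  # hypothetical input path
faces = app.get(img)
# Each detected face exposes .bbox, .kps (5 facial keypoints) and .embedding,
# which is exactly what run_model_instant_id consumes above.
```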
scripts/controlnet.py
```python
elif control_model_type == ControlModelType.InstantID:
    assert isinstance(detected_map, tuple)
    raw_input = detected_map
    resized_keypoints, detected_map = Script.detectmap_proc(raw_input.keypoints, unit.module, resize_mode, h, w)
    control = ResizedInstantIdInput(resized_keypoints, raw_input.embedding)
    store_detected_map(detected_map, unit.module)
```
Step 3: The keypoints map gets resized to the generation target's height/width.
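Conceptually (the real work happens inside Script.detectmap_proc, which also handles crop/fill resize modes), the resize step amounts to something like this hypothetical helper:

```python
# Hypothetical helper illustrating only the resize step.
import cv2
import numpy as np

def resize_keypoint_map(kps_map: np.ndarray, h: int, w: int) -> np.ndarray:
    # Note cv2.resize takes (width, height) order.
    return cv2.resize(kps_map, (w, h), interpolation=cv2.INTER_NEAREST)
```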
scripts/controlnet.py
```python
if param.control_model_type == ControlModelType.InstantID:
    # For instant_id we always expect the ip-adapter model followed
    # by the ControlNet model.
    assert i > 0, "InstantID control model should follow ipadapter model."
    ip_adapter_param = forward_params[i - 1]
    assert ip_adapter_param.control_model_type == ControlModelType.IPAdapter, \
        "InstantID control model should follow ipadapter model."
    control_model = ip_adapter_param.control_model
    assert hasattr(control_model, "image_emb")
    param.hint_cond = InstantIdInput(
        param.hint_cond.resized_keypoints,
        control_model.image_emb,
    )
```
Step 4: Pass the projected face embedding to ControlNet.
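A hedged sketch of what the InstantIdInput container implied by this diff might look like (the actual definition lives elsewhere in the PR):

```python
# Sketch only: a small container bundling the resized keypoint map with the
# projected face embedding produced by the preceding ip-adapter unit.
from dataclasses import dataclass
import torch

@dataclass
class InstantIdInput:
    resized_keypoints: torch.Tensor
    projected_embedding: object  # (cond, uncond) embedding pair; see the Step 2 sketch
```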
scripts/hook.py
```python
# Unpack inputs for InstantID.
if param.control_model_type == ControlModelType.InstantID:
    assert isinstance(hint, InstantIdInput)
    context = hint.projected_embedding.eval(cond_mark).to(x.device, dtype=x.dtype)
    hint = hint.resized_keypoints.to(x.device, dtype=x.dtype)
```
Step 5: Set the control image (hint) and the crossattn condition (context) for ControlNet.
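Putting Step 5 in context, here is a hedged sketch of the ControlNet forward call that finally consumes both values. The argument names follow common ControlNet implementations; this is not a verbatim excerpt from hook.py:

```python
# hint: the resized 5-keypoint map (the usual ControlNet control image).
# context: the projected face embedding, standing in for the prompt's text
# embedding as the crossattn input to the ControlNet unet.
control = param.control_model(
    x=x_in, hint=hint, timesteps=timesteps, context=context
)
```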
Testing this. First run:
deepinsight/insightface#1896 (comment) Downloaded and tried to put under
This is awesome! I’ve played with InstantID in diffusers and it has a lot of potential. I’m also testing out your PR. I did notice the examples in InstantID results use super low CFG at 3.5. Higher and I get pretty bad results. But yes, something seems a bit off in the implementation, because I don’t get great likeness. Two other notes: I think there needs to be a way to adjust the ControlNet scale and IP-Adapter scale separately. Oftentimes you need to tweak them independently. Another really cool opportunity would be to allow a secondary pose image. If it’s not included, the pose image would default to the face image. That would let you generate with different angles/poses. Great work! Appreciate you working on this so quickly. Happy to help beta test some more.
Does this work? Will it be included in a future update or do we have to manually install?
Thanks for your testing! I have made InstantID into 2 separate units. Now you can adjust the weight for each model, and you can optionally pass a custom facial landmark.
I tested your changes and it seems to be working great, including with different face image and pose image! Really appreciate your work here and I’m a bit shocked at how fast you got it implemented. The low CFG requirement is still an odd quirk, but that seems to really be an InstantID issue. Maybe they’ll fix it in a future model.
Amazing. I spent hours trying to deal with OOMs using ComfyUI even at 768×768 px with poor results (i5, GeForce 3060 6 GB VRAM). After the sd-webui-controlnet update, automatic1111 is able to produce near-perfect 1024×1024 results: no OOMs at all, no "Low VRAM" checked, everything smooth and nice. The results are better than any LoRA I trained for days, and it took about 3 minutes with your extension. You MVP.
Instant ID project
https://github.com/InstantID/InstantID
Instant ID uses a combination of ControlNet and IP-Adapter to control the facial features in the diffusion process. One unique design choice of Instant ID is that it passes the facial embedding from the IP-Adapter projection as the crossattn input to the ControlNet unet. Normally the crossattn input to the ControlNet unet is the prompt's text embedding.
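In pseudocode, assuming hypothetical names for the encoder and projection modules, the difference looks roughly like this:

```python
# Conceptual sketch only; all names here are illustrative.
text_context = clip_text_encoder(prompt_tokens)    # normal ControlNet crossattn input
face_context = image_proj_model(face_embedding)    # InstantID crossattn input

# Normal ControlNet:    controlnet_unet(x, hint=control_image, context=text_context)
# InstantID ControlNet: controlnet_unet(x, hint=keypoint_map, context=face_context)
```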
Download models
You need to download the following models and put them under the {A1111_root}/models/ControlNet directory. It is also required to rename the models to ip-adapter_instant_id_sdxl and control_instant_id_sdxl so that they can be correctly recognized by the extension.
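As a quick sanity check (illustrative snippet, run from the A1111 root), you can verify the renamed files are in place:

```python
# Illustrative check that both renamed models exist under models/ControlNet.
from pathlib import Path

cn_dir = Path("models/ControlNet")
for name in ("ip-adapter_instant_id_sdxl", "control_instant_id_sdxl"):
    matches = list(cn_dir.glob(name + ".*"))
    print(name, "->", [m.name for m in matches] if matches else "MISSING")
```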
How to use
InstantID takes 2 models on the UI. You should always set the ip-adapter model as the first model, as the ControlNet model takes the output from the ip-adapter model (the ip-adapter model must be hooked first).
Unit 0 Setting
You must set the ip-adapter unit right before the ControlNet unit. The projected face embedding output by the IP-Adapter unit is used as part of the input to the next ControlNet unit.
Unit 1 Setting
The ControlNet unit accepts a keypoint map of 5 facial keypoints. You are not restricted to using the facial keypoints of the same person you used in Unit 0. Here I use a different person's facial keypoints.
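For script/API users, here is a hedged sketch of a two-unit txt2img payload. The field layout follows the extension's alwayson_scripts API, but the module names are assumptions and the images are placeholders:

```python
# Sketch of a txt2img payload with two ControlNet units: ip-adapter first,
# keypoint ControlNet second. Module names are assumptions; replace the
# placeholder strings with real base64-encoded images.
face_image_b64 = "<base64 face image>"
pose_image_b64 = "<base64 pose/keypoint source image>"

payload = {
    "prompt": "portrait photo of a person",
    "cfg_scale": 4.5,  # InstantID prefers low CFG; see the CFG section below
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    # Unit 0: the ip-adapter model must come first.
                    "module": "instant_id_face_embedding",
                    "model": "ip-adapter_instant_id_sdxl",
                    "image": face_image_b64,
                    "weight": 0.8,
                },
                {
                    # Unit 1: the keypoint ControlNet, fed by Unit 0's output.
                    "module": "instant_id_face_keypoints",
                    "model": "control_instant_id_sdxl",
                    "image": pose_image_b64,
                    "weight": 0.8,
                },
            ]
        }
    },
}
```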
CFG
It is recommended to set CFG to 4~5 for best results. The exact number may vary with the sampling method and base model, but you generally need a CFG scale somewhat lower than usual.
Output
Follow-up work
Note
As insightface's GitHub release currently does not include the antelopev2 model, we download it from a huggingface mirror: https://huggingface.co/DIAMONIK7777/antelopev2. If you are in mainland China and don't have a good internet connection to huggingface, you can manually download the model from somewhere else and place the files under extensions/sd-webui-controlnet/annotators/downloads/insightface/models/antelopev2.
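One possible way to do the manual download, assuming huggingface_hub is installed and using the directory from the note above:

```python
# Fetch the antelopev2 mirror into the extension's expected model directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="DIAMONIK7777/antelopev2",
    local_dir="extensions/sd-webui-controlnet/annotators/downloads/insightface/models/antelopev2",
)
```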