docs/source/en/using-diffusers/ip_adapter.md
# IP-Adapter
[IP-Adapter](https://hf.co/papers/2308.06721) is an image prompt adapter that can be plugged into diffusion models to enable image prompting without any changes to the underlying model. Furthermore, this adapter can be reused with other models finetuned from the same base model and it can be combined with other adapters like [ControlNet](../using-diffusers/controlnet). The key idea behind IP-Adapter is the *decoupled cross-attention* mechanism which adds a separate cross-attention layer just for image features instead of using the same cross-attention layer for both text and image features. This allows the model to learn more image-specific features.
> [!TIP]
> Learn how to load an IP-Adapter in the [Load adapters](../using-diffusers/loading_adapters#ip-adapter) guide, and make sure you check out the [IP-Adapter Plus](../using-diffusers/loading_adapters#ip-adapter-plus) section which requires manually loading the image encoder.
This guide will walk you through using IP-Adapter for various tasks and use cases.
## General tasks
Let's take a look at how to use IP-Adapter's image prompting capabilities with the [`StableDiffusionXLPipeline`] for tasks like text-to-image, image-to-image, and inpainting. We also encourage you to try out other pipelines such as Stable Diffusion, LCM-LoRA, ControlNet, T2I-Adapter, or AnimateDiff!
In all the following examples, you'll see the [`~IPAdapterMixin.set_ip_adapter_scale`] method. This method controls the amount of text or image conditioning to apply to the model. A value of `1.0` means the model is only conditioned on the image prompt. Lowering this value encourages the model to produce more diverse images, but they may not be as aligned with the image prompt. Typically, a value of `0.5` achieves a good balance between the two prompt types and produces good results.
<hfoptions id="tasks">
<hfoption id="Text-to-image">
Crafting the precise text prompt to generate the image you want can be difficult.
Load a Stable Diffusion XL (SDXL) model and insert an IP-Adapter into the model with the [`~IPAdapterMixin.load_ip_adapter`] method. Use the `subfolder` parameter to load the weights for SDXL.
```py
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
```
IP-Adapter can also help with image-to-image by guiding the model to generate an image that resembles the original image and the image prompt.
Load a Stable Diffusion XL (SDXL) model and insert an IP-Adapter into the model with the [`~IPAdapterMixin.load_ip_adapter`] method. Use the `subfolder` parameter to load the weights for SDXL.
```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
```
IP-Adapter is also useful for inpainting because the image prompt allows you to be much more specific about what you'd like to generate.
Load a Stable Diffusion XL (SDXL) model and insert an IP-Adapter into the model with the [`~IPAdapterMixin.load_ip_adapter`] method. Use the `subfolder` parameter to load the weights for SDXL.
```py
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
```
IP-Adapter can also help you generate videos that are more aligned with your text prompt. For example, let's load [AnimateDiff](../api/pipelines/animatediff) with its motion adapter, and insert an IP-Adapter into the model with the [`~IPAdapterMixin.load_ip_adapter`] method.
> [!WARNING]
> If you're planning on offloading the model to the CPU, make sure you run it after you've loaded the IP-Adapter. When you call [`~DiffusionPipeline.enable_model_cpu_offload`] before loading the IP-Adapter, it offloads the image encoder module to the CPU and it'll return an error when you try to run the pipeline.
More than one IP-Adapter can be used at the same time to generate specific images in more diverse styles. For example, you can use IP-Adapter FaceID to generate consistent faces and characters, and IP-Adapter Plus to generate those faces in a specific style. Let's try this out!
> [!TIP]
> Read the [IP-Adapter Plus](../using-diffusers/loading_adapters#ip-adapter-plus) section to learn why you need to manually load the image encoder.
Next, you'll load a base model, scheduler, and the IP-Adapters. The IP-Adapters to use are passed as a list to the `weight_name` parameter:
* [ip-adapter-plus_sdxl_vit-h](https://huggingface.co/h94/IP-Adapter#ip-adapter-for-sdxl-10) uses patch embeddings and a ViT-H image encoder
* [ip-adapter-plus-face_sdxl_vit-h](https://huggingface.co/h94/IP-Adapter#ip-adapter-for-sdxl-10) has the same architecture but it is conditioned with images of cropped faces
IP-Adapter relies on an image encoder to generate image features. If the IP-Adapter repository contains an `image_encoder` subfolder, the image encoder is automatically loaded and registered to the pipeline. Otherwise, you'll need to explicitly load the image encoder with a [`~transformers.CLIPVisionModelWithProjection`] model and pass it to the pipeline. This is the case for *IP-Adapter Plus* checkpoints, which use the ViT-H image encoder.