From 37c55efd61279bba00a24d787707e6746264e79f Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Mon, 2 Oct 2023 18:54:38 +0900
Subject: [PATCH 1/7] docs: feat: model resources for CLIP

---
 docs/source/en/model_doc/clip.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 8c1e11c398c1..2f70f2813cfb 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -86,6 +86,24 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 - A blog post on [How to fine-tune CLIP on 10,000 image-text pairs](https://huggingface.co/blog/fine-tune-clip-rsicd).
 - CLIP is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text).
 
+<PipelineTag pipeline="image-to-text"/>
+
+- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing), a demo to using pretrained CLIP to inference captions with beam search. 🌎
+
+<PipelineTag pipeline="text-to-image"/>
+
+- A [notebook](https://colab.research.google.com/drive/1py7au62Ky4ZdLtFJP0D4shNkIVym7JS6), a demo to generating image with prompting style tokens using pretrained CLIP and vqgan. 🌎
+
+🚀 Deploy
+
+- A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing), a guide on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
+- A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb), a demo to image retrieval with how to show similarity score. 🌎
+- A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing), a demo showing how to map images and texts to the same vector space using Multilingual CLIP. 🌎 
+
+⚡️ Inference
+
+- A [notebook](https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/CLIP_explainability.ipynb) on how to visualize similarity between input token and image segment. 🌎
+
 If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it.
 The resource should ideally demonstrate something new instead of duplicating an existing resource.
 

From ea379254be764dc69e03cc75395a2a113a9db190 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Wed, 4 Oct 2023 08:49:11 +0900
Subject: [PATCH 2/7] fix: resolve suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/clip.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 2f70f2813cfb..94f8d3ae8d78 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -88,17 +88,17 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 <PipelineTag pipeline="image-to-text"/>
 
-- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing), a demo to using pretrained CLIP to inference captions with beam search. 🌎
+- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search. 🌎
 
 <PipelineTag pipeline="text-to-image"/>
 
-- A [notebook](https://colab.research.google.com/drive/1py7au62Ky4ZdLtFJP0D4shNkIVym7JS6), a demo to generating image with prompting style tokens using pretrained CLIP and vqgan. 🌎
+- A [notebook](https://colab.research.google.com/drive/1py7au62Ky4ZdLtFJP0D4shNkIVym7JS6) on generating images with prompting style tokens with pretrained CLIP and VQGAN. 🌎
 
 🚀 Deploy
 
-- A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing), a guide on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
-- A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb), a demo to image retrieval with how to show similarity score. 🌎
-- A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing), a demo showing how to map images and texts to the same vector space using Multilingual CLIP. 🌎 
+- A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
+- A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval and showing the similarity score. 🌎
+- A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎 
 
 ⚡️ Inference
 

From 9ae5aa9f5aeb62bfaf90ed3fb066f426071e6156 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Wed, 4 Oct 2023 09:32:25 +0900
Subject: [PATCH 3/7] fix: resolve suggestion

---
 docs/source/en/model_doc/clip.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 94f8d3ae8d78..62d952bfadb4 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -90,17 +90,16 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 - A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search. 🌎
 
-<PipelineTag pipeline="text-to-image"/>
 
-- A [notebook](https://colab.research.google.com/drive/1py7au62Ky4ZdLtFJP0D4shNkIVym7JS6) on generating images with prompting style tokens with pretrained CLIP and VQGAN. 🌎
 
-🚀 Deploy
+**Image retrieval**
 
 - A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
 - A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval and showing the similarity score. 🌎
 - A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎 
+A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets.
 
-⚡️ Inference
+**Explainability**
 
 - A [notebook](https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/CLIP_explainability.ipynb) on how to visualize similarity between input token and image segment. 🌎
 

From 69fa70c0d25e5048bcbf83c1c0738e6c0dbbeced Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Wed, 4 Oct 2023 09:35:07 +0900
Subject: [PATCH 4/7] fix: resolve suggestion

---
 docs/source/en/model_doc/clip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 62d952bfadb4..eb112b0b27e6 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -97,7 +97,7 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 - A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
 - A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval and showing the similarity score. 🌎
 - A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎 
-A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets.
+A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets.🌎
 
 **Explainability**
 

From 9de8639e4dcfdd13d94781167105163c20af2dc7 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Fri, 6 Oct 2023 13:16:08 +0900
Subject: [PATCH 5/7] fix: resolve suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/clip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index eb112b0b27e6..d23cbd248d95 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -88,7 +88,7 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 <PipelineTag pipeline="image-to-text"/>
 
-- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search. 🌎
+- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search for image captioning. 🌎
 
 
 

From 7897b48203b8ea8822531e89a736c1d02a0d6af2 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Sat, 7 Oct 2023 11:33:15 +0900
Subject: [PATCH 6/7] fix: resolve suggestion

---
 docs/source/en/model_doc/clip.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index d23cbd248d95..43a5a043b518 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -83,8 +83,8 @@ This model was contributed by [valhalla](https://huggingface.co/valhalla). The o
 
 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP.
 
-- A blog post on [How to fine-tune CLIP on 10,000 image-text pairs](https://huggingface.co/blog/fine-tune-clip-rsicd).
-- CLIP is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text).
+- [Fine tuning CLIP with Remote Sensing (Satellite) images and captions](https://huggingface.co/blog/fine-tune-clip-rsicd), a blog post about how to fine-tune CLIP on the [RSICD dataset](https://github.com/201528014227051/RSICD_optimal) and how data augmentation affects performance.
+- This [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text) shows how to train a CLIP-like vision-text dual encoder model from a pre-trained vision encoder and text encoder on the [COCO dataset](https://cocodataset.org/#home).
 
 <PipelineTag pipeline="image-to-text"/>
 

From 541633b972175039f18efbcdeb6ea834b3dbcf41 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Fri, 13 Oct 2023 22:25:25 +0900
Subject: [PATCH 7/7] fix: resolve suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/clip.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 43a5a043b518..29b074f1cbbc 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -90,14 +90,12 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 - A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search for image captioning. 🌎
 
-
-
 **Image retrieval**
 
 - A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
 - A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval and showing the similarity score. 🌎
 - A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎 
-A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets.🌎
+- A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to use CLIP for semantic image search with the [Unsplash](https://unsplash.com) and [TMDB](https://www.themoviedb.org/) datasets. 🌎
 
 **Explainability**