From 37c55efd61279bba00a24d787707e6746264e79f Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Mon, 2 Oct 2023 18:54:38 +0900
Subject: [PATCH 1/7] docs: feat: model resources for CLIP

---
 docs/source/en/model_doc/clip.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 8c1e11c398c1..2f70f2813cfb 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -86,6 +86,24 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 - A blog post on [How to fine-tune CLIP on 10,000 image-text pairs](https://huggingface.co/blog/fine-tune-clip-rsicd).
 - CLIP is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text).
 
+
+
+- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing), a demo to using pretrained CLIP to inference captions with beam search. 🌎
+
+
+
+- A [notebook](https://colab.research.google.com/drive/1py7au62Ky4ZdLtFJP0D4shNkIVym7JS6), a demo to generating image with prompting style tokens using pretrained CLIP and vqgan. 🌎
+
+🚀 Deploy
+
+- A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing), a guide on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
+- A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb), a demo to image retrieval with how to show similarity score. 🌎
+- A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing), a demo showing how to map images and texts to the same vector space using Multilingual CLIP. 🌎
+
+⚡️ Inference
+
+- A [notebook](https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/CLIP_explainability.ipynb) on how to visualize similarity between input token and image segment. 🌎
+
 If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it. The resource should ideally demonstrate something new instead of duplicating an existing resource.
 
 

From ea379254be764dc69e03cc75395a2a113a9db190 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Wed, 4 Oct 2023 08:49:11 +0900
Subject: [PATCH 2/7] fix: resolve suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/clip.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 2f70f2813cfb..94f8d3ae8d78 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -88,17 +88,17 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 
 
-- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing), a demo to using pretrained CLIP to inference captions with beam search. 🌎
+- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search. 🌎
 
 
 
-- A [notebook](https://colab.research.google.com/drive/1py7au62Ky4ZdLtFJP0D4shNkIVym7JS6), a demo to generating image with prompting style tokens using pretrained CLIP and vqgan. 🌎
+- A [notebook](https://colab.research.google.com/drive/1py7au62Ky4ZdLtFJP0D4shNkIVym7JS6) on generating images with prompting style tokens with pretrained CLIP and VQGAN. 🌎
 
 🚀 Deploy
 
-- A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing), a guide on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
-- A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb), a demo to image retrieval with how to show similarity score. 🌎
-- A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing), a demo showing how to map images and texts to the same vector space using Multilingual CLIP. 🌎
+- A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
+- A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval and showing the similarity score. 🌎
+- A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎
 
 ⚡️ Inference
 

From 9ae5aa9f5aeb62bfaf90ed3fb066f426071e6156 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Wed, 4 Oct 2023 09:32:25 +0900
Subject: [PATCH 3/7] fix: resolve suggestion

---
 docs/source/en/model_doc/clip.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 94f8d3ae8d78..62d952bfadb4 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -90,17 +90,16 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 - A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search. 🌎
 
-
 
-- A [notebook](https://colab.research.google.com/drive/1py7au62Ky4ZdLtFJP0D4shNkIVym7JS6) on generating images with prompting style tokens with pretrained CLIP and VQGAN. 🌎
 
-🚀 Deploy
+**Image retrieval**
 
 - A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
 - A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval and showing the similarity score. 🌎
 - A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎
+A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets.
 
-⚡️ Inference
+**Explainability**
 
 - A [notebook](https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/CLIP_explainability.ipynb) on how to visualize similarity between input token and image segment. 🌎
 

From 69fa70c0d25e5048bcbf83c1c0738e6c0dbbeced Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Wed, 4 Oct 2023 09:35:07 +0900
Subject: [PATCH 4/7] fix: resolve suggestion

---
 docs/source/en/model_doc/clip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 62d952bfadb4..eb112b0b27e6 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -97,7 +97,7 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 - A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
 - A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval and showing the similarity score. 🌎
 - A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎
-A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets.
+A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets.🌎
 
 **Explainability**
 

From 9de8639e4dcfdd13d94781167105163c20af2dc7 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Fri, 6 Oct 2023 13:16:08 +0900
Subject: [PATCH 5/7] fix: resolve suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/clip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index eb112b0b27e6..d23cbd248d95 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -88,7 +88,7 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 
 
-- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search. 🌎
+- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search for image captioning. 🌎
 
 
 

From 7897b48203b8ea8822531e89a736c1d02a0d6af2 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Sat, 7 Oct 2023 11:33:15 +0900
Subject: [PATCH 6/7] fix: resolve suggestion

---
 docs/source/en/model_doc/clip.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index d23cbd248d95..43a5a043b518 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -83,8 +83,8 @@ This model was contributed by [valhalla](https://huggingface.co/valhalla). The o
 
 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP.
 
-- A blog post on [How to fine-tune CLIP on 10,000 image-text pairs](https://huggingface.co/blog/fine-tune-clip-rsicd).
-- CLIP is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text).
+- [Fine tuning CLIP with Remote Sensing (Satellite) images and captions](https://huggingface.co/blog/fine-tune-clip-rsicd), a blog post about how to fine-tune CLIP with [RSICD dataset](https://github.com/201528014227051/RSICD_optimal) and comparison of performance changes due to data augmentation.
+- This [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text) shows how to train a CLIP-like vision-text dual encoder model using a pre-trained vision and text encoder using [COCO dataset](https://cocodataset.org/#home).
 
 
 

From 541633b972175039f18efbcdeb6ea834b3dbcf41 Mon Sep 17 00:00:00 2001
From: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Date: Fri, 13 Oct 2023 22:25:25 +0900
Subject: [PATCH 7/7] fix: resolve suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/clip.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/docs/source/en/model_doc/clip.md b/docs/source/en/model_doc/clip.md
index 43a5a043b518..29b074f1cbbc 100644
--- a/docs/source/en/model_doc/clip.md
+++ b/docs/source/en/model_doc/clip.md
@@ -90,14 +90,12 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 - A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search for image captioning. 🌎
 
-
-
 **Image retrieval**
 
 - A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing MRR(Mean Reciprocal Rank) score. 🌎
 - A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval and showing the similarity score. 🌎
 - A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎
-A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets.🌎
+- A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMBD](https://www.themoviedb.org/) datasets. 🌎
 
 **Explainability**
 