
Conversation

@nkasmanoff

No description provided.

@@ -0,0 +1,28 @@
CLIP_IMAGE_CONFIG = {'openai/clip-vit-base-patch32': {
@gboduljak (Owner):

We want to pull these from HuggingFace instead of hardcoding constants. This solves the problem, but it hurts maintainability.
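
A minimal sketch of that suggestion, assuming the standard huggingface_hub API (the load_image_config helper name is hypothetical):

import json

from huggingface_hub import hf_hub_download

def load_image_config(model_name: str) -> dict:
    # Fetch the model's preprocessor_config.json from the Hub instead
    # of duplicating its values in a hardcoded CLIP_IMAGE_CONFIG.
    config_path = hf_hub_download(model_name, "preprocessor_config.json")
    with open(config_path) as f:
        return json.load(f)

For 'openai/clip-vit-base-patch32' this yields the same fields (image_mean, image_std, crop_size, and so on) that the hardcoded dict carries.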

@@ -0,0 +1,5 @@
from typing import List, Union

ImageInput = Union[
Copy link

We can include this within the image_processor.
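
For reference, a hedged sketch of folding the alias into image_processor.py; the exact Union members are assumed here to mirror the transformers convention, not taken from the diff:

from typing import List, Union

import numpy as np
from PIL import Image

# Inputs the processor accepts: a single image or a list of images,
# as PIL images or numpy arrays (members assumed, not from the diff).
ImageInput = Union[Image.Image, np.ndarray, List[Image.Image], List[np.ndarray]]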

@@ -0,0 +1,252 @@
from image_config import CLIP_IMAGE_CONFIG
from jmage_constants import ImageInput
@gboduljak (Owner):

Typo: jmage_constants should be image_constants.

@gboduljak (Owner) left a comment:

Thanks for the contribution. I will merge your change because my comments are nits. However, please ensure we test correctness as well. It is quite easy to get this wrong: errors in preprocessing propagate downstream and can be difficult to debug. Here is a concrete test:

import mlx.core as mx
import numpy as np
import transformers
from PIL import Image

# Assuming the port added in this PR lives in image_processor.py.
from image_processor import CLIPImageProcessor


def test_image_processor():
    # Preprocess the same image with the MLX port and the reference
    # transformers implementation, then compare pixel values.
    mx_image_proc = CLIPImageProcessor('openai/clip-vit-base-patch32')
    tf_image_proc = transformers.CLIPImageProcessor.from_pretrained(
        'openai/clip-vit-base-patch32')
    image = Image.open("cats.jpeg")

    data = mx_image_proc([image])
    mx_pixels = mx.array(data['pixel_values'])
    # transformers returns NCHW; transpose to NHWC to match MLX.
    tf_pixels = mx.array(np.array(tf_image_proc([image])[
        'pixel_values']).transpose((0, 2, 3, 1)))

    assert mx.array_equal(mx_pixels, tf_pixels)
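
Note that transformers returns pixel values in NCHW layout while the MLX port produces NHWC, hence the transpose. If exact equality proves flaky across library versions, comparing with mx.allclose and a small tolerance is a reasonable fallback, since the preprocessing involves floating-point resizing and normalization.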

gboduljak merged this pull request into gboduljak:clip on Jan 16, 2024.
awni pushed a commit that referenced this pull request on Jan 31, 2024:
@nkasmanoff:
* clip image processor
* added example usage
gboduljak added a commit that referenced this pull request on Feb 3, 2024:
* probably approximately correct CLIPTextEncoder

* implemented CLIPEncoderLayer as built-in nn.TransformerEncoderLayer

* replaced embedding layer with simple matrix

* implemented ViT

* added ViT tests

* fixed tests

* added pooler_output for text

* implemented complete CLIPModel

* implemented init

* implemented convert.py and from_pretrained

* fixed some minor bugs and added the README.md

* removed tokenizer unused comments

* removed unused deps

* updated ACKNOWLEDGEMENTS.md

* Feat: Image Processor for CLIP (#1)

@nkasmanoff:
* clip image processor
* added example usage

* refactored image preprocessing

* deleted unused image_config.py

* removed preprocessing port

* added dependency to mlx-data

* fixed attribution and moved photos to assets

* implemented a simple port of CLIPImageProcessor

* review changes

* PR review changes

* renamed too verbose arg

* updated README.md

* nits in readme / conversion

* simplify some stuff, remove unneeded inits

* remove more init stuff

* more simplify

* make test a unit test

* update main readme

* readme nits

---------

Co-authored-by: Noah Kasmanoff <[email protected]>
Co-authored-by: Awni Hannun <[email protected]>