Conversation
Branch force-pushed from dd72bd1 to 6ed7b47.
amyeroberts
left a comment
Thanks for working on this!
A few general comments:
- `interpolate_pos_encoding` should be a boolean
- It should be possible to call the model with this flag, i.e. `model(**inputs, interpolate_pos_encoding=True)`
- All the docstrings should be updated to include this argument
- Print statements should be removed
- Tests should make sure that the processed image being passed to the model is not the default size (see the sketch below)
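
A minimal sketch of what such a test call might look like, assuming a ViT-style checkpoint; the model/processor names and the 480px size are illustrative assumptions, not the PR's actual test code:

```python
# Illustrative only: checkpoint and sizes are assumptions, not the PR's code.
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

# Use a non-default size so the interpolation path is actually exercised.
processor = ViTImageProcessor.from_pretrained(
    "google/vit-base-patch16-224", size={"height": 480, "width": 480}
)
model = ViTModel.from_pretrained("google/vit-base-patch16-224")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(images=image, return_tensors="pt")

# The flag is passed at call time, as requested above.
outputs = model(**inputs, interpolate_pos_encoding=True)
```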
```python
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
interpolate_pos_encoding: Optional[bool] = False,
```
The value should be True or False, but not None
```diff
- interpolate_pos_encoding: Optional[bool] = False,
+ interpolate_pos_encoding: bool = False,
```
```python
image_processor = BridgeTowerProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
```
```python
image_processor = ChineseCLIPProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
```
```python
# to visualize self-attention on higher resolution images.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(torch_device)

image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
```
Three comments:

- This is returning the processor, not the image_processor
- `size` should be a dictionary here, e.g. `size={"shortest_edge": 480}`
- This won't test the interpolation, because the image processor crops after resizing; `crop_size` also has to be overridden (see the sketch after the suggestion below)
```diff
- image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
```
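
Putting the three comments together, a possible fixed version (a sketch; the 480px values are just the ones used in the test above):

```python
from transformers import CLIPProcessor

# Override both size and crop_size as dicts so the resize to 480px is not
# undone by the default 224px center crop (values taken from the test above).
processor = CLIPProcessor.from_pretrained(
    "openai/clip-vit-base-patch32",
    size={"shortest_edge": 480},
    crop_size={"height": 480, "width": 480},
)
```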
```python
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224", padding_side="left")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
```
Same here - parameters affecting output size have to be updated
```python
# to visualize self-attention on higher resolution images.
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32").to(torch_device)

image_processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32", size=480)
```
Same here:

- returns a processor
- needs to override crop size
- `size` and `crop_size` should be dicts
@nileshkokane01 Any update on this PR?

Will do this weekend, I'll let you know if I can't this week or the other.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@nileshkokane01 Any update on this?

@amyeroberts I can continue working on it.

Hi @amyeroberts. As I am not the owner of the nileshkokane01:interpolate_clip repo, I've made the changes in my forked repo: see PR #31900.

Hi @amyeroberts, have you been able to have a look? I can ping someone else if needed. Thanks!

I have included the interpolation of positional embeddings in all the following models (and their respective tests) in #32600:

Waiting for review. Thanks!

@nileshkokane01 Shall we close this with the opening of #32600?

@amyeroberts sure!
What does this PR do?
Solves #30579.
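
For context, the technique at stake is interpolating the pre-trained position embeddings so a ViT-style vision model can run at resolutions it was not trained on. A hedged sketch of that idea (a simplified illustration, not the PR's exact diff):

```python
# Simplified sketch of position-embedding interpolation, not the PR's code.
import math

import torch
import torch.nn.functional as F


def interpolate_pos_encoding(
    pos_embed: torch.Tensor, new_height: int, new_width: int, patch_size: int
) -> torch.Tensor:
    """pos_embed has shape (1, 1 + num_positions, dim); index 0 is the [CLS] token."""
    cls_pos_embed, patch_pos_embed = pos_embed[:, :1], pos_embed[:, 1:]
    dim = patch_pos_embed.shape[-1]
    grid = int(math.sqrt(patch_pos_embed.shape[1]))  # pre-trained grid, assumed square
    # (1, grid*grid, dim) -> (1, dim, grid, grid) so F.interpolate can resize it
    patch_pos_embed = patch_pos_embed.reshape(1, grid, grid, dim).permute(0, 3, 1, 2)
    patch_pos_embed = F.interpolate(
        patch_pos_embed,
        size=(new_height // patch_size, new_width // patch_size),
        mode="bicubic",
        align_corners=False,
    )
    # back to (1, new_num_patches, dim) and re-attach the [CLS] embedding
    patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).reshape(1, -1, dim)
    return torch.cat((cls_pos_embed, patch_pos_embed), dim=1)
```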
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.