Conversation
Branch force-pushed from dd72bd1 to 6ed7b47.
amyeroberts
left a comment
Thanks for working on this!
A few general comments:
- `interpolate_pos_encoding` should be a boolean
- It should be possible to call the model with this flag, i.e. `model(**inputs, interpolate_pos_encoding=True)`
- All the docstrings should be updated to include this argument
- Print statements should be removed
- Tests should make sure that the processed image being passed to the model is not the default size (see the sketch below)
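
A minimal sketch of what such a test call might look like, assuming a ViT-style checkpoint; the model/processor names and the 480px size are illustrative assumptions, not the PR's actual test code:

```python
# Illustrative only: checkpoint and sizes are assumptions, not the PR's code.
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

# Use a non-default size so the interpolation path is actually exercised.
processor = ViTImageProcessor.from_pretrained(
    "google/vit-base-patch16-224", size={"height": 480, "width": 480}
)
model = ViTModel.from_pretrained("google/vit-base-patch16-224")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(images=image, return_tensors="pt")

# The flag is passed at call time, as requested above.
outputs = model(**inputs, interpolate_pos_encoding=True)
```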
```python
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
interpolate_pos_encoding: Optional[bool] = False,
```
The value should be True or False, but not None
```diff
- interpolate_pos_encoding: Optional[bool] = False,
+ interpolate_pos_encoding: bool = False,
```
```python
image_processor = BridgeTowerProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
```
```python
image_processor = ChineseCLIPProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
```
```python
# to visualize self-attention on higher resolution images.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(torch_device)

image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
```
Three comments:

- This is returning the processor, not the image_processor
- `size` should be a dictionary here, e.g. `size={"shortest_edge": 480}`
- This won't test the interpolation, because the image processor crops after resizing; `crop_size` also has to be overridden (see the sketch after the suggestion below)
```diff
- image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
```
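
Putting the three comments together, a possible fixed version (a sketch; the 480px values are just the ones used in the test above):

```python
from transformers import CLIPProcessor

# Override both size and crop_size as dicts so the resize to 480px is not
# undone by the default 224px center crop (values taken from the test above).
processor = CLIPProcessor.from_pretrained(
    "openai/clip-vit-base-patch32",
    size={"shortest_edge": 480},
    crop_size={"height": 480, "width": 480},
)
```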
```python
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224", padding_side="left")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
```
Same here - parameters affecting output size have to be updated
```python
# to visualize self-attention on higher resolution images.
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32").to(torch_device)

image_processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32", size=480)
```
Same here:

- returns a processor
- needs to override crop size
- `size` and `crop_size` should be dicts
@nileshkokane01 Any update on this PR?

Will do this weekend, I'll let you know if I can't this week or the other.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@nileshkokane01 Any update on this?

@amyeroberts I can continue working on it.

Hi @amyeroberts. As I am not the owner of the nileshkokane01:interpolate_clip repo, I've made the changes in my forked repo: see PR #31900.

Hi @amyeroberts, have you been able to have a look? I can ping someone else if needed. Thanks!

I have included the interpolation of positional embeddings in all the following models (and their respective tests) in #32600:

Waiting for review. Thanks!

@nileshkokane01 Shall we close this with the opening of #32600?

@amyeroberts sure!
What does this PR do?
Solves #30579.
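
For context, the technique at stake is interpolating the pre-trained position embeddings so a ViT-style vision model can run at resolutions it was not trained on. A hedged sketch of that idea (a simplified illustration, not the PR's exact diff):

```python
# Simplified sketch of position-embedding interpolation, not the PR's code.
import math

import torch
import torch.nn.functional as F


def interpolate_pos_encoding(
    pos_embed: torch.Tensor, new_height: int, new_width: int, patch_size: int
) -> torch.Tensor:
    """pos_embed has shape (1, 1 + num_positions, dim); index 0 is the [CLS] token."""
    cls_pos_embed, patch_pos_embed = pos_embed[:, :1], pos_embed[:, 1:]
    dim = patch_pos_embed.shape[-1]
    grid = int(math.sqrt(patch_pos_embed.shape[1]))  # pre-trained grid, assumed square
    # (1, grid*grid, dim) -> (1, dim, grid, grid) so F.interpolate can resize it
    patch_pos_embed = patch_pos_embed.reshape(1, grid, grid, dim).permute(0, 3, 1, 2)
    patch_pos_embed = F.interpolate(
        patch_pos_embed,
        size=(new_height // patch_size, new_width // patch_size),
        mode="bicubic",
        align_corners=False,
    )
    # back to (1, new_num_patches, dim) and re-attach the [CLS] embedding
    patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).reshape(1, -1, dim)
    return torch.cat((cls_pos_embed, patch_pos_embed), dim=1)
```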
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.