adding positional encoder changes and tests #32600
amyeroberts merged 54 commits into huggingface:main from
Conversation
|
@amyeroberts I have included the interpolation of positional embeddings in all the following models, and their respective tests:
Thanks! |
amyeroberts
left a comment
Thanks for adding - looks great!
Just a handful of small nits. Before merge, we'll need to run the slow tests for the affected models. Could you trigger this by running `git commit --allow-empty -m "[run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip"`?
    model(**inputs, interpolate_pos_encoding=False)
    # forward pass
    with torch.no_grad():
        outputs = model(**inputs, interpolate_pos_encoding=True)
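For context, the interpolation exercised by the test above generally follows the ViT-style recipe: resize the 2D grid of patch position embeddings to the new input resolution while keeping the class-token embedding untouched. A minimal standalone sketch (hypothetical helper name and shapes, not the exact transformers implementation):

```python
import torch
import torch.nn.functional as F

def interpolate_pos_encoding(pos_embed: torch.Tensor, new_h: int, new_w: int) -> torch.Tensor:
    """pos_embed: (1, 1 + N, dim) with the CLS embedding first and N = grid**2 patch embeddings."""
    cls_pos, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = pos_embed.shape[-1]
    grid = int(patch_pos.shape[1] ** 0.5)
    # (1, N, dim) -> (1, dim, grid, grid) so we can treat the embeddings as a 2D map
    patch_pos = patch_pos.reshape(1, grid, grid, dim).permute(0, 3, 1, 2)
    # Bicubic resize of the position grid to the new patch-grid size
    patch_pos = F.interpolate(patch_pos, size=(new_h, new_w), mode="bicubic", align_corners=False)
    # Back to (1, new_h * new_w, dim) and re-attach the CLS embedding
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_h * new_w, dim)
    return torch.cat([cls_pos, patch_pos], dim=1)

# A 7x7 patch grid (plus CLS) resized for a 10x10 grid
resized = interpolate_pos_encoding(torch.randn(1, 1 + 7 * 7, 32), 10, 10)
print(resized.shape)  # torch.Size([1, 101, 32])
```

This is why `interpolate_pos_encoding=True` lets the model accept images at resolutions other than the one it was pre-trained on.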
    @unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")
    def test_inputs_embeds_matches_input_ids_with_generate(self):
        pass
|
Let's remove this as this logic is independent of this PR
Suggested change (delete these lines):
    @unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")
    def test_inputs_embeds_matches_input_ids_with_generate(self):
        pass
If I remove this, I will get the following error from the CI pipeline:
FAILED tests/models/git/test_modeling_git.py::GitModelTest::test_inputs_embeds_matches_input_ids_with_generate - ValueError: You passed `inputs_embeds` to `.generate()`, but the model class GitForCausalLM doesn't have its forwarding implemented. See the GPT2 implementation for an example (https://github.com/huggingface/transformers/pull/21405), and feel free to open a PR with it!
as shown here
Could you rebase on main? I believe this has been resolved upstream
This should still be removed, as this test is unrelated to this PR.
        Whether to interpolate the pre-trained position encodings.
    return_dict (`bool`, *optional*):
        Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
|
    output_hidden_states (`bool`, *optional*):
        Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
        more detail.
    interpolate_pos_encoding (`bool`, *optional*):
Suggested change:
- interpolate_pos_encoding (`bool`, *optional*):
+ interpolate_pos_encoding (`bool`, *optional*, defaults to `False`):
    output_hidden_states (`bool`, *optional*):
        Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
        more detail.
    interpolate_pos_encoding (`bool`, *optional*):
Suggested change:
- interpolate_pos_encoding (`bool`, *optional*):
+ interpolate_pos_encoding (`bool`, *optional*, defaults to `False`):
|
OK, I've:
Please don't merge yet, just need some time to check and potentially fix the tests. |
|
The GIT model test still needs to be fixed; I'm getting an error. Still on it. |
|
@manuelsh Have you included the most recent updates from |
|
@amyeroberts done, all |
amyeroberts
left a comment
Beautiful - thanks for adding this capability to our models and for iterating on a solution!
|
@manuelsh Just the failing slow tests to address! |
|
@amyeroberts, I think it is not just a simple substitution. I believe I can make them work with the new function, but now all my tests are crashing for different reasons (different tensor outputs, for example) and this will take longer. Why not go back to the previous working commit (d44e070), merge it, and then open another PR like #33226 but for the CLIP family models? I would be happy to contribute to it. |
|
@amyeroberts I was able to fix all tests with the new function, so no need to do an additional PR. Please review. |
|
@manuelsh OK, great. Just the merge conflict to resolve and a final slow run to confirm everything passes now. |
|
@amyeroberts I have resolved the conflicts, run the tests in slow mode, and corrected one failing test in the clipseg model that was not related to this PR. |
|
@amyeroberts there were two tensors to correct in clipseg. Done now. Could you kindly approve the run for the clipseg slow tests? |
|
@manuelsh Done and all passing - let's merge! Thanks for this addition and patience on iterating on this :) |
|
Fantastic! Glad to see it. Thank you! |
    batch_size, _, height, width = pixel_values.shape
    if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size):
        raise ValueError(
            f"Input image size ({height}*{width}) doesn't match model ({self.image_size}*{self.image_size})."
        )
Was this tested? I'm getting user reports of the error Input image size (352*352) doesn't match model (224*224). regardless of the size of the image put in. I'm assuming somewhere in the CLIPSeg code it resizes to 352, which makes sense considering the docs seem to indicate that 352 is an expected value (https://huggingface.co/docs/transformers/model_doc/clipseg#transformers.CLIPSegForImageSegmentation.forward.example) while also showing that image_size is 224 (https://huggingface.co/docs/transformers/model_doc/clipseg#transformers.CLIPSegVisionConfig.image_size).
I.e., using CLIPSeg exactly as documented will necessarily throw this error.
The original clipseg model built by huggingface staff defines image_size 224 in config https://huggingface.co/CIDAS/clipseg-rd64-refined/blob/main/config.json#L127
and the preprocessor size as 352 https://huggingface.co/CIDAS/clipseg-rd64-refined/blob/main/preprocessor_config.json#L20
So either this check is wrong, or the official configs have been wrong for years, or there is meant to be some handling of the sizes between the two that has gone missing.
EDIT: Posted issue #34415
You need to pass `interpolate_pos_encoding=True` when calling the model, as discussed in issue #34415.
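The guard under discussion can be sketched as a standalone function (hypothetical name, mirroring the check quoted above) to show why a 352x352 preprocessed image is rejected by a model whose config says 224 unless interpolation is requested:

```python
def validate_image_size(height: int, width: int, image_size: int,
                        interpolate_pos_encoding: bool) -> None:
    # Mirrors the guard discussed above: fixed-size inputs only, unless
    # position-embedding interpolation is explicitly enabled.
    if not interpolate_pos_encoding and (height != image_size or width != image_size):
        raise ValueError(
            f"Input image size ({height}*{width}) doesn't match model ({image_size}*{image_size})."
        )

validate_image_size(224, 224, 224, interpolate_pos_encoding=False)  # OK: matches config
validate_image_size(352, 352, 224, interpolate_pos_encoding=True)   # OK: interpolation requested
try:
    # The documented CLIPSeg setup: preprocessor resizes to 352, config says 224
    validate_image_size(352, 352, 224, interpolate_pos_encoding=False)
except ValueError as e:
    print(e)  # Input image size (352*352) doesn't match model (224*224).
```

So with the CIDAS/clipseg-rd64-refined checkpoint, whose preprocessor outputs 352 while the vision config declares 224, the flag is effectively required.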
* adding positional encoder changes and tests
* adding ruff suggestions
* changes added by python utils/check_copies.py --fix_and_overwrite
* removing pos_encoding added by script
* adding interpolation to clipseg
* formatting
* adding further testing to altclip and better documentation to kosmos2
* skipping test_inputs_embeds_matches_input_ids_with_generate in git model
* fixing clipseg comment suggestions
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* fixing bridgetower test
* fixing altclip tensor output POS test
* adding ruff formatting
* fixing several tests
* formatting with ruff
* adding positional encoder changes and tests
* adding ruff suggestions
* changes added by python utils/check_copies.py --fix_and_overwrite
* removing pos_encoding added by script
* adding interpolation to clipseg
* formatting
* adding further testing to altclip and better documentation to kosmos2
* skipping test_inputs_embeds_matches_input_ids_with_generate in git model
* fixing clipseg comment suggestions
* fixing bridgetower test
* fixing altclip tensor output POS test
* adding ruff formatting
* fixing several tests
* formatting with ruff
* adding right pretrained model
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* fixing test_inference_image_segmentation
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* adding ruff formatting
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* adding new interpolate_pos_encoding function
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* fixing interpolate_POS funciton
* adapting output tensor in teests
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* modifying output tensor
* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip
* adding the correct tensor
* [run_slow] clipseg
* fixing spaces
* [run_slow] clipseg
* [run_slow] clipseg
---------
Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
@amyeroberts as there were some conflicts when merging with main on #31900 (possibly due to the make scripts), I have reimplemented all the changes of #30783 in a new branch, which is rebased on main.