Uniformize kwargs for Idefics/2 processors by yonigozlan · Pull Request #32568 · huggingface/transformers

yonigozlan · 2024-08-09T16:10:30Z

What does this PR do?

Adds uniformized processor kwargs, following #31911, for the following models:

Idefics
Idefics2

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@molbap @zucchini-nlp @amyeroberts

HuggingFaceDocBuilderDev · 2024-08-09T16:29:54Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yonigozlan · 2024-08-09T16:50:32Z

src/transformers/models/idefics/processing_idefics.py

A lot of logic is needed for backward compatibility, as Idefics used to take only prompts in which text and image inputs are interleaved. The added logic preserves support for this kind of input (with the prompts argument replaced by text), while adding support for the usual separate text and images inputs, as in other image-text-to-text models. This will also be useful for supporting Idefics in the image-text-to-text pipeline.
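To make the two input styles concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `FakeImage` and `split_interleaved_prompt` are hypothetical names used only for illustration, and the sketch assumes that every non-string element of a legacy prompt is an image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeImage:
    """Hypothetical stand-in for an image object (e.g. a PIL image)."""

    name: str


def split_interleaved_prompt(prompt):
    """Split one legacy interleaved prompt into (text_chunks, images).

    Assumption for this sketch: every non-string element is an image.
    """
    text_chunks = [item for item in prompt if isinstance(item, str)]
    images = [item for item in prompt if not isinstance(item, str)]
    return text_chunks, images


# Legacy style: text and images interleaved in a single prompt.
legacy_prompt = ["User: What is in this image?", FakeImage("cat"), "Assistant:"]
text_chunks, images = split_interleaved_prompt(legacy_prompt)

# Uniformized style: text and images passed as separate arguments.
new_style = {"text": " ".join(text_chunks), "images": images}
```

Note that the separate form no longer records where each image sat in the text, which is why keeping the interleaved path matters for existing users.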

zucchini-nlp

Also looks good to me. I just want to clarify what the new format for Idefics will be to make the pipeline happy. Maybe we can add a test for that new format :)

zucchini-nlp · 2024-08-12T05:41:27Z

src/transformers/models/idefics/processing_idefics.py

I guess this code block is for new processing behavior when users pass images and text.

I'm not sure it is a good idea to repeat the text several times. Suppose a user has one prompt with interleaved images and text: we would then replicate the prompt several times and cause an error in downstream modeling code. For example:

processor(text=["User: What do you see here? Assistant: a cat. User: what about this image?"], images=[image1, image2])

Yes that's a good point. Although interleaved images-text is not really supported when providing both images and text for Idefics, as there is no way to indicate where to put the images in the prompt. Maybe I should add a warning here instead of automatically duplicating the prompts?

Ah I see now, indeed Idefics is a bit peculiar.

Yes, interleaving like that is not supported, but providing more than one image per prompt, as in a multi-turn conversation, is okay, as shown in the docstring of the call method. Then we should expect users to pass as many images as prompts, and they would have to wrap the images in a batched list if there is more than one per prompt.

I think we can even raise an error, as we cannot know for sure what the user expects with these inputs. An error can explain what kind of input we want and let the user fix it; otherwise users who never read warnings might start complaining in the issues :)

Added support for multiple images per prompt, and this warning to make it clearer what input format we expect when using image-text-to-text format:
https://github.com/huggingface/transformers/blob/8b171a777bac10bbb9c9a13bd36d6ffd10be9b9d/src/transformers/models/idefics/processing_idefics.py#L353-L358
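The input rule discussed in this thread (as many image entries as prompts, with several images for one prompt wrapped in a nested list) can be sketched as follows. This is a hypothetical helper, not the check linked above: the name `pair_text_and_images` and the error message are assumptions, images are represented by plain strings, and the sketch raises an error as suggested in the review, whereas the linked code may behave differently.

```python
def pair_text_and_images(text, images):
    """Pair each prompt with its images, or raise if the counts do not match.

    Hypothetical helper for illustration only. `images` holds one entry per
    prompt; an entry is either a single image or a list of images.
    """
    if isinstance(text, str):
        text = [text]

    # Wrap a single image per prompt into a one-element list.
    nested = [
        list(entry) if isinstance(entry, (list, tuple)) else [entry]
        for entry in images
    ]

    if len(nested) != len(text):
        raise ValueError(
            f"Got {len(text)} prompt(s) but {len(nested)} image entries. "
            "Pass one image entry per prompt, and wrap several images for "
            "the same prompt in a list, e.g. images=[[image1, image2]]."
        )
    return list(zip(text, nested))


# One image for the first prompt, two images for the second prompt.
pairs = pair_text_and_images(
    ["Describe this.", "Compare these."],
    ["image1", ["image2", "image3"]],
)
```

With this rule, the ambiguous call from the example above (one prompt, `images=[image1, image2]`) is rejected, while `images=[[image1, image2]]` is accepted as two images for the single prompt.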

yonigozlan · 2024-09-24T01:21:57Z

This should now be ready for review! Cc @molbap @amyeroberts .
It also brings some changes to test_processing_common that could benefit other models.
I think that with @molbap's PR #31368 and the pending PRs #32544, #33668 and #32181 merged, all the image-text-to-text processors would be uniformized!

amyeroberts

Thanks for working on this processor!

Let's split up the changes in the processor tests from the changing of the processor.

HuggingFaceDocBuilderDev · 2024-10-01T21:58:23Z

Hey! 🤗 Thanks for your contribution to the transformers library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

Add the run-slow label to the PR
When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command [run-slow] followed by a comma separated list of all the models to be tested, i.e. [run_slow] model_to_test_1, model_to_test_2
- If the pull request affects a lot of models, put at most 10 models in the commit message
A transformers maintainer will then approve the workflow to start the tests

(For maintainers) The documentation for slow tests CI on PRs is here.

yonigozlan · 2024-10-01T22:32:00Z

Hey @ArthurZucker !
This should be ready for a final review.
A little overview of the changes:

The Idefics processor needed quite a lot of added logic to work with the new standardized processor signature, as it used to take only prompts inputs and did not separate images from text, but full BC should still be supported.
The Idefics2 processor uniformization is quite straightforward, and the processor is now very close to the Idefics3 processor.
The Idefics processor tests needed a bit of overriding, as the processor is very different from other VLM processors in Transformers; notably, it doesn't take do_rescale or scale_factor as args for its call function, which we use in the common tests.
The Idefics2 processor tests only needed the mock inputs overridden, as the text prompts need to include an image token for each corresponding input image, and the processor needs nested images when working with batched inputs.
Thank you!
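As an illustration of the Idefics2 mock-input requirement in the overview above, here is a small sketch of how such inputs could be built. It is not the test code from this PR: `build_mock_inputs` is a hypothetical name, the `IMAGE_TOKEN` value is an assumed placeholder, and images are represented by plain strings.

```python
IMAGE_TOKEN = "<image>"  # assumed placeholder string for this sketch


def build_mock_inputs(batch_size, images_per_prompt=1):
    """Build (text, images) where each prompt has one token per image.

    Hypothetical helper: `images` is nested, with one inner list per prompt.
    """
    text = []
    images = []
    for i in range(batch_size):
        tokens = IMAGE_TOKEN * images_per_prompt
        text.append(f"{tokens} sample prompt {i}")
        images.append([f"image_{i}_{j}" for j in range(images_per_prompt)])
    return text, images


text, images = build_mock_inputs(batch_size=2, images_per_prompt=2)
```

The invariant the sketch maintains is that `text[i]` contains exactly `len(images[i])` image tokens, so each prompt and its nested image list stay aligned in a batch.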

ArthurZucker

LGTM apart from the mentioned breaking change for Idefics.

ArthurZucker · 2024-10-03T14:26:19Z

src/transformers/models/idefics/processing_idefics.py

We have a breaking change here, no? prompts will be passed by previous workflows (e.g. prompts=xxx), and we don't check whether prompts is in kwargs.

I think this should be handled by https://github.com/huggingface/transformers/blob/7a63c6f7c11ecdb423bedb5558b2cbe32c43ed37/src/transformers/models/idefics/processing_idefics.py#L236
with prompts being replaced by text automatically

Ah right! I missed this indeed, good to go then!
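The mechanism referred to in this thread, where a legacy `prompts` keyword is transparently mapped to `text`, can be sketched with a small decorator. This is a hypothetical, self-contained illustration and not the library's implementation: the names `rename_kwarg` and `process` and the warning text are assumptions.

```python
import functools
import warnings


def rename_kwarg(old_name, new_name):
    """Return a decorator that forwards a deprecated keyword to its new name."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                if new_name in kwargs:
                    raise TypeError(
                        f"Pass either `{old_name}` or `{new_name}`, not both."
                    )
                warnings.warn(
                    f"`{old_name}` is deprecated, use `{new_name}` instead.",
                    FutureWarning,
                )
                # Move the value under the new keyword before calling func.
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rename_kwarg("prompts", "text")
def process(images=None, text=None):
    """Hypothetical processor call that only knows about `text`."""
    return {"images": images, "text": text}
```

Calling `process(prompts=["hello"])` then behaves like `process(text=["hello"])` and emits a `FutureWarning`, so callers of the old keyword keep working without an explicit check inside the function body.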


yonigozlan mentioned this pull request Aug 9, 2024

Uniform kwargs for processors #31911

Closed

40 tasks

yonigozlan marked this pull request as ready for review August 9, 2024 16:14

yonigozlan requested review from amyeroberts, molbap and zucchini-nlp August 9, 2024 16:14

yonigozlan mentioned this pull request Aug 10, 2024

Add Idefics 3! #32473

Merged

5 tasks

zucchini-nlp reviewed Aug 12, 2024

andimarafioti reviewed Aug 13, 2024

yonigozlan mentioned this pull request Aug 14, 2024

Standardize image-text-to-text-models outputs #32471

Closed

26 tasks

yonigozlan force-pushed the uniformize-processors-kwargs-idefics-idefics2 branch from 9799303 to 8b171a7 Compare August 14, 2024 14:13

yonigozlan force-pushed the uniformize-processors-kwargs-idefics-idefics2 branch from 747fbe1 to 6da6fa2 Compare September 24, 2024 01:16

amyeroberts reviewed Sep 25, 2024

yonigozlan force-pushed the uniformize-processors-kwargs-idefics-idefics2 branch from 6da6fa2 to 6a62786 Compare October 1, 2024 21:58

yonigozlan requested a review from ArthurZucker October 1, 2024 22:19

ArthurZucker reviewed Oct 3, 2024

yonigozlan added 9 commits October 3, 2024 14:58

Add uniformize idefics processor kwargs and tests (2aafd14)
Uniformize idefics2 processor kwargs (5776088)
add image_processor tests idefics (1a063b5)
add BC args order change idefics2 processor and update doc (5c6dbf2)
Add support for multiple images per prompt in image-text-to-text mode idefics (8f46b5c)
Fix processor input args in idefics tests (037550d)
improve test processing common, remove unnecessary tests, update process uniformization (49fe60a)
fix doctrings idefics (6e97b9a)
fix tests processors idefics/2 (0065697)

yonigozlan force-pushed the uniformize-processors-kwargs-idefics-idefics2 branch from 7a63c6f to 0065697 Compare October 3, 2024 14:59

ArthurZucker approved these changes Oct 3, 2024

yonigozlan merged commit 074aa3b into huggingface:main Oct 3, 2024
