🚨 Default to fast image processors for all models #41388
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
molbap
left a comment
Sounds good for v5! Let's see if we can even simplify further in this iteration
| if common_kwargs: | ||
| for kwarg in output_kwargs.values(): | ||
| kwarg.update(common_kwargs) | ||
|
|
I'm sure there's a good reason but I'm missing it, why is this moved up?
Yes, my bad: I switched to a new branch without checking out main first 🥴
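For readers following the thread, the snippet under discussion copies the shared kwargs into every per-modality kwargs dict. A minimal sketch of that behavior with plain dicts is below; the `merge_common_kwargs` helper name and the example keys are assumptions for illustration, not the actual processor code.

```python
def merge_common_kwargs(output_kwargs: dict, common_kwargs: dict) -> dict:
    """Copy the shared kwargs into every per-modality kwargs dict (in place)."""
    if common_kwargs:
        for kwarg in output_kwargs.values():
            kwarg.update(common_kwargs)
    return output_kwargs


# Hypothetical example values: one dict per modality, plus shared kwargs.
output_kwargs = {
    "text_kwargs": {"padding": True},
    "images_kwargs": {"do_resize": False},
}
merged = merge_common_kwargs(output_kwargs, {"return_tensors": "pt"})
print(merged["images_kwargs"])
```

Because the update happens in place, where this block runs relative to other kwarg handling matters, which is why moving it is worth a question in review.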
| def __init__(self, **kwargs): | ||
| super().__init__(**kwargs) | ||
| if not self.is_fast: | ||
| logger.warning_once( | ||
| f"Using a slow image processor (`{self.__class__.__name__}`). " | ||
| "As we are transitioning to fast (PyTorch-native) processors, consider using `AutoImageProcessor` or the model-specific fast image processor class " | ||
| "to instantiate a fast image processor." | ||
| ) |
SGTM!
Related: since we're touching on "loading old models from the hub with new utils", this ties in with the "from_pretrained conversion" @Cyrilvallez is working on. If we have modifications to apply to some old image processors, they should live in from_pretrained as well, so that the processor is "converted" in the same sense.
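To illustrate the warn-once behavior in the diff above, here is a small standalone sketch using only the standard library. The `warning_once` helper and the `SlowImageProcessorSketch` class are stand-ins written for this example (assumptions, not the library's classes); the idea is simply that caching on the message string keeps repeated instantiations from spamming the log.

```python
import functools
import logging

logger = logging.getLogger("image_processing_sketch")


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Emit a given warning message only the first time it is seen."""
    logger.warning(message)


class SlowImageProcessorSketch:
    # Hypothetical stand-in for a slow image processor class.
    is_fast = False

    def __init__(self) -> None:
        if not self.is_fast:
            warning_once(
                f"Using a slow image processor (`{self.__class__.__name__}`). "
                "Consider using `AutoImageProcessor` to instantiate a fast image processor."
            )


# Two instantiations, but the message is logged a single time.
SlowImageProcessorSketch()
SlowImageProcessorSketch()
```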
zucchini-nlp
left a comment
LGTM. Just wondering about some models where we had no Lanczos resampling. Do we get the closest resampling in those cases, and are the diffs small enough?
| class VideoLlavaProcessor(ProcessorMixin): | ||
| r""" | ||
| Constructs a VideoLlava processor which wraps a VideoLlava image processor and a Llava tokenizer into a single processor. | ||
| Constructs a VideoLlava processor which wraps a AutoImageProcessor and a Llava tokenizer into a single processor. |
nit: imo we don't need to change the name where it is not referenced. Instead, we only change the "[VideoLlavaImageProcessor]" one line below.
Yes, you're right, it's not very useful to have AutoImageProcessor in the docstring. I'll change these back. I'm also working on getting auto_docstring to work on processors, which should do all that automatically (check which subprocessors are in auto for this model) ;)
I'm also working on getting auto_docstring to work on processors which should do all that automatically
nice, very needed
Good point about the Lanczos resampling. I might add an exception for these models, as the diffs are not close enough imo.
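One way to frame the question: when a saved config asks for Lanczos and the tensor backend does not offer it, the processor either substitutes a neighboring mode or keeps the slow path. A rough sketch of that decision follows. The function name, the supported set, and the substitution table are all assumptions made for illustration, not the PR's code, and whether a substitute is numerically close enough remains model-specific.

```python
# Assumed set of interpolation modes available on the tensor backend.
SUPPORTED_ON_TENSORS = {"nearest", "bilinear", "bicubic"}
# Assumed "closest" substitutes for modes the tensor backend lacks.
CLOSEST_SUBSTITUTE = {"lanczos": "bicubic", "box": "bilinear", "hamming": "bilinear"}


def pick_resample(requested: str, allow_substitute: bool = True) -> tuple[str, bool]:
    """Return (mode, use_fast); use_fast=False means: keep the slow processor."""
    requested = requested.lower()
    if requested in SUPPORTED_ON_TENSORS:
        return requested, True
    if allow_substitute and requested in CLOSEST_SUBSTITUTE:
        return CLOSEST_SUBSTITUTE[requested], True
    return requested, False


print(pick_resample("LANCZOS"))
print(pick_resample("lanczos", allow_substitute=False))
```

In this framing, the exception mentioned above would correspond to `allow_substitute=False` for the affected models.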
…m/yonigozlan/transformers into remove-attributes-from-processors
I have pushed the updates. The following remaining failures need a fix (they are not about mismatched expected outputs). If you want to read the log, you can go here, select the ⚙️ icon, and click through.
This one, plus the following:
{
"clipseg": {
"single-gpu": [
{
"test": "tests/models/clipseg/test_modeling_clipseg.py::CLIPSegModelIntegrationTest::test_inference_image_segmentation",
"commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
"status": "git bisect found the bad commit.",
"pr_number": null,
"author": "ydshieh",
"merged_by": null,
"parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
}
]
},
"flava": {
"single-gpu": [
{
"test": "tests/models/flava/test_modeling_flava.py::FlavaForPreTrainingIntegrationTest::test_inference",
"commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
"status": "git bisect found the bad commit.",
"pr_number": null,
"author": "ydshieh",
"merged_by": null,
"parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
}
]
},
"gemma3": {
"single-gpu": [
{
"test": "tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch_crops",
"commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
"status": "git bisect found the bad commit.",
"pr_number": null,
"author": "ydshieh",
"merged_by": null,
"parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
},
{
"test": "tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_crops",
"commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
"status": "git bisect found the bad commit.",
"pr_number": null,
"author": "ydshieh",
"merged_by": null,
"parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
}
]
},
"yolos": {
"single-gpu": [
{
"test": "tests/models/yolos/test_modeling_yolos.py::YolosModelIntegrationTest::test_inference_object_detection_head",
"commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
"status": "git bisect found the bad commit.",
"pr_number": null,
"author": "ydshieh",
"merged_by": null,
"parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
}
]
}
}
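For anyone triaging a report like the one above, a short script can collapse it into one line per model. This is only a sketch: the `failing_tests` helper is hypothetical, and the embedded sample is an abbreviated copy of two entries from the report, keeping only the `test` and `commit` fields.

```python
import json

# Abbreviated sample taken from the report above (two models only).
report_text = """
{
  "clipseg": {"single-gpu": [
    {"test": "tests/models/clipseg/test_modeling_clipseg.py::CLIPSegModelIntegrationTest::test_inference_image_segmentation",
     "commit": "07a50c395552a28582c2746e06318e8f2e1bf059"}
  ]},
  "gemma3": {"single-gpu": [
    {"test": "tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch_crops",
     "commit": "07a50c395552a28582c2746e06318e8f2e1bf059"},
    {"test": "tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_crops",
     "commit": "07a50c395552a28582c2746e06318e8f2e1bf059"}
  ]}
}
"""


def failing_tests(report: dict) -> dict[str, list[str]]:
    """Map each model to the short names of its failing tests, across all runners."""
    summary = {}
    for model, runners in report.items():
        summary[model] = [
            entry["test"].split("::")[-1]
            for entries in runners.values()
            for entry in entries
        ]
    return summary


summary = failing_tests(json.loads(report_text))
for model, names in summary.items():
    print(f"{model}: {', '.join(names)}")
```

Each full `test` value is a pytest node ID, so it can be passed directly to `pytest` to rerun a single failure.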
…om/yonigozlan/transformers into default-fast-image-proc-all-models
…ace#42750) * stack lists of tensors in BatchFeature, improve error messages, add tests * remove unnecessary stack in fast image processors and video processors * make style * fix tests
Hey @ydshieh! All the remaining tests should be fixed. I tried to merge with main, but there seem to be a lot of broken tests due to tokenization issues on main (the gemma3 and janus integration tests are broken at least).
ArthurZucker
left a comment
Hey @yonigozlan, wdyt about first removing the slow/fast concept? Otherwise it looks good to me, but I think there were memory leak issues that we needed to fix before pushing this! Once that's done, happy to merge!
I think it will be easier to transition to a unified image processor backend (defaulting to torch/torchvision) if we first merge this PR.
ArthurZucker
left a comment
LGTM. My main comment is about functions like the one extracting the device: I think there should be a better way for us to track where the images are.
| self.assertEqual( | ||
| generated_text, | ||
| "The image depicts a man ironing clothes on the back of a yellow van in the middle of a busy city street. The man is wearing a yellow shirt with a yellow tie, and he is using an ironing board attached to the back of the van. The image is unusual in that it shows a man ironing clothes on the back of a van in the middle of a busy city street. The man is using an ironing board attached to the back of a van in the middle of a busy city street. The man is using an ironing board attached to the back of a van in the middle of a busy city street. The image is unusual in that it shows a man ironing clothes on the back of a van in the middle of a busy city street. The man is using an ironing board attached to the back of a van in the middle of a busy city street.", | ||
| "The image depicts a man ironing clothes on the back of a yellow van in the middle of a busy city street. The man is wearing a yellow shirt with a yellow tie, and he is holding an ironing board in one hand and a laundry basket in the other. The image is unusual in that it shows a man ironing clothes on the back of a van in the middle of a busy city street.", |
that's a lot of changes!
| if not self.is_fast: | ||
| logger.warning_once( | ||
| f"Using a slow image processor (`{self.__class__.__name__}`). " | ||
| "As we are transitioning to fast (PyTorch-native) processors, consider using `AutoImageProcessor` " |
| "As we are transitioning to fast (PyTorch-native) processors, consider using `AutoImageProcessor` " | |
| "As we are transitioning to PyTorch-native processors, consider using `AutoImageProcessor` " |
Let's rephrase this in anticipation of the non fast/slow paradigm.
| if not self.is_fast: | ||
| logger.warning_once( | ||
| f"Using a slow image processor (`{self.__class__.__name__}`). " | ||
| "As we are transitioning to fast (PyTorch-native) processors, consider using `AutoImageProcessor` " |
Not sure we need to warn, as it will be fast by default.
| yield i, item | ||
| def _get_device_from_images(images, is_nested: bool) -> "torch.device": |
If the processor is the one creating the torch tensor, then I would suppose that there is a way to store the device in the data structure that it creates, instead of having this function.
This is mainly to avoid having to pass a device argument around to all group_images_by_shape calls when it's easy to deduce it.
| raise ValueError("No images found in the batch.") | ||
| def get_device_from_images(images_list: list[list["torch.Tensor"]]) -> "torch.device": |
Not sure why we need this when extracting the device should be exactly the same for every single image processor, no?
Some pass nested images to group_images_by_shape, and some have structures with empty lists, so this is needed for edge cases.
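To make the edge cases concrete, here is a rough sketch of deducing the device from a flat or nested batch that may contain empty lists. It is not the PR's implementation: the `get_device_from_nested` name is made up for this example, and `FakeTensor` is a stand-in for `torch.Tensor` so the snippet runs without torch. Only the error message is taken from the diff above.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class FakeTensor:
    # Stand-in for torch.Tensor: only the `.device` attribute matters here.
    device: str


def get_device_from_nested(images: Sequence[Any]) -> Any:
    """Return the `.device` of the first tensor-like leaf in a flat or nested batch."""
    for item in images:
        if isinstance(item, (list, tuple)):
            try:
                return get_device_from_nested(item)
            except ValueError:
                continue  # this sub-list held no images, keep looking
        else:
            return item.device
    raise ValueError("No images found in the batch.")


# A nested batch whose first sample has no images.
print(get_device_from_nested([[], [FakeTensor("cuda:0")]]))
```

The alternative raised in review, storing the device on the structure the processor creates, would make this lookup unnecessary, at the cost of threading that structure through every call site.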
[For maintainers] Suggested jobs to run (before merge) run-slow: altclip, aria, auto, aya_vision, chinese_clip, clip, clipseg, convnext, convnextv2, cvt, efficientloftr, fuyu, idefics2, idefics3, janus, lightglue
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41388&sha=aca384
* remove attributes and add all missing sub processors to their auto classes
* remove all mentions of .attributes
* cleanup
* fix processor tests
* fix modular
* remove last attributes
* fixup
* fixes after merge
* fix wrong tokenizer in auto florence2
* fix missing audio_processor + nits
* Override __init__ in NewProcessor and change hf-internal-testing-repo (temporarily)
* fix auto tokenizer test
* add init to markup_lm
* update CustomProcessor in custom_processing
* remove print
* nit
* refactor processor tests first part
* refactor part 2
* fix test modeling owlv2
* fix test_processing_layoutxlm
* Fix owlv2, wav2vec2, markuplm, voxtral issues
* part3
* refactor all processor with mixin
* add support for loading and saving multiple tokenizer natively
* remove exclude_attributes from save_pretrained
* get processor from pretrained instead of components in tests
* skip tests in colqwen2, pixtral
* modifs after review
* fix style and copies
* Fix after review
* add test_processor_from_pretrained_vs_from_components, fix failing tests
* fix overflowing_tokens tests
* add config for layoutxlm
* fix ci
* use modular
* fic docstring
* Fix most tests
* Standardize mgp_str tests
* fix oneformer processing tests + fix copies
* fix after review
* fix missing fet_images in fast image processors
* fix 01 - to check
* fix 02 - to check
* fix 03 - to check
* fix 03 - to check
* fix 03 - to check
* fix 04 - to check
* fix 05 - to check
* fix 06 - sytle
* fix 07 - revert
* Fix some errors
* Improve BatchFeature: stack list and lists of torch tensors (huggingface#42750)
  * stack lists of tensors in BatchFeature, improve error messages, add tests
  * remove unnecessary stack in fast image processors and video processors
  * make style
  * fix tests
* fix remaining tests
* fix copies
* Fix Lfm2_vl im proc test
* nit after review

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
What does this PR do?
Following the trial with the Qwen_VL image processors, this PR extends the default-to-fast behavior to all models: a fast image processor is now loaded by default, even for checkpoints that were saved with a slow one.
Also made sure that all processors use AutoImageProcessor to instantiate their image_processor_class.
On that point, defining a default subclass in processors feels a bit redundant, as we basically already have that in the auto classes. It would be nice to get rid of this for v5, wdyt @molbap @zucchini-nlp @ArthurZucker?
I'll open a PR for that too.
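As a rough illustration of the intended behavior (not the actual `AutoImageProcessor` logic), the class selection can be thought of as follows. The `resolve_image_processor_class` helper and the `available` set are hypothetical; the sketch only assumes the existing naming convention where the fast variant carries a `Fast` suffix.

```python
from typing import Optional


def resolve_image_processor_class(
    saved_name: str, available: set[str], use_fast: Optional[bool] = None
) -> str:
    """Pick the class name to instantiate for a checkpoint saved with `saved_name`."""
    base = saved_name.removesuffix("Fast")
    fast = base + "Fast"
    if use_fast is None:
        use_fast = True  # new default: prefer the fast processor
    if use_fast and fast in available:
        return fast
    return base  # explicit opt-out, or no fast variant exists


# Hypothetical registry: FooImageProcessor has no fast variant.
available = {"CLIPImageProcessor", "CLIPImageProcessorFast", "FooImageProcessor"}
print(resolve_image_processor_class("CLIPImageProcessor", available))
print(resolve_image_processor_class("CLIPImageProcessor", available, use_fast=False))
print(resolve_image_processor_class("FooImageProcessor", available))
```

The key point is that a checkpoint saved with the slow class no longer pins the slow class at load time unless the caller asks for it.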