
🚨Default to fast image processors for all models #41388

Merged
ArthurZucker merged 81 commits into huggingface:main from yonigozlan:default-fast-image-proc-all-models
Jan 21, 2026

Conversation

@yonigozlan

Contributor

What does this PR do?

Following the trial with the Qwen_VL image processors, this PR extends the default to fast image processors to all models, even for checkpoints saved with a slow image processor.

Also made sure that all processors use AutoImageProcessor to instantiate their image_processor_class.
On that point, defining a default subclass in processors feels a bit redundant, as we basically already have that in the auto classes. It would be nice to get rid of this for v5, wdyt @molbap @zucchini-nlp @ArthurZucker ?
I'll open a PR for that too.
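To make the defaulting rule concrete, here is a minimal sketch of how an auto-style registry can prefer the fast class regardless of what the checkpoint was saved with. The registry contents, class names, and `resolve_image_processor_class` are hypothetical illustrations, not the transformers implementation; in the library itself the user-facing entry point is `AutoImageProcessor.from_pretrained`, which accepts a `use_fast` argument.

```python
from typing import Optional

# Hypothetical registry: model_type -> (slow class name, fast class name or None).
# The names are illustrative only.
IMAGE_PROCESSOR_MAPPING: dict[str, tuple[str, Optional[str]]] = {
    "clip": ("CLIPImageProcessor", "CLIPImageProcessorFast"),
    "vit": ("ViTImageProcessor", "ViTImageProcessorFast"),
    "legacy_model": ("LegacyImageProcessor", None),  # no fast variant registered
}


def resolve_image_processor_class(model_type: str, use_fast: Optional[bool] = None) -> str:
    """Return the name of the image processor class to instantiate.

    The choice depends only on the model type and on `use_fast`, never on the
    class name stored in the checkpoint, so a checkpoint saved with a slow
    processor still resolves to the fast one when it exists.
    """
    slow, fast = IMAGE_PROCESSOR_MAPPING[model_type]
    if use_fast is False or fast is None:
        # Explicit opt-out, or no fast implementation available.
        return slow
    return fast
```

Keying the lookup on the model type is also why a per-processor default subclass is redundant: the auto mapping already carries that information.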

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@molbap molbap left a comment

Collaborator

Sounds good for v5! Let's see if we can simplify even further in this iteration.

if common_kwargs:
    for kwarg in output_kwargs.values():
        kwarg.update(common_kwargs)
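The snippet above broadcasts `common_kwargs` (for example a shared `return_tensors`) into every modality-specific kwargs dict. A self-contained sketch of that behavior, assuming `output_kwargs` maps names such as `"images_kwargs"` and `"text_kwargs"` to plain dicts; the wrapper function `merge_common_kwargs` is hypothetical:

```python
def merge_common_kwargs(output_kwargs: dict[str, dict], common_kwargs: dict) -> dict[str, dict]:
    """Copy shared kwargs into every per-modality kwargs dict (in place)."""
    if common_kwargs:
        for kwarg in output_kwargs.values():
            # dict.update overwrites any key already present in the per-modality dict
            kwarg.update(common_kwargs)
    return output_kwargs


merged = merge_common_kwargs(
    {"images_kwargs": {"do_resize": True}, "text_kwargs": {"padding": True}},
    {"return_tensors": "pt"},
)
```

Because `update` mutates the inner dicts, where this block sits relative to the code that reads `output_kwargs` determines whether the common values are visible there.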

Collaborator

I'm sure there's a good reason, but I'm missing it: why was this moved up?

Member

yep, it is a fix from #41381 :)

Collaborator

Ah, which fixes #40931. Got it.

Contributor Author

Yes, my bad: I switched to a new branch without checking out main first 🥴

Comment on lines +42 to +49
def __init__(self, **kwargs):
    super().__init__(**kwargs)
    if not self.is_fast:
        logger.warning_once(
            f"Using a slow image processor (`{self.__class__.__name__}`). "
            "As we are transitioning to fast (PyTorch-native) processors, consider using `AutoImageProcessor` or the model-specific fast image processor class "
            "to instantiate a fast image processor."
        )
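The `warning_once` call above ensures the slow-processor notice is logged only the first time for a given message. A standalone sketch of that pattern using the standard library, assuming a cache on the message string is an acceptable stand-in; `BaseImageProcessorSketch`, `SlowProcessor`, and `FastProcessor` are hypothetical classes, not transformers ones:

```python
import functools
import logging

logger = logging.getLogger("image_processing_sketch")


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    # lru_cache memoizes on the message, so each distinct message is logged once.
    logger.warning(message)


class BaseImageProcessorSketch:
    """Hypothetical base class mirroring the `is_fast` check in the diff."""

    is_fast = False

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        if not self.is_fast:
            warning_once(
                f"Using a slow image processor (`{self.__class__.__name__}`). "
                "Consider using `AutoImageProcessor` or the model-specific fast image processor class."
            )


class SlowProcessor(BaseImageProcessorSketch):
    is_fast = False


class FastProcessor(BaseImageProcessorSketch):
    is_fast = True
```

Since the message embeds the class name, each slow class warns once, and fast classes never do.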

Collaborator

SGTM!

Related: since we're touching on "loading old models from the Hub with new utils", this ties into the "from_pretrained conversion" @Cyrilvallez is working on. If we need to apply modifications to some old image processors, they should also live in from_pretrained, so that the processor is "converted" in the same way.

@zucchini-nlp zucchini-nlp left a comment

Member

LGTM. Just wondering about models where we had no Lanczos resampling: do we fall back to the closest resampling method in those cases, and are the diffs small enough?

class VideoLlavaProcessor(ProcessorMixin):
r"""
Constructs a VideoLlava processor which wraps a VideoLlava image processor and a Llava tokenizer into a single processor.
Constructs a VideoLlava processor which wraps a AutoImageProcessor and a Llava tokenizer into a single processor.

Member

nit: IMO we don't need to change the name when it isn't referenced. Instead, we should only change the "[VideoLlavaImageProcessor]" reference one line below.

Contributor Author

Yes, you're right, it's not very useful to have AutoImageProcessor in the docstring. I'll change these back. I'm also working on getting auto_docstring to work on processors, which should do all of that automatically (check which subprocessors are in auto for this model) ;)

Member

I'm also working on getting auto_docstring to work on processors which should do all that automatically

nice, very needed

@yonigozlan

Contributor Author

LGTM. Just wondering about some models where we had no Lanczos resampling. Do we get the closest resampling in those cases and are the diffs small enough?

Good point about Lanczos resampling. I might add an exception for these, as the diffs aren't close enough IMO.
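One way to express such an exception is a load-time check that keeps a checkpoint on its slow processor when its resampling filter has no close equivalent in the fast backend. A minimal sketch: the `Resample` enum, the `FAST_SUPPORTED` set, and `should_default_to_fast` are hypothetical, and which modes the tensor backend supports is an assumption for illustration.

```python
from enum import Enum


class Resample(Enum):
    NEAREST = "nearest"
    BILINEAR = "bilinear"
    BICUBIC = "bicubic"
    LANCZOS = "lanczos"


# Assumption: the tensor-based (fast) backend supports only these modes.
FAST_SUPPORTED = {Resample.NEAREST, Resample.BILINEAR, Resample.BICUBIC}


def should_default_to_fast(resample: Resample, has_fast_class: bool) -> bool:
    """Return True if the checkpoint can be loaded with the fast processor.

    Checkpoints whose filter (here, Lanczos) has no tensor-backend equivalent
    stay on the slow processor instead of silently switching filters.
    """
    return has_fast_class and resample in FAST_SUPPORTED
```

An explicit exception like this trades some consistency for numerical fidelity: the affected models keep matching their original preprocessing exactly.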

@ydshieh

ydshieh commented Dec 7, 2025

Collaborator

I have pushed the updates. The following remaining failures need some fixes (they are not about mismatched expected outputs).

(To read the logs, go here, select the ⚙️ icon, and click View raw logs at the top left.)

This one

RUN_SLOW=1 python3 -m pytest -v tests/models/janus/test_modeling_janus.py::JanusIntegrationTest::test_model_generate_images - ValueError: Only returning PyTorch tensors is currently supported.

plus the following

{
    "clipseg": {
        "single-gpu": [
            {
                "test": "tests/models/clipseg/test_modeling_clipseg.py::CLIPSegModelIntegrationTest::test_inference_image_segmentation",
                "commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
                "status": "git bisect found the bad commit.",
                "pr_number": null,
                "author": "ydshieh",
                "merged_by": null,
                "parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
            }
        ]
    },
    "flava": {
        "single-gpu": [
            {
                "test": "tests/models/flava/test_modeling_flava.py::FlavaForPreTrainingIntegrationTest::test_inference",
                "commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
                "status": "git bisect found the bad commit.",
                "pr_number": null,
                "author": "ydshieh",
                "merged_by": null,
                "parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
            }
        ]
    },
    "gemma3": {
        "single-gpu": [
            {
                "test": "tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch_crops",
                "commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
                "status": "git bisect found the bad commit.",
                "pr_number": null,
                "author": "ydshieh",
                "merged_by": null,
                "parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
            },
            {
                "test": "tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_crops",
                "commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
                "status": "git bisect found the bad commit.",
                "pr_number": null,
                "author": "ydshieh",
                "merged_by": null,
                "parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
            }
        ]
    },
    "yolos": {
        "single-gpu": [
            {
                "test": "tests/models/yolos/test_modeling_yolos.py::YolosModelIntegrationTest::test_inference_object_detection_head",
                "commit": "07a50c395552a28582c2746e06318e8f2e1bf059",
                "status": "git bisect found the bad commit.",
                "pr_number": null,
                "author": "ydshieh",
                "merged_by": null,
                "parent": "377a8ee73f210476c4efb15170d0c32ad3b2c653"
            }
        ]
    }
}

@yonigozlan yonigozlan changed the title Default to fast image processors for all models 🚨Default to fast image processors for all models Dec 8, 2025
yonigozlan and others added 2 commits December 12, 2025 17:12
…ace#42750)

* stack lists of tensors in BatchFeature, improve error messages, add tests

* remove unnecessary stack in fast image processors and video processors

* make style

* fix tests
@yonigozlan

Contributor Author

Hey @ydshieh! All the remaining tests should be fixed. I tried merging main, but there seem to be a lot of broken tests due to tokenization issues on main (at least the gemma3 and janus integration tests are broken).

@ArthurZucker ArthurZucker left a comment

Collaborator

Hey @yonigozlan, wdyt about first removing the slow/fast concept? Otherwise it looks good to me, but I think there were memory leak issues we needed to fix before pushing this! Once that's done, happy to merge!

@yonigozlan

Contributor Author

I think it will be easier to transition to a unified image processor backend (defaulting to torch/torchvision) if we merge this PR first.
Also, do you have more info on the memory leaks? Not sure what this refers to.

@ArthurZucker ArthurZucker left a comment

Collaborator

LGTM. My main comment is about functions like the device-extraction helpers: I think there should be a better way for us to track where the images are.

self.assertEqual(
generated_text,
"The image depicts a man ironing clothes on the back of a yellow van in the middle of a busy city street. The man is wearing a yellow shirt with a yellow tie, and he is using an ironing board attached to the back of the van. The image is unusual in that it shows a man ironing clothes on the back of a van in the middle of a busy city street. The man is using an ironing board attached to the back of a van in the middle of a busy city street. The man is using an ironing board attached to the back of a van in the middle of a busy city street. The image is unusual in that it shows a man ironing clothes on the back of a van in the middle of a busy city street. The man is using an ironing board attached to the back of a van in the middle of a busy city street.",
"The image depicts a man ironing clothes on the back of a yellow van in the middle of a busy city street. The man is wearing a yellow shirt with a yellow tie, and he is holding an ironing board in one hand and a laundry basket in the other. The image is unusual in that it shows a man ironing clothes on the back of a van in the middle of a busy city street.",

Collaborator

that's a lot of changes!

if not self.is_fast:
    logger.warning_once(
        f"Using a slow image processor (`{self.__class__.__name__}`). "
        "As we are transitioning to fast (PyTorch-native) processors, consider using `AutoImageProcessor` "

Collaborator

Suggested change
"As we are transitioning to fast (PyTorch-native) processors, consider using `AutoImageProcessor` "
"As we are transitioning to PyTorch-native processors, consider using `AutoImageProcessor` "

Collaborator

let's rephrase this in anticipation of the non-fast/slow paradigm

Comment on lines +44 to +47
if not self.is_fast:
    logger.warning_once(
        f"Using a slow image processor (`{self.__class__.__name__}`). "
        "As we are transitioning to fast (PyTorch-native) processors, consider using `AutoImageProcessor` "

Collaborator

not sure we need to warn, as it will be fast by default

yield i, item


def _get_device_from_images(images, is_nested: bool) -> "torch.device":

Collaborator

If the processor is the one creating the torch tensors, then I'd suppose there is a way to store the device in the data structure it creates, instead of having this function.

Contributor Author

This is mainly to avoid passing a device argument to every group_image_by_shape call, when the device is easy to deduce.
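A simplified sketch of the idea: group images by shape so each group can be processed as one batch, and infer the device from the images themselves instead of threading a `device` argument through every call. `FakeImage`, `group_by_shape`, and `reorder` are hypothetical stand-ins (so the sketch runs without PyTorch), not the transformers helpers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeImage:
    """Stand-in for a tensor: only the attributes this sketch needs."""

    shape: tuple
    device: str = "cpu"


def group_by_shape(images: list) -> tuple[dict, dict, str]:
    """Group images by shape for batched processing.

    Returns the groups, an index map for restoring the original order, and
    the device inferred from the first image.
    """
    if not images:
        raise ValueError("No images found in the batch.")
    device = images[0].device  # deduced here rather than passed by the caller
    groups: dict = defaultdict(list)
    index_map: dict = {}
    for i, image in enumerate(images):
        # Record (group key, position inside the group) before appending.
        index_map[i] = (image.shape, len(groups[image.shape]))
        groups[image.shape].append(image)
    return dict(groups), index_map, device


def reorder(groups: dict, index_map: dict) -> list:
    """Undo the grouping and restore the original image order."""
    return [groups[shape][pos] for _, (shape, pos) in sorted(index_map.items())]
```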

raise ValueError("No images found in the batch.")


def get_device_from_images(images_list: list[list["torch.Tensor"]]) -> "torch.device":

Collaborator

Not sure why we need this, when extracting the device should be exactly the same for every image processor, no?

Contributor Author

Some processors pass nested images to group_image_by_shape, and some have structures containing empty lists, so this is needed for those edge cases.
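The edge cases mentioned here (nested inputs and empty inner lists) can be sketched as follows, assuming images arrive either as a flat list of tensors or as a list of per-sample lists. `FakeTensor` and `find_images_device` are hypothetical names for illustration; this is not the PR's actual helper body.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a tensor; only `.device` is used."""

    device: str


def find_images_device(images: list, is_nested: bool) -> str:
    """Return the device of the first image found.

    With is_nested=True, `images` is a list of per-sample lists, some of
    which may be empty (e.g. samples that contain no images).
    """
    if not is_nested:
        if images:
            return images[0].device
    else:
        for sample in images:
            if sample:  # skip empty per-sample lists
                return sample[0].device
    raise ValueError("No images found in the batch.")
```

The alternative raised in review, recording the device on the data structure the processor builds, would make this search unnecessary, at the cost of carrying that structure through every call.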

@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: altclip, aria, auto, aya_vision, chinese_clip, clip, clipseg, convnext, convnextv2, cvt, efficientloftr, fuyu, idefics2, idefics3, janus, lightglue

@github-actions

Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41388&sha=aca384

@ArthurZucker ArthurZucker merged commit 3fec8c2 into huggingface:main Jan 21, 2026
23 of 25 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* remove attributes and add all missing sub processors to their auto classes

* remove all mentions of .attributes

* cleanup

* fix processor tests

* fix modular

* remove last attributes

* fixup

* fixes after merge

* fix wrong tokenizer in auto florence2

* fix missing audio_processor + nits

* Override __init__ in NewProcessor and change hf-internal-testing-repo (temporarily)

* fix auto tokenizer test

* add init to markup_lm

* update CustomProcessor in custom_processing

* remove print

* nit

* refactor processor tests first part

* refactor part 2

* fix test modeling owlv2

* fix test_processing_layoutxlm

* Fix owlv2, wav2vec2, markuplm, voxtral issues

* part3

* refactor all processor with mixin

* add support for loading and saving multiple tokenizer natively

* remove exclude_attributes from save_pretrained

* get processor from pretrained instead of components in tests

* skip tests in colqwen2, pixtral

* modifs after review

* fix style and copies

* Fix after review

* add test_processor_from_pretrained_vs_from_components, fix failing tests

* fix overflowing_tokens tests

* add config for layoutxlm

* fix ci

* use modular

* fic docstring

* Fix most tests

* Standardize mgp_str tests

* fix oneformer processing tests + fix copies

* fix after review

* fix missing fet_images in fast image processors

* fix 01 - to check

* fix 02 - to check

* fix 03 - to check

* fix 03 - to check

* fix 03 - to check

* fix 04 - to check

* fix 05 - to check

* fix 06 - sytle

* fix 07 - revert

* Fix some errors

* Improve BatchFeature: stack list and lists of torch tensors (huggingface#42750)

* stack lists of tensors in BatchFeature, improve error messages, add tests

* remove unnecessary stack in fast image processors and video processors

* make style

* fix tests

* fix remaining tests

* fix copies

* Fix Lfm2_vl im proc test

* nit after review

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

6 participants