
🚨🚨 Refactor Image Processors to support different backends #43514

Merged
yonigozlan merged 59 commits into huggingface:main from yonigozlan:refactor-improc-backends
Mar 19, 2026
Conversation

@yonigozlan (Member) commented Jan 27, 2026

Image Processor Backend Refactor

Summary

Replaces the dual-file BaseImageProcessor (slow/PIL) + BaseImageProcessorFast (fast/torchvision) design with a unified backend architecture. The image_processing_utils_fast module is removed; all logic lives in image_processing_utils and image_processing_backends.


New Structure

Base classes: BaseImageProcessor in image_processing_utils defines the shared preprocessing pipeline (kwargs validation, input preparation, dispatching to backends). The built-in backend classes live in a separate file, image_processing_backends.py:

  • TorchvisionBackend: GPU-accelerated, batched operations on torch.Tensor, channels-first
  • PilBackend: Portable CPU-only, operations on np.ndarray, channels-first

Each backend implements process_image (convert raw input to backend format) and _preprocess (batch operations). Model-specific processors inherit from one of these backends.

File layout: per model, image_processing_<model>.py holds the torchvision backend (the default), and image_processing_pil_<model>.py holds the PIL backend when both exist. The no-suffix class is now the torchvision one (the opposite of the old *Fast convention).

Shared pipeline: Both backends use the same preprocess flow: validate kwargs → standardize (size, crop_size, pad_size, resample) → prepare inputs via process_image → run _preprocess. Torchvision batches by shape for efficiency; PIL processes images one by one.
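The dispatch described above can be sketched in a few lines of plain Python. MiniBackend and MiniPilBackend are illustrative stand-ins, not the actual transformers classes; the real pipeline does far more (SizeDict handling, tensor conversion, shape batching):

```python
class MiniBackend:
    """Stand-in for the shared preprocess pipeline: validate kwargs,
    prepare inputs via process_image, then dispatch to _preprocess."""

    _valid_kwargs = {"size", "resample"}

    def process_image(self, image):
        # Convert raw input to the backend-native format
        raise NotImplementedError

    def _preprocess(self, images, **kwargs):
        # Backend-specific batch operations
        raise NotImplementedError

    def preprocess(self, images, **kwargs):
        unknown = set(kwargs) - self._valid_kwargs
        if unknown:
            raise ValueError(f"Unknown kwargs: {sorted(unknown)}")
        prepared = [self.process_image(img) for img in images]
        return self._preprocess(prepared, **kwargs)


class MiniPilBackend(MiniBackend):
    def process_image(self, image):
        return list(image)  # stand-in for "convert to np.ndarray"

    def _preprocess(self, images, size=None, resample=None):
        # Stand-in "resize": truncate each image to `size` elements
        return [img[:size] if size else img for img in images]
```

The point of the split is that model processors only override `process_image` and `_preprocess`, while the validation/standardization scaffolding stays in the base class.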


Loading Paths & Fallback Logic

AutoImageProcessor.from_pretrained: Config resolution order: image processor config → nested processor config → model config. Class resolution uses image_processor_type or auto_map["AutoImageProcessor"], with fallback from legacy feature_extractor_type / AutoFeatureExtractor.

Backend resolution: New backend parameter replaces use_fast. Resolution order: (1) deprecated use_fast → converted to backend with warning; (2) explicit backend → used as-is; (3) default: "pil" for Lanczos models (Chameleon, Flava, Idefics3, SmolVLM); otherwise "torchvision" if available, else "pil".
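The resolution order reads naturally as a small function. This is a pure-Python sketch of the described logic; resolve_backend and LANCZOS_DEFAULT_PIL are illustrative names, not the actual transformers identifiers:

```python
import warnings

# Models whose reference implementation relies on Lanczos resampling,
# which torchvision does not provide, so they default to PIL.
LANCZOS_DEFAULT_PIL = {"chameleon", "flava", "idefics3", "smolvlm"}


def resolve_backend(model_type, backend=None, use_fast=None,
                    torchvision_available=True):
    # (1) deprecated use_fast -> converted to backend with a warning
    if use_fast is not None:
        warnings.warn("`use_fast` is deprecated; use `backend` instead.")
        return "torchvision" if use_fast else "pil"
    # (2) explicit backend -> used as-is
    if backend is not None:
        return backend
    # (3) default: "pil" for Lanczos models, else torchvision if available
    if model_type in LANCZOS_DEFAULT_PIL:
        return "pil"
    return "torchvision" if torchvision_available else "pil"
```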

Mapping format: IMAGE_PROCESSOR_MAPPING_NAMES entries are now {"torchvision": "ClassName", "pil": "ClassNamePil"} dicts instead of (slow, fast) tuples. Models may expose one or both backends.

Fallback when backend unavailable: _load_class_with_fallback tries the requested backend first, then other backends in the mapping. If torchvision is requested but unavailable, falls back to PIL with a warning.


Registering New / Custom Backends

AutoImageProcessor.register(): Registers image processor classes for a given config. The preferred API is image_processor_classes={"backend_name": ProcessorClass}. You can register one or more backends per model type.

Custom backends: The backend key space is open: any string (e.g. "torchvision", "pil", "mlx", "onnx") can be used. Each processor class must inherit from BaseImageProcessor and implement process_image and _preprocess. Users select a backend via AutoImageProcessor.from_pretrained(..., backend="custom"). The same fallback logic applies: if the requested backend is unavailable (e.g. missing deps), loading tries the other backends in the mapping.

Legacy params: slow_image_processor_class and fast_image_processor_class are deprecated; they are converted to image_processor_classes={"pil": ...} and image_processor_classes={"torchvision": ...} respectively.

Partial updates: When re-registering a config that already has backends, passing image_processor_classes merges into the existing mapping (e.g. adding a new backend without overwriting existing ones).


Backward Compatibility

  • use_fast=True/False: Deprecated warning; converted to backend="torchvision" / backend="pil".
  • image_processor_type: "FooImageProcessorFast" in config: Strips Fast suffix; resolves to base class and requested backend.
  • BaseImageProcessorFast class name: Resolves to TorchvisionBackend.
  • FooImageProcessorFast via import: _LazyModule / get_image_processor_class_from_name resolves to FooImageProcessor when Fast class no longer exists.
  • from transformers import FooImageProcessor when torchvision missing: _LazyModule.__getattr__ transparently falls back to FooImageProcessorPil and warns once (import_utils).
  • auto_map: [slow, fast] list: _resolve_auto_map_class_ref supports both list and new dict format.
  • slow_image_processor_class / fast_image_processor_class in register(): Converted to new image_processor_classes={} dict form.
  • is_fast property: Deprecated; use processor.backend == "torchvision".

Other Changes

  • resample: Single parameter name; Torchvision backend maps PIL resample to InterpolationMode internally.
  • SizeDict: Used consistently in _preprocess; dict literals remain for class attribute defaults.
  • _set_attributes: Centralized in BaseImageProcessor; backends call it in __init__ to resolve kwargs and class defaults.
  • import_utils.BASE_FILE_REQUIREMENTS: Still treats image_processing*_fast.py as torchvision-backed for lazy import structure; legacy _fast filenames may remain until models are fully migrated.
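One subtlety of the resample unification, which the follow-up commits referenced below had to patch, is that a caller passing `resample=` to a method whose parameter is still named `interpolation` gets no error: the kwarg is silently swallowed by **kwargs and the default wins. A minimal illustration (function names are hypothetical):

```python
def resize_buggy(image, size, interpolation="bilinear", **kwargs):
    # Caller passes resample=..., which lands in **kwargs unnoticed,
    # so the interpolation default is used instead.
    return (size, interpolation)


def resize_fixed(image, size, resample="bilinear", **kwargs):
    # Parameter renamed to match what the pipeline actually passes.
    return (size, resample)
```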

@ArthurZucker (Collaborator) left a comment

this is going in a great direction!
I think that having the image_processor_xxxx in general calling self.resize works well, as it would fetch the backend's method.

The thing I am not seeing right now is, for example, how someone would go about adding a new ImageProcessingLlavaNext but with, say, mlx processing.

They would have to create a class that inherits from their custom mixin, and then there needs to be a way to automatically ensure that MlxImageProcessingLlavaNext is the class that gets used when requesting the mlx backend.

If we are able to take that into account we should be fairly ready!
Otherwise very nice for now!

    `bool`: Whether or not this image processor is using the fast (TorchVision) backend.
    """
-   return False
+   return self.backend == "torchvision"
Collaborator

this attribute should be removed imo. Numpy can be faster in some cases and it does not represent anything anymore

Member Author

added a deprecation cycle as I think it's used by downstream libraries

# Backend availability checkers: maps backend names to functions that check availability
_backend_availability_checks = {
    "torchvision": is_torchvision_available,
    "python": lambda: True,  # Python backend is always available
Collaborator

Suggested change:
-    "python": lambda: True,  # Python backend is always available
+    "numpy": lambda: True,  # Python backend is always available

It relies on numpy no? (just saying the name should probably be different)

Member Author

Yes, it's a bit misleading, but the vision operations are handled by PIL (with numpy arrays as inputs/outputs), so maybe naming the backend "pil" is better? Plus it makes more explicit that PIL is a required dependency for this backend.

@yonigozlan (Member Author)

Thanks @ArthurZucker !
This should already be supported for transformers contributors, and I've added a register_backend() method to make this cleaner for users who don't want to modify the transformers codebase:

  • Contributing to transformers:
# In image_processing_utils.py - create the generic MLX backend if it doesn't already exist
class MlxBackend(ImageProcessingBackend):
    def resize(self, image, size, **kwargs):
        # generic MLX resize
        pass
    # ... other generic MLX methods

# In llava_next/image_processing_llava_next.py - inherit from it
class LlavaNextMlxBackend(MlxBackend):
    def preprocess(self, images, image_grid_pinpoints, **kwargs):
        # LlavaNext-specific patch processing with MLX
        pass

class LlavaNextImageProcessor(BaseImageProcessor):
    _backend_classes = {
        "torchvision": LlavaNextTorchVisionBackend,
        "python": LlavaNextPythonBackend,
        "mlx": LlavaNextMlxBackend,
    }
    _backend_availability_checks = {
        "torchvision": is_torchvision_available,
        "python": lambda: True,
        "mlx": is_mlx_available,
    }
  • Without changing transformers codebase:
from transformers import ImageProcessingBackend, LlavaNextImageProcessor

# No need for users to add both an MLX mixin and an inherited LlavaNextMlxBackend, just overwrite the necessary method directly in LlavaNextMlxBackend
class LlavaNextMlxBackend(ImageProcessingBackend):
    def resize(self, image, size, **kwargs):
        # your MLX implementation
        pass
    # ... implement other methods

LlavaNextImageProcessor.register_backend(
    name="mlx",
    backend_class=LlavaNextMlxBackend,
    availability_check=lambda: is_mlx_available()  # optional
)

processor = LlavaNextImageProcessor(backend="mlx")

Then instantiate like this:

processor = LlavaNextImageProcessor.from_pretrained("llava-hf/llama3-llava-next-8b-hf", backend="mlx")


@requires(backends=("vision",))
@lru_cache(maxsize=10)
def validate_fast_preprocess_arguments(
Collaborator

what is the fast sense here?

Member Author

None, needs to be renamed/modified 😁

Four comment threads on src/transformers/image_processing_utils.py (three marked outdated).
"pil": MyPilBackend,
}

To add a new backend, extend both `_backend_classes` and `_backend_availability_checks`:
Collaborator

let's rather push for register?

Comment on lines +909 to +928
resample = None
image_mean = None
image_std = None
size = None
default_to_square = True
crop_size = None
do_resize = None
do_center_crop = None
do_pad = None
pad_size = None
do_rescale = None
rescale_factor = 1 / 255
do_normalize = None
do_convert_rgb = None
return_tensors = None
data_format = ChannelDimension.FIRST
input_data_format = None
device = None
model_input_names = ["pixel_values"]
image_seq_length = None
Collaborator

i really don't understand why you have these when you also have ImageKwargs? does it not defeat the point?

Comment on lines +1175 to +1177
Update kwargs that need further processing before being validated.
Can be overridden by subclasses to customize the processing of kwargs.
"""
Collaborator

this function looks very weird.... but okay

Comment on lines +1258 to +1261
# Extract parameters that are only used for preparing the input images
do_convert_rgb = kwargs.pop("do_convert_rgb")
input_data_format = kwargs.pop("input_data_format")
device = kwargs.pop("device")
Collaborator

this is weird as well, I don't get why they can't fall through with the rest normally

"""
Preprocess an image or a batch of images.
"""
validate_kwargs(captured_kwargs=kwargs.keys(), valid_processor_keys=self._valid_kwargs_names)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have so many validation steps? validate kwargs, which are typed dicts, then validate the typed dict, then set defaults, then further processing, then validate the processed kwargs.
It "looks" mega bloated

@ArthurZucker (Collaborator) left a comment

much better / simpler imo!

return BatchFeature(data={"pixel_values": processed_images}, tensor_type=return_tensors)


class PilBackend(BaseImageProcessor):
Collaborator

not a super strong opinion but I would probably split into different files!

For processors that only need standard operations (resize, center crop, rescale, normalize), define class
attributes:

class MyImageProcessor(BaseImageProcessor):
Collaborator

Suggested change:
-class MyImageProcessor(BaseImageProcessor):
+class MyImageProcessor(PilBackend):

IDK I might be wrong!

Member Author

Yep sorry the docstrings were out of date!

Comment on lines +143 to +147
class MyImageProcessor(BaseImageProcessor):
    _backend_classes = {
        "torchvision": MyTorchVisionBackend,
        "pil": MyPilBackend,
    }
Collaborator

this is not valid anymore but you probably did not have the time to update it

Member Author

updated now ;)

Comment on lines +356 to +369
validate_typed_dict(self.valid_kwargs, kwargs)

# Set default kwargs from self
for kwarg_name in self._valid_kwargs_names:
    kwargs.setdefault(kwarg_name, getattr(self, kwarg_name, None))

# Update kwargs that need further processing before being validated
kwargs = self._standardize_kwargs(**kwargs)

# Validate kwargs
print("kwargs: ", kwargs)
self._validate_preprocess_kwargs(**kwargs)

return self._preprocess_image_like_inputs(images, *args, **kwargs)
Collaborator

still the same comment, but it's fine to address later / it looks a bit simpler!

Comment on lines +709 to +711
if isinstance(image_processor_mapping, (list, tuple)):
    pil_class, torchvision_class = image_processor_mapping
    image_processor_mapping = {"pil": pil_class, "torchvision": torchvision_class}
Collaborator

not 100% sure when would that happen?

Collaborator

maybe if we update register to support tuple (code that would already be there) then we won't need this?

Collaborator

here I don't get it, type(config) exists, why do we create image_processor_mapping? when it should already be correct?

do_reduce_labels: bool = False,
**kwargs,
) -> None:
resample = PILImageResampling.BICUBIC
Collaborator

I am seeing TorchVisionBackend but then PILImageResampling with PIL, weird but I guess it's just an enum

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, beit, bit, blip, bridgetower, chameleon, chinese_clip

@yonigozlan yonigozlan enabled auto-merge March 19, 2026 14:14
@yonigozlan yonigozlan added this pull request to the merge queue Mar 19, 2026
Merged via the queue into huggingface:main with commit 8843333 Mar 19, 2026
28 checks passed
@yonigozlan yonigozlan deleted the refactor-improc-backends branch March 19, 2026 14:47
he-yufeng added a commit to he-yufeng/transformers that referenced this pull request Mar 20, 2026
The elif branch for URL detection (is_remote_url + download_url) was
accidentally removed in huggingface#43514 during the image processor refactor.
This restores URL support with a local download_url helper using httpx,
since the old utils.hub.download_url was intentionally dropped in v5.

Fixes huggingface#44821
ydshieh added a commit that referenced this pull request Mar 23, 2026
* fix

* check

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
ydshieh added a commit that referenced this pull request Apr 6, 2026
…age processor backend refactor

The PR #43514 refactored _preprocess to pass resample=resample to resize,
but resize still accepted interpolation as its parameter. The resample kwarg
was silently swallowed by **kwargs, causing interpolation to default to BILINEAR
instead of the intended LANCZOS->BICUBIC path, producing ~0.36 difference in pixel_values.

Fix by renaming the parameter to resample and converting PIL resample integers to
torchvision InterpolationMode via pil_torch_interpolation_mapping, matching the
pattern used in TorchvisionBackend.resize.
ydshieh added a commit that referenced this pull request Apr 6, 2026
…r backend refactor (#45258)

* Fix SmolVLM video processor resize using wrong interpolation after image processor backend refactor

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Apr 7, 2026
…r backend refactor (huggingface#45258)

* Fix SmolVLM video processor resize using wrong interpolation after image processor backend refactor

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
…r backend refactor (huggingface#45258)

* Fix SmolVLM video processor resize using wrong interpolation after image processor backend refactor

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>