Merged

59 commits
b64d760
init refactor
yonigozlan Jan 27, 2026
43ce3c9
Fix llava
yonigozlan Jan 27, 2026
75a5f8b
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Jan 27, 2026
94e2f2c
changes after review
yonigozlan Feb 2, 2026
34aa802
update first batch of image processors
yonigozlan Feb 5, 2026
21ff306
refactor part 2
yonigozlan Feb 11, 2026
de23b46
improve base image processor class, move backends to separate file
yonigozlan Feb 11, 2026
1d64b21
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Feb 11, 2026
56c49c8
refactor to have backends in separate files, with backends now inheri…
yonigozlan Feb 15, 2026
ad91d8e
fix docstrings
yonigozlan Feb 17, 2026
87dc38b
update some image processors to new refactored standards
yonigozlan Feb 17, 2026
780c0cb
refactor more image processors
yonigozlan Feb 18, 2026
526a668
refactor more image processors
yonigozlan Feb 18, 2026
85207f2
refactor more fast image processors
yonigozlan Feb 20, 2026
e1c56dc
refactor more image processors
yonigozlan Feb 20, 2026
be2d888
refactor more image processor
yonigozlan Feb 24, 2026
ed57337
improve compatibility with video processors
yonigozlan Feb 24, 2026
fbf6be5
refactor more image processors
yonigozlan Feb 25, 2026
6efda3a
add more image processors, improve compatibility with video processors
yonigozlan Feb 25, 2026
97f0dd9
support for modular
yonigozlan Feb 25, 2026
1da0c11
refactor modular ima proc
yonigozlan Feb 26, 2026
f27bce9
refactor more modular image processors
yonigozlan Feb 26, 2026
5135b2b
adjustments before merge
yonigozlan Feb 27, 2026
5641c8e
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Feb 27, 2026
71dd63f
fimish image processors refactor
yonigozlan Mar 3, 2026
6a738de
update docs
yonigozlan Mar 3, 2026
f652f5a
add fallback to Pil backend for backward compat
yonigozlan Mar 3, 2026
80bcfe0
fix repo
yonigozlan Mar 3, 2026
d13f843
Fix all processors and image processors tests
yonigozlan Mar 3, 2026
7d52916
fix modular and style
yonigozlan Mar 3, 2026
6d46b06
fix docs
yonigozlan Mar 3, 2026
0f918df
fix remote code backward compatibility + super in lists
yonigozlan Mar 4, 2026
5b8155d
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Mar 4, 2026
7502afb
Update docs and add new model like cli
yonigozlan Mar 4, 2026
bd4ccc6
fix processor tests
yonigozlan Mar 4, 2026
db34bb3
relax test tvp (used to be skipped)
yonigozlan Mar 4, 2026
3e0a72b
fix 4 channels oneformer
yonigozlan Mar 4, 2026
f5bbf0e
Changes after review
yonigozlan Mar 16, 2026
4a33f7f
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Mar 16, 2026
ae06795
Fixes after review
yonigozlan Mar 16, 2026
a6d7002
Fix tests
yonigozlan Mar 16, 2026
04ef71f
Change imports in modeling tests to minimize integration tests changes
yonigozlan Mar 16, 2026
9b1d3a8
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Mar 16, 2026
ed5a4df
fix wrong import
yonigozlan Mar 16, 2026
304e4b0
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Mar 17, 2026
f42220e
fix import and missing doc
yonigozlan Mar 17, 2026
22037d2
fix typo PI0
yonigozlan Mar 17, 2026
272fdca
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Mar 17, 2026
777551d
Fix all integration tests
yonigozlan Mar 17, 2026
3b9b589
Fix after review, enforce protected torch/torchvision imports in pil …
yonigozlan Mar 18, 2026
b12a20d
Fix style
yonigozlan Mar 18, 2026
9fc7900
Fix test modeling depth pro
yonigozlan Mar 18, 2026
01fb2f1
Fix processing_idefics
yonigozlan Mar 18, 2026
fcd2335
Merge branch 'main' into refactor-improc-backends
yonigozlan Mar 18, 2026
fd54fd7
Merge remote-tracking branch 'upstream/main' into refactor-improc-bac…
yonigozlan Mar 19, 2026
a094cef
Fixes after merge
yonigozlan Mar 19, 2026
33dfa8c
_rescale_and_normalize -> rescale_and_normalize
yonigozlan Mar 19, 2026
8c13796
fix-repo
yonigozlan Mar 19, 2026
7b8d547
Merge branch 'main' into refactor-improc-backends
yonigozlan Mar 19, 2026
14 changes: 8 additions & 6 deletions CONTRIBUTING.md
@@ -137,13 +137,15 @@ python utils/modular_model_converter.py <model_name>

This will generate the separate files (`modeling_*.py`, `configuration_*.py`, etc.) from your modular file. The CI will enforce that these generated files match your modular file.

☐ **2. Add a fast image processor (for image models)**
☐ **2. Add image processors (for image models)**

If your model processes images, implement a fast image processor that uses `torch` and `torchvision` instead of PIL/numpy for better inference performance:
If your model processes images, implement both a torchvision-backed processor (the default, GPU-accelerated) and a PIL-backed processor (the alternative):

- See the detailed guide in [#36978](https://github.com/huggingface/transformers/issues/36978)
- Fast processors inherit from `BaseImageProcessorFast`
- Examples: `LlavaOnevisionImageProcessorFast`, `Idefics2ImageProcessorFast`
- The torchvision backend processor (`<Model>ImageProcessor`) inherits from `TorchvisionBackend` and lives in `image_processing_<model>.py`
- The PIL backend processor (`<Model>ImageProcessorPil`) inherits from `PilBackend` and lives in `image_processing_pil_<model>.py`
- Both are imported from `image_processing_backends`; the PIL kwargs class is defined in the torchvision file and imported by the PIL file
- See the detailed guide in [IMAGE_PROCESSOR_REFACTORING_GUIDE.md](https://github.com/huggingface/transformers/blob/main/IMAGE_PROCESSOR_REFACTORING_GUIDE.md)
- Examples: `CLIPImageProcessor` / `CLIPImageProcessorPil`, `DonutImageProcessor` / `DonutImageProcessorPil`
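The two-file split described above can be sketched as follows. This is a hedged, self-contained sketch: the backend base classes are stubbed locally so the snippet runs standalone (in the library they would be imported from `image_processing_backends`), and the `MyModel*` names are hypothetical placeholders, not real transformers classes.

```python
# Stubs standing in for the real backend base classes from
# image_processing_backends -- assumption, not the actual library code.
class TorchvisionBackend:
    backend = "torchvision"

class PilBackend:
    backend = "pil"

# --- image_processing_mymodel.py: the default, torchvision-backed processor ---
class MyModelImageProcessorKwargs(dict):
    """Kwargs class, defined once in the torchvision file."""

class MyModelImageProcessor(TorchvisionBackend):
    valid_kwargs = MyModelImageProcessorKwargs

# --- image_processing_pil_mymodel.py: the PIL/NumPy alternative ---
# (imports MyModelImageProcessorKwargs from the torchvision file)
class MyModelImageProcessorPil(PilBackend):
    valid_kwargs = MyModelImageProcessorKwargs
```

Keeping the kwargs class in the torchvision file and importing it from the PIL file ensures both backends accept exactly the same configuration.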

☐ **3. Create a weight conversion script**

@@ -225,7 +227,7 @@ Here's a condensed version maintainers can copy into PRs:
Please ensure your PR completes all following items. See the [full checklist](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#vision-language-model-contribution-checklist) for details.

- [ ] **Modular file**: `modular_<model_name>.py` implemented and verified with `python utils/modular_model_converter.py <model_name>`
- [ ] **Fast image processor**: Implemented using `BaseImageProcessorFast` (see [#36978](https://github.com/huggingface/transformers/issues/36978))
- [ ] **Image processors**: Torchvision backend (`<Model>ImageProcessor` from `TorchvisionBackend`) and PIL backend (`<Model>ImageProcessorPil` from `PilBackend`) both implemented (see [IMAGE_PROCESSOR_REFACTORING_GUIDE.md](https://github.com/huggingface/transformers/blob/main/IMAGE_PROCESSOR_REFACTORING_GUIDE.md))
- [ ] **Conversion script**: `convert_<model_name>_to_hf.py` added with usage examples
- [ ] **Integration tests**: End-to-end tests with exact output matching (text or logits)
- [ ] **Documentation**: Model docs added/updated in `docs/source/en/model_doc/`
14 changes: 5 additions & 9 deletions docs/source/en/add_new_model.md
@@ -544,20 +544,16 @@ When both implementations have the same `input_ids`, add a tokenizer test file.
## Implement image processor

> [!TIP]
> Fast image processors use the [torchvision](https://pytorch.org/vision/stable/index.html) library and can perform image processing on the GPU, significantly improving processing speed.
> We recommend adding a fast image processor ([`BaseImageProcessorFast`]) in addition to the "slow" image processor ([`BaseImageProcessor`]) to provide users with the best performance. Feel free to tag [@yonigozlan](https://github.com/yonigozlan) for help adding a [`BaseImageProcessorFast`].
> Image processors now use a backend-based architecture. The default backend is [`TorchvisionBackend`], which uses the [torchvision](https://pytorch.org/vision/stable/index.html) library and can perform image processing on the GPU. A PIL/NumPy alternative backend ([`PilBackend`]) is also provided. Both backends are imported from `image_processing_backends`. Feel free to tag [@yonigozlan](https://github.com/yonigozlan) for help.

While this example doesn't include an image processor, you may need to implement one if your model requires image inputs. The image processor is responsible for converting images into a format suitable for your model. Before implementing a new one, check whether an existing image processor in the Transformers library can be reused, as many models share similar image processing techniques. Note that you can also use [modular](./modular_transformers) for image processors to reuse existing components.

If you do need to implement a new image processor, refer to an existing image processor to understand the expected structure. Slow image processors ([`BaseImageProcessor`]) and fast image processors ([`BaseImageProcessorFast`]) are designed differently, so make sure you follow the correct structure based on the processor type you're implementing.
If you do need to implement a new image processor, each model has two processor files:

Run the following command (only if you haven't already created the fast image processor with the `transformers add-new-model-like` command) to generate the necessary imports and to create a prefilled template for the fast image processor. Modify the template to fit your model.
- `image_processing_<model>.py`: the **default** torchvision-backed processor (`<Model>ImageProcessor`), inheriting from [`TorchvisionBackend`]. This replaces the old "fast" processor.
- `image_processing_pil_<model>.py`: the PIL/NumPy alternative processor (`<Model>ImageProcessorPil`), inheriting from [`PilBackend`]. This replaces the old "slow" processor.

```bash
transformers add-fast-image-processor --model-name your_model_name
```

This command will generate the necessary imports and provide a pre-filled template for the fast image processor. You can then modify it to fit your model's needs.
The torchvision backend file also defines any custom kwargs class that the PIL file imports. Both files use the `@auto_docstring` decorator β€” do not add manual class docstrings. Refer to the [IMAGE_PROCESSOR_REFACTORING_GUIDE.md](https://github.com/huggingface/transformers/blob/main/IMAGE_PROCESSOR_REFACTORING_GUIDE.md) for a step-by-step walkthrough and complete examples.

Add tests for the image processor in `tests/models/your_model_name/test_image_processing_your_model_name.py`. These tests should be similar to those for other image processors and should verify that the image processor correctly handles image inputs. If your image processor includes unique features or processing methods, ensure you add specific tests for those as well.
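A minimal sketch of the shape such a test might take. The processor here is a local stub so the example runs standalone; it is not a real transformers class, and real tests in `tests/models/` follow the shared `ImageProcessingTestMixin` pattern instead.

```python
import numpy as np

class StubImageProcessor:
    # Illustrative stand-in for a model's image processor.
    size = {"height": 4, "width": 4}

    def __call__(self, images):
        # Pretend-resize: crop each image to the configured size,
        # then stack into a batch.
        out = [img[: self.size["height"], : self.size["width"]] for img in images]
        return {"pixel_values": np.stack(out)}

def test_output_shape():
    # The key property most image-processor tests verify: the output
    # batch has the expected (batch, height, width, channels) shape.
    proc = StubImageProcessor()
    batch = proc([np.zeros((8, 8, 3), dtype=np.float32)])["pixel_values"]
    assert batch.shape == (1, 4, 4, 3)
```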

51 changes: 30 additions & 21 deletions docs/source/en/image_processors.md
@@ -16,10 +16,10 @@ rendered properly in your Markdown viewer.

# Image processors

Image processors converts images into pixel values, tensors that represent image colors and size. The pixel values are inputs to a vision model. To ensure a pretrained model receives the correct input, an image processor can perform the following operations to make sure an image is exactly like the images a model was pretrained on.
Image processors convert images into pixel values, tensors that represent image colors and size. The pixel values are inputs to a vision model. To ensure a pretrained model receives the correct input, an image processor can perform the following operations to make sure an image is exactly like the images a model was pretrained on.

- [`~BaseImageProcessor.center_crop`] to resize an image
- [`~BaseImageProcessor.normalize`] or [`~BaseImageProcessor.rescale`] pixel values
- center-crop or resize an image
- normalize or rescale pixel values

Use [`~ImageProcessingMixin.from_pretrained`] to load an image processor's configuration (image size, whether to normalize and rescale, etc.) from a vision model on the Hugging Face [Hub](https://hf.co) or a local directory. The configuration for each pretrained model is saved in a [preprocessor_config.json](https://huggingface.co/google/vit-base-patch16-224/blob/main/preprocessor_config.json) file.

@@ -44,70 +44,79 @@ This guide covers the image processor class and how to preprocess images for vis

## Image processor classes

Image processors inherit from the [`BaseImageProcessor`] class which provides the [`~BaseImageProcessor.center_crop`], [`~BaseImageProcessor.normalize`], and [`~BaseImageProcessor.rescale`] functions. There are two types of image processors.
Image processors use a backend-based architecture with two backends:

- [`BaseImageProcessor`] is a Python implementation.
- [`BaseImageProcessorFast`] is a faster [torchvision-backed](https://pytorch.org/vision/stable/index.html) version. For a batch of [torch.Tensor](https://pytorch.org/docs/stable/tensors.html) inputs, this can be up to 33x faster. [`BaseImageProcessorFast`] is not available for all vision models at the moment. Refer to a models API documentation to check if it is supported.
- [`TorchvisionBackend`] β€” the default [torchvision-backed](https://pytorch.org/vision/stable/index.html) implementation. GPU-accelerated and up to 33x faster than the PIL backend for batches of [torch.Tensor](https://pytorch.org/docs/stable/tensors.html) inputs. All models support this backend; newer models only support this one.
- [`PilBackend`] β€” the PIL/NumPy alternative. Portable and CPU-only. Only available for older models, where it is useful to reproduce the exact numerical outputs of the original implementation.

Each image processor subclasses the [`ImageProcessingMixin`] class which provides the [`~ImageProcessingMixin.from_pretrained`] and [`~ImageProcessingMixin.save_pretrained`] methods for loading and saving image processors.
The active backend on a loaded processor can be inspected with its `backend` attribute (e.g., `processor.backend == "torchvision"`). Each image processor subclasses [`ImageProcessingMixin`] which provides the [`~ImageProcessingMixin.from_pretrained`] and [`~ImageProcessingMixin.save_pretrained`] methods.

There are two ways you can load an image processor, with [`AutoImageProcessor`] or a model-specific image processor.
There are two ways you can load an image processor: with [`AutoImageProcessor`] or directly from a model-specific class.

<hfoptions id="image-processor-classes">
<hfoption id="AutoImageProcessor">

The [AutoClass](./model_doc/auto) API provides a convenient method to load an image processor without directly specifying the model the image processor is associated with.

Use [`~AutoImageProcessor.from_pretrained`] to load an image processor, and set `use_fast=True` to load a fast image processor if it's supported.
Use [`~AutoImageProcessor.from_pretrained`] with the `backend` argument to select the backend. When `backend` is omitted (the default), torchvision is picked when it is installed and PIL is used otherwise. Note that `backend="pil"` is only supported for older models; newer models only expose the torchvision backend.

> **Note:** a small set of older models (Chameleon, Flava, Idefics3, SmolVLM) use Lanczos interpolation that torchvision does not support, so they always default to the PIL backend regardless of torchvision availability. Pass `backend="torchvision"` explicitly to override this.

```py
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
# Default: picks torchvision if available, otherwise pil
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Explicitly request the torchvision backend
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", backend="torchvision")

# Explicitly request the PIL backend (only for models that support it)
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", backend="pil")
```

</hfoption>
<hfoption id="model-specific image processor">

Each image processor is associated with a specific pretrained vision model, and the image processors configuration contains the models expected size and whether to normalize and resize.
Each image processor is associated with a specific pretrained vision model, and its configuration contains the model's expected size and normalization parameters.

The image processor can be loaded directly from the model-specific class. Check a models API documentation to see whether it supports a fast image processor.
Load the torchvision backend processor directly from the model-specific class.

```py
from transformers import ViTImageProcessor

image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
```

To load a fast image processor, use the fast implementation class.
For models that support it, you can load the PIL backend with the `Pil`-suffixed class. This is useful when you need exact numerical parity with the original implementation.

```py
from transformers import ViTImageProcessorFast
from transformers import ViTImageProcessorPil

image_processor = ViTImageProcessorFast.from_pretrained("google/vit-base-patch16-224")
image_processor = ViTImageProcessorPil.from_pretrained("google/vit-base-patch16-224")
```

</hfoption>
</hfoptions>

## Fast image processors
## Torchvision backend processors

[`BaseImageProcessorFast`] is based on [torchvision](https://pytorch.org/vision/stable/index.html) and is significantly faster, especially when processing on a GPU. This class can be used as a drop-in replacement for [`BaseImageProcessor`] if it's available for a model because it has the same design. Make sure [torchvision](https://pytorch.org/get-started/locally/#mac-installation) is installed, and set the `use_fast` parameter to `True`.
[`TorchvisionBackend`] is the **default** backend. Make sure [torchvision](https://pytorch.org/get-started/locally/#mac-installation) is installed, then load it with `backend="torchvision"` (or simply omit `backend`, since torchvision is selected automatically when available).

```py
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)
processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", backend="torchvision")
```

Control which device processing is performed on with the `device` parameter. Processing is performed on the same device as the input by default if the inputs are tensors, otherwise they are processed on the CPU. The example below places the fast processor on a GPU.
Control which device processing is performed on with the `device` argument. Processing is performed on the same device as the input by default if the inputs are tensors, otherwise it falls back to CPU. The example below runs processing on a GPU.

```py
from torchvision.io import read_image
from transformers import DetrImageProcessorFast
from transformers import DetrImageProcessor

images = read_image("image.jpg")
processor = DetrImageProcessorFast.from_pretrained("facebook/detr-resnet-50")
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
images_processed = processor(images, return_tensors="pt", device="cuda")
```

12 changes: 6 additions & 6 deletions docs/source/en/internal/import_utils.md
@@ -28,17 +28,17 @@ object for which you are lacking a dependency will error-out when calling any me
This object is still importable:

```python
>>> from transformers import DetrImageProcessorFast
>>> print(DetrImageProcessorFast)
<class 'DetrImageProcessorFast'>
>>> from transformers import DetrImageProcessor
>>> print(DetrImageProcessor)
<class 'DetrImageProcessor'>
```

However, no method can be called on that object:

```python
>>> DetrImageProcessorFast.from_pretrained()
ImportError:
DetrImageProcessorFast requires the Torchvision library but it was not found in your environment. Check out the instructions on the
>>> DetrImageProcessor.from_pretrained()
ImportError:
DetrImageProcessor requires the Torchvision library but it was not found in your environment. Check out the instructions on the
installation page: https://pytorch.org/get-started/locally/ and follow the ones that match your environment.
Please note that you may need to restart your runtime after installation.
```