Merged
81 commits
f48a47b
remove attributes and add all missing sub processors to their auto cl…
yonigozlan Oct 15, 2025
d5d5c58
remove all mentions of .attributes
yonigozlan Oct 15, 2025
dd505b5
cleanup
yonigozlan Oct 15, 2025
6a1448f
fix processor tests
yonigozlan Oct 15, 2025
a292900
fix modular
yonigozlan Oct 15, 2025
63a255d
remove last attributes
yonigozlan Oct 16, 2025
ef73759
fixup
yonigozlan Oct 16, 2025
b5e8b2e
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Oct 16, 2025
f14ff3c
fixes after merge
yonigozlan Oct 16, 2025
0306430
fix wrong tokenizer in auto florence2
yonigozlan Oct 16, 2025
01cb815
fix missing audio_processor + nits
yonigozlan Oct 17, 2025
49ec906
Override __init__ in NewProcessor and change hf-internal-testing-repo…
yonigozlan Oct 17, 2025
7dd5682
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Oct 17, 2025
946cc5c
fix auto tokenizer test
yonigozlan Oct 17, 2025
b0cb3e0
add init to markup_lm
yonigozlan Oct 17, 2025
3b9e846
update CustomProcessor in custom_processing
yonigozlan Oct 17, 2025
53de7a4
remove print
yonigozlan Oct 17, 2025
93d2c4d
Merge branch 'main' into remove-attributes-from-processors
yonigozlan Oct 17, 2025
feeec28
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Oct 22, 2025
4a6b080
nit
yonigozlan Oct 22, 2025
02402a0
Merge branch 'remove-attributes-from-processors' of https://github.co…
yonigozlan Oct 22, 2025
9204b4c
refactor processor tests first part
yonigozlan Oct 21, 2025
1ed7c56
refactor part 2
yonigozlan Oct 22, 2025
757e1f1
fix test modeling owlv2
yonigozlan Oct 22, 2025
bf763b2
fix test_processing_layoutxlm
yonigozlan Oct 22, 2025
0799a0a
Fix owlv2, wav2vec2, markuplm, voxtral issues
yonigozlan Oct 22, 2025
98ead2c
part3
yonigozlan Oct 23, 2025
59234ee
refactor all processor with mixin
yonigozlan Oct 23, 2025
54bf8e0
Merge branch 'remove-attributes-from-processors' into simplify-proces…
yonigozlan Oct 23, 2025
bf1a4b6
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Oct 31, 2025
e3f130d
add support for loading and saving multiple tokenizer natively
yonigozlan Oct 31, 2025
cc45a7e
remove exclude_attributes from save_pretrained
yonigozlan Oct 31, 2025
3810196
Merge branch 'remove-attributes-from-processors' into simplify-proces…
yonigozlan Oct 31, 2025
34bfc74
get processor from pretrained instead of components in tests
yonigozlan Oct 31, 2025
a0c5c1a
skip tests in colqwen2, pixtral
yonigozlan Oct 31, 2025
8979645
modifs after review
yonigozlan Nov 7, 2025
6cc30f9
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Nov 7, 2025
447b598
Merge branch 'remove-attributes-from-processors' into simplify-proces…
yonigozlan Nov 7, 2025
ac72ba2
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 7, 2025
d5bf14a
fix style and copies
yonigozlan Nov 7, 2025
773342b
Fix after review
yonigozlan Nov 11, 2025
12c854c
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 11, 2025
12a01fd
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 24, 2025
7d7c6b2
add test_processor_from_pretrained_vs_from_components, fix failing tests
yonigozlan Nov 24, 2025
fa94bcb
fix overflowing_tokens tests
yonigozlan Nov 24, 2025
74492e5
add config for layoutxlm
yonigozlan Nov 24, 2025
9bd9da1
fix ci
yonigozlan Nov 24, 2025
e4e36d9
use modular
yonigozlan Nov 24, 2025
1fd0cd5
fic docstring
yonigozlan Nov 24, 2025
1532913
Fix most tests
yonigozlan Nov 25, 2025
1c21d90
Standardize mgp_str tests
yonigozlan Nov 25, 2025
d931a2b
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 25, 2025
2e8003a
Merge branch 'simplify-processor-tests' into default-fast-image-proc-…
yonigozlan Nov 25, 2025
e26adb7
fix oneformer processing tests + fix copies
yonigozlan Nov 25, 2025
572b26d
fix after review
yonigozlan Nov 26, 2025
57fa154
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 26, 2025
2f6b12a
Merge branch 'simplify-processor-tests' into default-fast-image-proc-…
yonigozlan Nov 26, 2025
7769112
fix missing fet_images in fast image processors
yonigozlan Nov 26, 2025
3d48ee1
Merge remote-tracking branch 'upstream/main' into default-fast-image-…
yonigozlan Nov 26, 2025
7e0125a
Merge remote-tracking branch 'upstream/main' into default-fast-image-…
yonigozlan Dec 1, 2025
cabac7f
Merge branch 'main commit 377a8ee7' into default-fast-image-proc-all-…
ydshieh Dec 6, 2025
2fff041
fix 01 - to check
ydshieh Dec 6, 2025
1613f29
fix 02 - to check
ydshieh Dec 6, 2025
08249a2
fix 03 - to check
ydshieh Dec 6, 2025
6210f0e
fix 03 - to check
ydshieh Dec 6, 2025
a48b577
fix 03 - to check
ydshieh Dec 6, 2025
c29f9b0
fix 04 - to check
ydshieh Dec 6, 2025
e24559c
fix 05 - to check
ydshieh Dec 6, 2025
f04e642
fix 06 - sytle
ydshieh Dec 6, 2025
88610fc
fix 07 - revert
ydshieh Dec 6, 2025
6d56e56
Fix some errors
yonigozlan Dec 9, 2025
04d4145
Merge branch 'default-fast-image-proc-all-models' of https://github.c…
yonigozlan Dec 9, 2025
624aad6
Improve BatchFeature: stack list and lists of torch tensors (#42750)
yonigozlan Dec 12, 2025
587209c
fix remaining tests
yonigozlan Dec 12, 2025
21d2fc4
Merge remote-tracking branch 'upstream/main' into default-fast-image-…
yonigozlan Dec 18, 2025
1602059
fix copies
yonigozlan Dec 18, 2025
0bbe085
Fix Lfm2_vl im proc test
yonigozlan Dec 18, 2025
ff4e4c7
Merge remote-tracking branch 'upstream/main' into default-fast-image-…
yonigozlan Jan 21, 2026
aade130
nit after review
yonigozlan Jan 21, 2026
acdb89f
Merge branch 'main' into default-fast-image-proc-all-models
yonigozlan Jan 21, 2026
aca384b
Merge branch 'main' into default-fast-image-proc-all-models
yonigozlan Jan 21, 2026
3 changes: 0 additions & 3 deletions src/transformers/image_processing_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,9 +38,6 @@
class BaseImageProcessor(ImageProcessingMixin):
valid_kwargs = ImagesKwargs

def __init__(self, **kwargs):
super().__init__(**kwargs)

@property
def is_fast(self) -> bool:
"""
Expand Down
57 changes: 44 additions & 13 deletions src/transformers/image_transforms.py
Original file line number Diff line number Diff line change
Expand Up @@ -863,31 +863,43 @@ def _group_images_by_shape(nested_images, *paired_inputs, is_nested: bool = Fals
paired_grouped_values[paired_index][shape].append(paired_value)
grouped_images_index[key] = (shape, len(grouped_images[shape]) - 1)

# Store structure size for nested inputs to handle empty sublists during reconstruction
if is_nested:
grouped_images_index["_num_sublists"] = len(normalized_images)

return grouped_images, *paired_grouped_values, grouped_images_index


def _reconstruct_nested_structure(indices, processed_images):
"""Helper function to reconstruct a single level nested structure."""
# Find the maximum outer index
max_outer_idx = max(idx[0] for idx in indices)

# Create the outer list
result = [None] * (max_outer_idx + 1)
# Get the number of sublists (handles empty sublists like in [[], [image]])
num_sublists = indices.pop("_num_sublists", None)

# Group indices by outer index
nested_indices = defaultdict(list)
for i, j in indices:
nested_indices[i].append(j)

# Determine the number of outer sublists
if num_sublists is not None:
max_outer_idx = num_sublists - 1
elif nested_indices:
max_outer_idx = max(nested_indices.keys())
else:
return []

# Create the result structure
result = []
for i in range(max_outer_idx + 1):
if i in nested_indices:
if i not in nested_indices:
result.append([])
else:
inner_max_idx = max(nested_indices[i])
inner_list = [None] * (inner_max_idx + 1)
for j in range(inner_max_idx + 1):
if (i, j) in indices:
shape, idx = indices[(i, j)]
inner_list[j] = processed_images[shape][idx]
result[i] = inner_list
for j in nested_indices[i]:
shape, idx = indices[(i, j)]
inner_list[j] = processed_images[shape][idx]
result.append(inner_list)
Comment on lines +883 to +902

Collaborator

clearer!


return result
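
For readers following the hunk above, here is a minimal standalone sketch of the reconstruction step. It is an illustrative re-implementation, not the merged code: strings stand in for image tensors, the function name `reconstruct_nested` is hypothetical, and the index dict is copied before popping `_num_sublists` (the hunk above pops from the dict it receives).

```python
from collections import defaultdict


def reconstruct_nested(indices, processed):
    """Rebuild a one-level nested list from a flat index map (illustrative sketch)."""
    indices = dict(indices)  # assumption: copy so the caller's dict is left untouched
    num_sublists = indices.pop("_num_sublists", None)

    # Group inner positions by their outer position.
    nested = defaultdict(list)
    for i, j in indices:
        nested[i].append(j)

    # Without an explicit size, fall back to the largest outer index seen.
    if num_sublists is None:
        if not nested:
            return []
        num_sublists = max(nested) + 1

    result = []
    for i in range(num_sublists):
        if i not in nested:
            result.append([])  # sublist that held no images, e.g. the first one in [[], [image]]
            continue
        inner = [None] * (max(nested[i]) + 1)
        for j in nested[i]:
            shape, idx = indices[(i, j)]
            inner[j] = processed[shape][idx]
        result.append(inner)
    return result


# Toy data: strings stand in for tensors, tuples for image shapes.
processed = {(3, 4, 4): ["img_a", "img_c"], (3, 8, 8): ["img_b"]}
index = {
    (1, 0): ((3, 4, 4), 0),
    (1, 1): ((3, 8, 8), 0),
    (2, 0): ((3, 4, 4), 1),
    "_num_sublists": 3,
}
restored = reconstruct_nested(index, processed)
print(restored)  # [[], ['img_a', 'img_b'], ['img_c']]
```

The empty first sublist survives only because `_num_sublists` is stored; with the largest outer index alone, a trailing empty sublist would be lost.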

Expand All @@ -908,6 +920,21 @@ def _iterate_items(items, is_nested: bool):
yield i, item


def _get_device_from_images(images, is_nested: bool) -> "torch.device":

Collaborator

If the processor is the one creating the torch tensor then I would suppose that there is a way to store the device in the data structure that it creates instead of having this function

Contributor Author

This is mainly to avoid having to pass a device argument to every `group_images_by_shape` call, when the device is easy to deduce.

"""
Get the device from the first non-empty element in a (potentially nested) list of images.

Handles cases like `images = [[], [image]]` where the first sublist may be empty.
"""
if is_nested:
for row in images:
if isinstance(row, torch.Tensor):
return row.device
if isinstance(row, list) and len(row) > 0:
return row[0].device
return images[0].device
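
A small sketch of the lookup discussed in this thread, assuming a hypothetical `FakeTensor` stand-in so it runs without torch. The names `FakeTensor` and `first_device` are illustrative only, and raising `ValueError` when a nested input holds no image at all is an assumption of this sketch, not behavior shown in the hunk.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only carries a device label."""
    device: str


def first_device(images, is_nested):
    """Return the device of the first image found, skipping empty sublists."""
    if is_nested:
        for row in images:
            if isinstance(row, FakeTensor):  # a row may itself be a single tensor
                return row.device
            if isinstance(row, list) and row:  # first non-empty sublist wins
                return row[0].device
        # Assumption of this sketch: fail loudly when no image exists anywhere.
        raise ValueError("No images found in the batch.")
    return images[0].device


print(first_device([[], [FakeTensor("cuda:0")]], is_nested=True))
print(first_device([FakeTensor("cpu")], is_nested=False))
```

The previous expression, `images[0][0].device`, would raise `IndexError` on the first call because the first sublist is empty.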


def group_images_by_shape(
images: Union[list["torch.Tensor"], "torch.Tensor"],
*paired_inputs,
Expand Down Expand Up @@ -945,17 +972,21 @@ def group_images_by_shape(
"""
# If disable grouping is not explicitly provided, we favor disabling it if the images are on CPU, and enabling it otherwise.
if disable_grouping is None:
device = images[0][0].device if is_nested else images[0].device
device = _get_device_from_images(images, is_nested)
disable_grouping = device == "cpu"

if disable_grouping:
grouped_images_index = {key: (key, 0) for key, _ in _iterate_items(images, is_nested)}
if is_nested:
grouped_images_index["_num_sublists"] = len(images)

return (
{key: img.unsqueeze(0) for key, img in _iterate_items(images, is_nested)},
*[
{key: item.unsqueeze(0) for key, item in _iterate_items(paired_list, is_nested)}
for paired_list in paired_inputs
],
{key: (key, 0) for key, _ in _iterate_items(images, is_nested)},
grouped_images_index,
)

# Handle single level nested structure
Expand Down
27 changes: 14 additions & 13 deletions src/transformers/models/auto/image_processing_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,9 +47,14 @@

logger = logging.get_logger(__name__)


FORCE_FAST_IMAGE_PROCESSOR = ["Qwen2VLImageProcessor"]

# These image processors use Lanczos interpolation, which is not supported by fast image processors.
# To avoid important differences in outputs, we default to using the slow image processors for these processors.
DEFAULT_TO_SLOW_IMAGE_PROCESSORS = [
"ChameleonImageProcessor",
"FlavaImageProcessor",
"Idefics3ImageProcessor",
"SmolVLMImageProcessor",
]

if TYPE_CHECKING:
# This significantly improves completion suggestion performance when
Expand Down Expand Up @@ -535,24 +540,20 @@ def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs):
image_processor_auto_map = config.auto_map["AutoImageProcessor"]

image_processor_class = None
# TODO: @yoni, change logic in v4.52 (when use_fast set to True by default)
if image_processor_type is not None:
# if use_fast is not set and the processor was saved with a fast processor, we use it, otherwise we use the slow processor.
if use_fast is None:
use_fast = image_processor_type.endswith("Fast")
if not use_fast and image_processor_type in FORCE_FAST_IMAGE_PROCESSOR and is_torchvision_available():
use_fast = True
if (
not use_fast
and is_torchvision_available()
and image_processor_type not in DEFAULT_TO_SLOW_IMAGE_PROCESSORS
):
logger.warning_once(
f"The image processor of type `{image_processor_type}` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. "
"This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. "
"Note that this behavior will be extended to all models in a future release."
)
if not use_fast:
logger.warning_once(
"Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. "
"`use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. "
"This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`."
)
use_fast = True
if use_fast and not image_processor_type.endswith("Fast"):
image_processor_type += "Fast"
if use_fast and not is_torchvision_available():
Expand Down
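
The page above has lost both indentation and the +/- diff markers, so old and new lines are interleaved. The following is a hedged reading of the new default-selection logic as a pure function. The function name `resolve_processor_type` and the `torchvision_available` parameter are hypothetical, the nesting is inferred rather than confirmed, and the fallback taken when torchvision is missing but `use_fast=True` was requested is omitted because the hunk is cut off at that point.

```python
# Processors listed in the hunk as staying slow by default (Lanczos interpolation).
DEFAULT_TO_SLOW = {
    "ChameleonImageProcessor",
    "FlavaImageProcessor",
    "Idefics3ImageProcessor",
    "SmolVLMImageProcessor",
}


def resolve_processor_type(saved_type, use_fast=None, torchvision_available=True):
    """Return (class name to load, use_fast) for a saved image processor type."""
    if use_fast is None:
        # Honor a checkpoint that was saved with a fast processor.
        use_fast = saved_type.endswith("Fast")
        # Otherwise prefer the fast variant, unless it is on the slow-by-default list.
        if not use_fast and torchvision_available and saved_type not in DEFAULT_TO_SLOW:
            use_fast = True
    if use_fast and not saved_type.endswith("Fast"):
        saved_type += "Fast"
    return saved_type, use_fast


for name in ("CLIPImageProcessor", "FlavaImageProcessor"):
    print(name, "->", resolve_processor_type(name))
```

Under this reading, an explicit `use_fast=False` is always respected, and the listed processors stay slow unless `use_fast=True` is passed.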
9 changes: 9 additions & 0 deletions src/transformers/models/clip/image_processing_clip_fast.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@

from ...image_processing_utils_fast import BaseImageProcessorFast
from ...image_utils import OPENAI_CLIP_MEAN, OPENAI_CLIP_STD, PILImageResampling
from ...processing_utils import ImagesKwargs, Unpack
from ...utils import auto_docstring


Expand All @@ -34,5 +35,13 @@ class CLIPImageProcessorFast(BaseImageProcessorFast):
do_normalize = True
do_convert_rgb = True

def __init__(self, **kwargs: Unpack[ImagesKwargs]):
# for backwards compatibility of KOSMOS-2
if "use_square_size" in kwargs and kwargs["use_square_size"]:
kwargs["size"] = {"height": self.size["shortest_edge"], "width": self.size["shortest_edge"]}
kwargs.pop("use_square_size")

super().__init__(**kwargs)


__all__ = ["CLIPImageProcessorFast"]
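
As a rough illustration of the KOSMOS-2 backward-compatibility shim above, the kwarg translation can be written as a standalone helper. The name `apply_square_size` is hypothetical, and this sketch always removes `use_square_size` from the kwargs; whether the `pop` in the hunk sits inside or outside the `if` cannot be told from the flattened diff.

```python
def apply_square_size(default_size, **kwargs):
    """Translate the legacy `use_square_size` flag into an explicit square `size`."""
    # pop with a default covers both a missing key and a falsy value
    if kwargs.pop("use_square_size", False):
        edge = default_size["shortest_edge"]
        kwargs["size"] = {"height": edge, "width": edge}
    return kwargs


print(apply_square_size({"shortest_edge": 224}, use_square_size=True))
print(apply_square_size({"shortest_edge": 224}, use_square_size=False, do_resize=True))
```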
14 changes: 11 additions & 3 deletions src/transformers/models/clipseg/processing_clipseg.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,14 +48,22 @@ def __call__(self, text=None, images=None, visual_prompt=None, return_tensors=No
if text is not None and visual_prompt is not None:
raise ValueError("You have to specify exactly one type of prompt. Either text or visual prompt.")

output_kwargs = self._merge_kwargs(
self.valid_processor_kwargs, tokenizer_init_kwargs=self.tokenizer.init_kwargs, **kwargs
)

if text is not None:
encoding = self.tokenizer(text, return_tensors=return_tensors, **kwargs)
encoding = self.tokenizer(text, return_tensors=return_tensors, **output_kwargs["text_kwargs"])

if visual_prompt is not None:
prompt_features = self.image_processor(visual_prompt, return_tensors=return_tensors, **kwargs)
prompt_features = self.image_processor(
visual_prompt, return_tensors=return_tensors, **output_kwargs["images_kwargs"]
)

if images is not None:
image_features = self.image_processor(images, return_tensors=return_tensors, **kwargs)
image_features = self.image_processor(
images, return_tensors=return_tensors, **output_kwargs["images_kwargs"]
)

if visual_prompt is not None and images is not None:
encoding = {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,7 @@ def _prepare_images_structure(
**kwargs,
) -> ImageInput:
# we need to handle image pairs validation and flattening
images = self.fetch_images(images)
return flatten_pair_images(images)

def _preprocess(
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/fuyu/image_processing_fuyu_fast.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@
class FuyuImageProcessorFast(BaseImageProcessorFast):
do_resize = True
size = {"height": 1080, "width": 1920}
patch_size = {"height": 30, "width": 30}

Collaborator

gosh I remember this patch size. good default

resample = PILImageResampling.BILINEAR
do_pad = True
padding_value = 1.0
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,7 @@ def _prepare_images_structure(self, images: ImageInput, expected_ndims: int = 3)
"""
Prepare a nested images structure for processing.
"""
images = self.fetch_images(images)
return make_nested_list_of_images(images, expected_ndims=expected_ndims)

def split_images(
Expand Down
32 changes: 29 additions & 3 deletions src/transformers/models/idefics3/image_processing_idefics3_fast.py
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,27 @@ def get_max_height_width(images_list: list[list["torch.Tensor"]]) -> tuple[int,
return (max_height, max_width)


def get_num_channels(images_list: list[list["torch.Tensor"]]) -> int:
"""
Get the number of channels across all images in a batch. Handle empty sublists like in [[], [image]].
"""
for images in images_list:
if images:
return images[0].shape[0]

raise ValueError("No images found in the batch.")


def get_device_from_images(images_list: list[list["torch.Tensor"]]) -> "torch.device":

Collaborator

Not sure why we need this when extracting the device should be exactly the same for every single image processor, no?

Contributor Author

Some processors pass nested images to `group_images_by_shape`, and some have structures with empty lists, so this is needed for edge cases.

"""
Get the device from the first non-empty element in a nested list of images.
Handle empty sublists like in [[], [image]].
"""
for images in images_list:
if images:
return images[0].device


def make_pixel_mask(image: "torch.Tensor", output_size: tuple[int, int]) -> "torch.Tensor":
"""
Make a pixel mask for the image, where 1 indicates a valid pixel and 0 indicates padding.
Expand Down Expand Up @@ -183,11 +204,14 @@ class Idefics3ImageProcessorFast(BaseImageProcessorFast):
do_pad = True
return_row_col_info = False
valid_kwargs = Idefics3ImageProcessorKwargs
model_input_names = ["pixel_values", "pixel_attention_mask"]

def _prepare_images_structure(self, images: ImageInput, expected_ndims: int = 3) -> ImageInput:
"""
Prepare a nested images structure for processing.
"""
# Checks for `str` in case of URL/local path and optionally loads images
images = self.fetch_images(images)
return make_nested_list_of_images(images, expected_ndims=expected_ndims)

def resize(
Expand Down Expand Up @@ -438,18 +462,20 @@ def _preprocess(
# Get max images per batch
max_num_images = max(len(images_) for images_ in processed_images)
max_height, max_width = get_max_height_width(processed_images)
num_channels = get_num_channels(processed_images)
device = get_device_from_images(processed_images)

processed_images_padded = torch.zeros(
len(processed_images),
max_num_images,
*(processed_images[0][0].shape[0], max_height, max_width),
device=processed_images[0][0].device,
*(num_channels, max_height, max_width),
device=device,
)
pixel_attention_masks = torch.zeros(
len(processed_images),
max_num_images,
*(max_height, max_width),
device=processed_images[0][0].device,
device=device,
)
for i, images in enumerate(processed_images):
for j, image in enumerate(images):
Expand Down
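
To make the empty-sublist case concrete, the shape of the padded batch can be worked out without torch. This is a sketch under stated assumptions: images are represented only by `(channels, height, width)` tuples, and `padded_batch_shape` is a hypothetical helper, not part of the processor.

```python
def padded_batch_shape(shapes_per_sample):
    """Compute (batch, max_images, channels, max_height, max_width) for nested shapes."""
    flat = [shape for sample in shapes_per_sample for shape in sample]
    if not flat:
        raise ValueError("No images found in the batch.")
    num_channels = flat[0][0]  # taken from the first image that exists
    max_height = max(shape[1] for shape in flat)
    max_width = max(shape[2] for shape in flat)
    max_num_images = max(len(sample) for sample in shapes_per_sample)
    return (len(shapes_per_sample), max_num_images, num_channels, max_height, max_width)


# The first sample has no images, like the [[], [image]] case mentioned in the docstrings.
print(padded_batch_shape([[], [(3, 10, 20), (3, 30, 5)]]))  # (2, 2, 3, 30, 20)
```

Reading the channel count from `processed_images[0][0]`, as the replaced lines did, fails on this input because the first sample is empty.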
4 changes: 2 additions & 2 deletions src/transformers/models/janus/image_processing_janus_fast.py
Original file line number Diff line number Diff line change
Expand Up @@ -217,10 +217,10 @@ def postprocess(
if do_normalize and do_rescale and return_tensors == "PIL.Image.Image":
images = [F.to_pil_image(image) for image in images]

data = {"pixel_values": images}
return_tensors = return_tensors if return_tensors != "PIL.Image.Image" else None
images = torch.stack(images, dim=0) if return_tensors == "pt" else images

return BatchFeature(data=data, tensor_type=return_tensors)
return BatchFeature(data={"pixel_values": images}, tensor_type=return_tensors)


__all__ = ["JanusImageProcessorFast"]
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,7 @@ def _prepare_images_structure(
**kwargs,
) -> ImageInput:
# we need to handle image pairs validation and flattening
images = self.fetch_images(images)
return flatten_pair_images(images)

def _preprocess(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ def __init__(self, **kwargs: Unpack[LlavaOnevisionImageProcessorKwargs]):
def preprocess(self, images: ImageInput, **kwargs: Unpack[LlavaOnevisionImageProcessorKwargs]) -> BatchFeature:
if isinstance(images, (tuple, list)) and isinstance(images[0], (tuple, list)):
# if the first element is a list, we assume that all elements are lists
images = [x for x in images if x] # handle text-only case
batch_num_images = [len(x) for x in images]
elif isinstance(images, (tuple, list)):
# treat this as a single-image case for backward compatibility
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,7 @@ def pad_to_square(
def preprocess(self, images: ImageInput, **kwargs: Unpack[LlavaOnevisionImageProcessorKwargs]) -> BatchFeature:
if isinstance(images, (tuple, list)) and isinstance(images[0], (tuple, list)):
# if the first element is a list, we assume that all elements are lists
images = [x for x in images if x] # handle text-only case
batch_num_images = [len(x) for x in images]
elif isinstance(images, (tuple, list)):
# treat this as a single-image case for backward compatibility
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,7 @@ class MllamaImageProcessorFast(BaseImageProcessorFast):
do_pad = True
max_image_tiles = 4
valid_kwargs = MllamaImageProcessorKwargs
model_input_names = ["pixel_values", "num_tiles", "aspect_ratio_ids", "aspect_ratio_mask"]

def __init__(self, **kwargs: Unpack[MllamaImageProcessorKwargs]):
super().__init__(**kwargs)
Expand Down