Add fixed resize and pad strategy for object detection #30742

qubvel · 2024-05-10T13:27:51Z

What does this PR do?

Add a new strategy for image resizing and padding.

As it was discussed in #30422 (object detection examples), fixed resizing and padding strategies boost models' quality and allow models to converge faster for fine-tuning.

Currently, we make padding based on the maximum height and width in a batch, and image sizes vary from batch to batch leading to model metrics becoming batch-dependent.

To keep it simple and backward compatible, proposed to support a new size dict {"max_height": ..., "max_width": ...}. The image will be resized to the maximum size respecting the aspect ratio and keeping the height less or equal to max_height and the width less or equal tomax_width.

In terms of padding, implemented the following:
If max_height and max_width are provided in the size parameter, the image will be padded to themax_height and max_width dimensions. Otherwise, the image will be padded to the maximum height and width of the batch.

The padding strategy is not very explicit, probably it worth implementing separate keywords for padding. e.g. {"max_height": ..., "max_width": ..., "padded_height": ..., "padded_width" ...}. In that case, if "padded" keywords are provided in any size dict, then padding follows it, otherwise, we pad based on the maximum height and width of a batch.
(A small downside of it, is that in most cases "max_height" = "max_width" = "padded_height" = "padded_width", and we will need to specify 4 equal parameters.)

Who can review?

@amyeroberts @NielsRogge please let me know your opinion, I implemented it for DETR, if design is OK I will update all object detection models to support it.

HuggingFaceDocBuilderDev · 2024-05-10T13:51:35Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

amyeroberts

Thanks for working on this! Very nicely handled: backwards compatible; easy to control and well tested. It's going to be great to control the padding behaviour this way - should be faster too ❤️

Only thought is about consistency with other image processor. Some of them e.g. SAM have an explicit pad_size argument they use to control this behaviour. I think it's fine, as their resize method will already explicitly raise an exception if the user tries to pass through e.g. "max_size".

qubvel · 2024-05-16T11:30:04Z

@amyeroberts I decided to add the pad_size, as you mentioned, to make it consistent with resize and other image processors:

do_resize + size
do_pad + pad_size

image_processor = DetrImageProcessor(
   do_resize=True,
   size={"max_height": 300, "max_width": 100},
   do_pad=True,
   pad_size={"height": 350, "width": 150},
)

It requires a bit more code across different methods (init, preprocess, pad) but should be more explicit for users with more predictable behavior.

amyeroberts

Looks great - thanks for iterating!

Just a question about the addition of max_height and max_width. I don't feel super strongly, so happy to go with what you think it best.

amyeroberts · 2024-05-16T18:45:19Z

src/transformers/image_processing_utils.py

+    {"shortest_edge"},
+    {"shortest_edge", "longest_edge"},
+    {"longest_edge"},
+    {"max_height", "max_width"},


If the models are now using pad_size - do we need the max_size arguments here?

amyeroberts · 2024-05-16T18:51:54Z

src/transformers/models/yolos/image_processing_yolos.py

+                - `{"max_height": int, "max_width": int}`: The image will be resized to the maximum size respecting the
+                    aspect ratio and keeping the height less or equal to `max_height` and the width less or equal to
+                    `max_width`.


As max_size no longer controls the padding, I think we don't need to include this argument at the moment, but could add in a separate follow-up PR, as the default shortest_edge and longest_edge already provide bounds. Is it something you found to be useful when finetuning with the object detection script?

{"max_height": int, "max_width": int} provides a more controllable way of resizing, it is not necessary, but it allows us to use non-square pad_size.

For example, we have two images in a batch with shapes

[ [400, 600], [600, 400], ]

If we specify size={"shortest_edge": 200, "longest_edge": 300} we have to set pad_size={"height": 300, "width": 300} (square), because we don't know height or width will be resized to "largest_edge".

[ # original -> resized -> padded [400, 600] -> [200, 300] -> [300, 300] [600, 400] -> [300, 200] -> [300, 300] ]

If we specify size={"max_height": 200, "max_width": 300} we can set pad_size={"height": 200, "width": 300} (non-square).

[ # original -> resized -> padded [400, 600] -> [200, 300] -> [200, 300] [600, 400] -> [200, 133] -> [200, 300] ]

I don't think it is very common case, but might be useful in cases when you expect most of the images to have a particular orientation.

OK, let's keep it!

Life4Us2025

import math
import numpy as np
from PIL import Image, ImageOps
from dataclasses import dataclass
from enum import Enum
import warnings

class PadMode(Enum):
BOTTOM_RIGHT = "bottom_right" # classic
SYMMETRIC = "symmetric" # center align
RANDOM = "random" # stochastic placement
REFLECT = "reflect" # mirror pad

@DataClass
class ResizePadConfig:
resize_strategy: str = "shortest_edge" # height/width/shortest_edge/longest_edge/max_height/max_width
target_size: dict = None # e.g. {"height": 512, "width": 512}
pad_size: dict = None # fixed pad, e.g. {"height": 640, "width": 640}
pad_mode: PadMode = PadMode.BOTTOM_RIGHT
pad_value: int = 0 # background fill
preserve_aspect: bool = True
debug: bool = False
max_safe_scale: float = 4.0 # warn if resize >4x

class ResizePadEngine:
def init(self, cfg: ResizePadConfig):
self.cfg = cfg

def _debug_log(self, msg):
    if self.cfg.debug:
        print(f"[ResizePadEngine] {msg}")

def resize(self, img: Image.Image) -> Image.Image:
    """Resize with aspect ratio rules."""
    if self.cfg.target_size is None:
        return img

    w, h = img.size
    self._debug_log(f"Original size: {w}x{h}, strategy={self.cfg.resize_strategy}")

    ts = self.cfg.target_size

    if "height" in ts and "width" in ts:
        new_h, new_w = ts["height"], ts["width"]
        if not self.cfg.preserve_aspect:
            return img.resize((new_w, new_h), Image.BICUBIC)
        else:
            scale = min(new_w / w, new_h / h)
            new_w, new_h = int(w * scale), int(h * scale)
            return img.resize((new_w, new_h), Image.BICUBIC)

    elif "shortest_edge" in ts:
        scale = ts["shortest_edge"] / min(w, h)
    elif "longest_edge" in ts:
        scale = ts["longest_edge"] / max(w, h)
    elif "max_height" in ts and "max_width" in ts:
        scale = min(ts["max_height"] / h, ts["max_width"] / w)
    else:
        raise ValueError(f"Invalid target_size dict: {ts}")

    if scale > self.cfg.max_safe_scale:
        warnings.warn(f"Upscaling by {scale:.1f}x may hurt quality or waste memory.")

    new_w, new_h = int(w * scale), int(h * scale)
    self._debug_log(f"Resized to {new_w}x{new_h}")
    return img.resize((new_w, new_h), Image.BICUBIC)

def pad(self, img: Image.Image) -> Image.Image:
    """Apply padding to fixed size, with multiple strategies."""
    if self.cfg.pad_size is None:
        return img

    tgt_h, tgt_w = self.cfg.pad_size["height"], self.cfg.pad_size["width"]
    w, h = img.size

    if w > tgt_w or h > tgt_h:
        raise ValueError(f"Image {w}x{h} larger than pad_size {tgt_w}x{tgt_h}")

    pad_w, pad_h = tgt_w - w, tgt_h - h
    self._debug_log(f"Padding needed: (right={pad_w}, bottom={pad_h}) mode={self.cfg.pad_mode}")

    if self.cfg.pad_mode == PadMode.BOTTOM_RIGHT:
        padding = (0, 0, pad_w, pad_h)
    elif self.cfg.pad_mode == PadMode.SYMMETRIC:
        padding = (pad_w // 2, pad_h // 2, pad_w - pad_w // 2, pad_h - pad_h // 2)
    elif self.cfg.pad_mode == PadMode.RANDOM:
        left = np.random.randint(0, pad_w + 1)
        top = np.random.randint(0, pad_h + 1)
        padding = (left, top, pad_w - left, pad_h - top)
    elif self.cfg.pad_mode == PadMode.REFLECT:
        # Mirror pad using PIL.ImageOps
        img = ImageOps.expand(img, (0, 0, pad_w, pad_h), fill=None)
        return img
    else:
        raise ValueError(f"Unsupported pad_mode: {self.cfg.pad_mode}")

    return ImageOps.expand(img, padding, fill=self.cfg.pad_value)

def process(self, img: Image.Image) -> Image.Image:
    """Full resize+pad pipeline."""
    img = self.resize(img)
    img = self.pad(img)
    return img

=== Example Usage ===

if name == "main":
cfg = ResizePadConfig(
resize_strategy="shortest_edge",
target_size={"shortest_edge": 256},
pad_size={"height": 512, "width": 512},
pad_mode=PadMode.SYMMETRIC,
pad_value=128,
debug=True
)

engine = ResizePadEngine(cfg)
img = Image.open("sample.jpg")
out = engine.process(img)
out.save("processed.jpg")
print("Saved processed.jpg")

qubvel marked this pull request as draft May 10, 2024 13:28

amyeroberts reviewed May 13, 2024

View reviewed changes

qubvel force-pushed the add-resize-pad-strategy branch from 8f4b44b to 6682c34 Compare May 15, 2024 15:28

qubvel added 6 commits May 16, 2024 09:55

Add resize and pad strategy

09ceb8e

Merge get_size functions

fbbd97f

Add pad_size + tests to object detection models

95bcfd6

Fixup

47f4184

Update docstrings

7873779

Fixup

bd01808

qubvel force-pushed the add-resize-pad-strategy branch from 2eeacfe to bd01808 Compare May 16, 2024 09:56

qubvel marked this pull request as ready for review May 16, 2024 11:34

muellerzr mentioned this pull request May 16, 2024

Skip failing Object Detection example for now #30869

Closed

5 tasks

amyeroberts approved these changes May 16, 2024

View reviewed changes

qubvel merged commit bf646fb into huggingface:main May 17, 2024

qubvel mentioned this pull request May 22, 2024

Update object detection with latest resize and pad strategies #30955

Merged

3 tasks

qubvel mentioned this pull request May 29, 2024

New model support RTDETR #29077

Merged

9 tasks

Life4Us2025 reviewed Sep 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add fixed resize and pad strategy for object detection #30742

Add fixed resize and pad strategy for object detection #30742

Uh oh!

qubvel commented May 10, 2024

Uh oh!

HuggingFaceDocBuilderDev commented May 10, 2024

Uh oh!

amyeroberts left a comment

Uh oh!

qubvel commented May 16, 2024 •

edited

Loading

Uh oh!

amyeroberts left a comment

Uh oh!

amyeroberts May 16, 2024

Uh oh!

amyeroberts May 16, 2024

Uh oh!

qubvel May 16, 2024 •

edited

Loading

Uh oh!

amyeroberts May 17, 2024

Uh oh!

Life4Us2025 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Add fixed resize and pad strategy for object detection #30742

Add fixed resize and pad strategy for object detection #30742

Uh oh!

Conversation

qubvel commented May 10, 2024

What does this PR do?

Who can review?

Uh oh!

HuggingFaceDocBuilderDev commented May 10, 2024

Uh oh!

amyeroberts left a comment

Choose a reason for hiding this comment

Uh oh!

qubvel commented May 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amyeroberts left a comment

Choose a reason for hiding this comment

Uh oh!

amyeroberts May 16, 2024

Choose a reason for hiding this comment

Uh oh!

amyeroberts May 16, 2024

Choose a reason for hiding this comment

Uh oh!

qubvel May 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

amyeroberts May 17, 2024

Choose a reason for hiding this comment

Uh oh!

Life4Us2025 left a comment

Choose a reason for hiding this comment

=== Example Usage ===

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

qubvel commented May 16, 2024 •

edited

Loading

qubvel May 16, 2024 •

edited

Loading