Skip to content

vision_utils: clamp resize new_w/new_h to >=1#696

Merged
danielhanchen merged 5 commits into
mainfrom
vision-resize-degenerate-clamp
May 27, 2026
Merged

vision_utils: clamp resize new_w/new_h to >=1#696
danielhanchen merged 5 commits into
mainfrom
vision-resize-degenerate-clamp

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

  • The integer-rounding formula in UnslothVisionDataCollator._resize_images_inplace can produce new_w = 0 or new_h = 0 for degenerate aspect ratios, causing PIL.resize to raise `height and width must be > 0`.
  • Repro: a 1024x1 image with resize=256, resize_dimension=\"max\" evaluates to new_h = (1 * 256 + 512) // 1024 = 0 and crashes.
  • The fix mirrors the max(1, ...) guard that already exists in Studio's MLX path (studio/backend/core/training/worker.py::_mlx_vlm_max_resized_size), bringing the two paths into parity.
  • Triggerable from Studio's new vision_image_size knob (Studio: expose image size setting in training UI unsloth#5743) when a dataset contains an image with one side small enough that the downscale ratio rounds it to 0. Realistic LaTeX_OCR / banner-style data (e.g. 1500x60, 5000x600) is unaffected; the crash only fires on truly degenerate inputs.

Test plan

  • Repro the crash with 1024x1 + resize=256 against current main.
  • Apply the patch; the same call now returns a 256x1 image.
  • Confirm aggregate matrix of 180 cases (9 caps x 25 weird shapes + boundary + multi-image + uncovered PIL modes + idempotence + real-world aspects) all pass with the patch.
  • Confirm realistic inputs (1500x60, 5000x600, 800x50) are byte-identical to pre-patch behavior.

The integer-rounding formula in _resize_images_inplace can produce
new_w=0 or new_h=0 for inputs with extreme aspect ratios (e.g. 1024x1
with image_size=256: new_h = (1*256 + 512) // 1024 = 0). PIL.resize
then raises "height and width must be > 0".

This is reachable from Studio's vision_image_size knob when a dataset
contains a degenerate image and the user picks a small cap. The fix
mirrors the max(1, ...) guard already present in Studio's MLX resize
helper (studio/backend/core/training/worker.py::_mlx_vlm_max_resized_size).

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a safety clamp to image resizing dimensions, ensuring they are at least 1 to prevent crashes in PIL.Image.resize when handling extreme aspect ratios. The reviewer suggested an optimization to cache the result of self.size_func(img) in a local variable to avoid redundant function calls and improve efficiency.

Comment thread unsloth_zoo/vision_utils.py Outdated
Comment on lines +984 to +985
new_w = max(1, (w * image_size + self.size_func(img) // 2) // self.size_func(img))
new_h = max(1, (h * image_size + self.size_func(img) // 2) // self.size_func(img))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While the max(1, ...) clamp correctly prevents a crash in PIL.Image.resize, the repeated calls to self.size_func(img) (up to 5 times per image in this block) are redundant. It is more efficient to store the result of self.size_func(img) in a local variable before calculating new_w and new_h.

                current_size = self.size_func(img)
                new_w = max(1, (w * image_size + current_size // 2) // current_size)
                new_h = max(1, (h * image_size + current_size // 2) // current_size)

Pure refactor on top of the existing clamp; output byte-identical for every input that has worked since the collator landed.

- Hoist is_tuple, snap_to_patch_size, factor out of the per-image loop.
- Cache size_func(img) so it is not called 3x per image.

No behavioral change. Same code paths, same numeric output, same exception surface.
After the clamp, surface a one-shot UserWarning when the resized image
would have aspect_ratio > 200 (MAX_RATIO). Qwen2-VL / Qwen2.5-VL
preprocessors reject such inputs in their own smart_resize; without
this warning users only see the downstream crash and have no signal
that the issue is a degenerate-aspect training image. Non-degenerate
inputs are unaffected (warning is gated on the same MAX_RATIO check
zoo's own smart_resize already enforces).
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen danielhanchen merged commit ac1e962 into main May 27, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant