vision_utils: clamp resize new_w/new_h to >=1#696
Conversation
The integer-rounding formula in _resize_images_inplace can produce new_w=0 or new_h=0 for inputs with extreme aspect ratios (e.g. 1024x1 with image_size=256: new_h = (1*256 + 512) // 1024 = 0). PIL.resize then raises "height and width must be > 0". This is reachable from Studio's vision_image_size knob when a dataset contains a degenerate image and the user picks a small cap. The fix mirrors the max(1, ...) guard already present in Studio's MLX resize helper (studio/backend/core/training/worker.py::_mlx_vlm_max_resized_size).
There was a problem hiding this comment.
Code Review
This pull request introduces a safety clamp to image resizing dimensions, ensuring they are at least 1 to prevent crashes in PIL.Image.resize when handling extreme aspect ratios. The reviewer suggested an optimization to cache the result of self.size_func(img) in a local variable to avoid redundant function calls and improve efficiency.
| new_w = max(1, (w * image_size + self.size_func(img) // 2) // self.size_func(img)) | ||
| new_h = max(1, (h * image_size + self.size_func(img) // 2) // self.size_func(img)) |
There was a problem hiding this comment.
While the max(1, ...) clamp correctly prevents a crash in PIL.Image.resize, the repeated calls to self.size_func(img) (up to 5 times per image in this block) are redundant. It is more efficient to store the result of self.size_func(img) in a local variable before calculating new_w and new_h.
current_size = self.size_func(img)
new_w = max(1, (w * image_size + current_size // 2) // current_size)
new_h = max(1, (h * image_size + current_size // 2) // current_size)Pure refactor on top of the existing clamp; output byte-identical for every input that has worked since the collator landed. - Hoist is_tuple, snap_to_patch_size, factor out of the per-image loop. - Cache size_func(img) so it is not called 3x per image. No behavioral change. Same code paths, same numeric output, same exception surface.
After the clamp, surface a one-shot UserWarning when the resized image would have aspect_ratio > 200 (MAX_RATIO). Qwen2-VL / Qwen2.5-VL preprocessors reject such inputs in their own smart_resize; without this warning users only see the downstream crash and have no signal that the issue is a degenerate-aspect training image. Non-degenerate inputs are unaffected (warning is gated on the same MAX_RATIO check zoo's own smart_resize already enforces).
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Summary
UnslothVisionDataCollator._resize_images_inplacecan producenew_w = 0ornew_h = 0for degenerate aspect ratios, causingPIL.resizeto raise `height and width must be > 0`.resize=256, resize_dimension=\"max\"evaluates tonew_h = (1 * 256 + 512) // 1024 = 0and crashes.max(1, ...)guard that already exists in Studio's MLX path (studio/backend/core/training/worker.py::_mlx_vlm_max_resized_size), bringing the two paths into parity.vision_image_sizeknob (Studio: expose image size setting in training UI unsloth#5743) when a dataset contains an image with one side small enough that the downscale ratio rounds it to 0. Realistic LaTeX_OCR / banner-style data (e.g. 1500x60, 5000x600) is unaffected; the crash only fires on truly degenerate inputs.Test plan
1024x1+resize=256against currentmain.256x1image.1500x60,5000x600,800x50) are byte-identical to pre-patch behavior.