Studio: expose image size setting in training UI by Dariton4000 · Pull Request #5743 · unslothai/unsloth

Dariton4000 · 2026-05-23T19:30:25Z

One thing that has always bothered me was not being able to change/override the downscalling of images when training for image specific tasks to traide quality for speed or the other way around.. Now this is possible thanks to a new exposed "Image Size" option under the Memory tab in the Parameters page.

It has the following settings in the dropdown:

When Default is selected, nothing changes when compared to before this PR. Higher resolutions require more context, if not enough context is pressent, this Error message will appear upon starting training:

The setting is saved in the YAML training config. I have added tests.
I ran multiple training runs on my NVIDIA GPU to test its robustness using qwen3.5 and gemma4 models using LoRA, QLoRA and Full Finetune. I tested MLX but I did not do a full training run with it. I tested the changes in WSL (Ubuntu).
A custom implementation for MLX was required as I did not find any relevant documentation on any backend features already implementing this. :)

Studio vision fine-tuning had no explicit way to cap image resolution, so users could not trade visual detail against context and memory use from the training UI, YAML config, or API payload. :) Add a nullable `vision_image_size` setting that keeps the current model default when unset and applies a max-side resize when provided. - Add `vision_image_size` to the training request model, route payload, backend training config, and frontend API/types plumbing. - Validate the value server-side as either null or an integer in the supported 256-2048 range. - Surface an Image Size selector for vision LoRA training with Default plus common preset sizes. - Include the value in training start payloads only for image-dataset vision models, and serialize it into vision-aware YAML configs. - Map backend model defaults back into the training store and reset the value when reapplying model defaults. - Pass the resize through the Torch trainer via `UnslothVisionDataCollator` using max-dimension semantics. - Apply the same max-dimension resize in the MLX VLM path before mlx-vlm's internal collation, preserving aspect ratio and avoiding upscaling. - Add backend validation coverage and MLX resize-size tests for the new behavior.

for more information, see https://pre-commit.ci

gemini-code-assist

Code Review

This pull request adds support for configurable vision image size in VLM training, including backend resizing logic for MLX and Unsloth, parameter validation, and frontend UI updates. Review feedback recommends unconditionally converting images to RGB mode to handle diverse input formats and prevent processing errors, regardless of whether resizing occurs.

…rray - trainer.py: DeepSeek OCR collator now honors the new vision_image_size setting as image_size. Falls back to 640 when null. base_size stays at 1024 and crop_mode stays True so the Gundam preset's dynamic cropping of large documents keeps working. - worker.py: _resize_mlx_vlm_image returns np.array(image, copy=True) instead of np.asarray(image). The PIL view from np.asarray is not writable, which makes HF VLM processors emit "The given NumPy array is not writable, and PyTorch does not support non-writable tensors..." when they call torch.from_numpy. copy=True keeps the same shape and dtype but produces a writable buffer.

for more information, see https://pre-commit.ci

danielhanchen · 2026-05-24T13:47:55Z

Pushed two small follow-up fixes on top of your branch (dd1a209):

1. DeepSeek OCR now honors vision_image_size (studio/backend/core/training/trainer.py)

The is_deepseek_ocr branch was still constructing DeepSeekOCRDataCollator(image_size=640, ...) regardless of the user-selected Image Size, so the new control was a no-op for DeepSeek OCR models even though the UI showed it. The fix maps vision_image_size to image_size (per-crop tile size) and keeps base_size=1024 + crop_mode=True so the Gundam preset's dynamic cropping of large documents keeps working. None falls back to the original 640.

2. Writable ndarray on the MLX path (studio/backend/core/training/worker.py)

np.asarray(image) on a PIL image returns a view of PIL's read-only buffer. When mlx-vlm hands that to HF processors that call torch.from_numpy, it emits UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors.... Switched to np.array(image, copy=True) which is the same shape and dtype but a writable backing buffer.

Both changes leave the vision_image_size is None path byte-identical to the pre-fix behavior. The 25 in-tree tests still pass.

Two small nits I did not change but worth considering in a follow-up:

serializeConfigToYaml is called with only store.isVisionModel from training-section.tsx, while the API mapper additionally gates on isDatasetImage === true. So a vision model with a text-only dataset can export a YAML containing vision_image_size that the actual training payload would have sent as null. Aligning the YAML gate to isVisionModel && isDatasetImage === true would remove that drift.
visionImageSizePresets is [384, 512, 768, 1024, 1536, 2048] but the backend validator allows [256, 2048]. The PR description screenshot shows 256 as a preset, so either adding 256 to the dropdown or raising _MIN_VISION_IMAGE_SIZE to 384 would make the two ranges line up.

Thanks for the PR.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64d7e7897d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-24T13:52:36Z

+    vision_image_size:
+      config.isVisionModel && config.isDatasetImage === true
+        ? config.visionImageSize
+        : null,


Preserve vision_image_size when dataset probe is inconclusive

The payload currently drops vision_image_size unless isDatasetImage === true, so any transient/failed dataset-format check (which leaves isDatasetImage as null) silently forces vision_image_size to null even after the user selected a value. In that case training still proceeds, but the new image-size setting is ignored and backend falls back to model default, making this feature unreliable for valid vision datasets when detection is inconclusive.

Useful? React with 👍 / 👎.

…opdown - training-section.tsx: handleSaveConfig now passes isVisionModel && isDatasetImage === true to serializeConfigToYaml, matching buildTrainingStartPayload. Stops vision_image_size from leaking into exported YAML for text-only datasets where the API would have sent null. - params-section.tsx: add 256 to visionImageSizePresets so the dropdown spans the validator's full [256, 2048] range. Also render a synthetic SelectItem for the current value when it was loaded from YAML or model defaults and is not in the preset list, so the controlled Select always shows the active size.

mapBackendModelConfigToTrainingPatch now mirrors the backend validator at studio/backend/models/training.py:169 by dropping any value that is not an integer in [256, 2048]. Pre-fix, an imported YAML like vision_image_size: 4096 or 640.5 would land in the store and the UI would happily display it, only to fail when Start Training posted to the backend. With this guard the store never holds a value the backend would reject.

chatgpt-codex-connector · 2026-05-24T14:06:40Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Switch the field_validator to mode="before" so True/False surface as bool (not Pydantic's coerced 1/0) and give a precise "must be an integer or null" message instead of the misleading "must be in [256, 2048] (got 1)". Also explicitly accepts numpy Integral and integral Real scalars so YAML or programmatic callers using numpy ints keep working.

chatgpt-codex-connector · 2026-05-24T14:09:43Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

for more information, see https://pre-commit.ci

chatgpt-codex-connector · 2026-05-24T14:09:57Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Regression guard for the validator switch to mode="before". Pre-fix, vision_image_size: True was rejected with "must be in [256, 2048] (got 1)" because Pydantic coerced before our check ran. New test asserts the message now reads "integer or null".

chatgpt-codex-connector · 2026-05-24T14:10:37Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen · 2026-05-24T14:13:15Z

Pushed four more follow-up commits on top. Full set of follow-ups now applied:

Commit	What
`dd1a209`	DeepSeek OCR collator honors `vision_image_size` as `image_size` (`base_size` and `crop_mode` unchanged so the Gundam preset's dynamic cropping keeps working); writable MLX ndarray via `np.array(image, copy=True)`.
`bef7f54`	YAML export gate aligned to `isVisionModel && isDatasetImage === true`; added `256` to the dropdown presets; synthetic `SelectItem` for loaded values that are not in the preset list.
`d0bfb11`	`mapBackendModelConfigToTrainingPatch` validates `vision_image_size` to the same range as the backend (256-2048 integer); invalid YAML/model-default values are dropped so the store and UI never show a value the API would reject.
`df333a77`	Pydantic `field_validator` switched to `mode="before"` so `True`/`False` produce the precise message `vision_image_size must be an integer or null` instead of `must be in [256, 2048] (got 1)`. Also accepts numpy `Integral` and integral `Real`.
`08ba6ca`	Regression test that pins the new error wording.

Total surface area: 7 files, +81/-16. All 27 in-tree backend tests pass; frontend typecheck and vite build pass; the Select component is verified across Chromium, Firefox, and WebKit headless.

Default (vision_image_size = None) remains byte-identical to the pre-PR behavior on every code path. Verified empirically: there is no MLX memory regression from the copy=True writability change (PIL forces materialization either way; measured delta is under 2 MB versus the prior np.asarray view at the same RSS peak).

Thanks for the PR.

Round 2 of follow-up review surfaced three usability issues: - model-defaults.ts: switching to a model whose backend YAML omits vision_image_size now explicitly resets the store value to null. Pre-fix, a stale 2048 from a previous model would silently apply to the new run because every checked-in model-default file omits the key. - training-section.tsx: handleSaveConfig now includes vision fields unless isDatasetImage is definitively false. isDatasetImage is null during dataset checks, after dataset edits, and on import; treating unknown as "drop" would silently lose the user's selection in those windows. Confirmed-text-only datasets still drop the value. - worker.py: _mlx_vlm_max_resized_size now mirrors the Torch collator's integer formula (w * size + size_func // 2) // size_func instead of Python round(), which uses banker's rounding and disagreed by 1px on half-pixel inputs like 333x1000 with target 500 (was 166, now 167). Test_mlx_training_worker_config gains parity assertions.

chatgpt-codex-connector · 2026-05-24T14:57:45Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen · 2026-05-24T14:58:02Z

Round 2 of the parallel-reviewer pass surfaced three more issues. Pushed 473eab87 with the fixes:

Model switch keeps stale image size. Switching to a model whose backend YAML omits vision_image_size left the previous model's 2048 (or whatever) in the store, so the new run silently trained at the old size. mapBackendModelConfigToTrainingPatch now resets to null when the key is missing.
YAML save drops the value during dataset-check windows. My prior gate alignment (isDatasetImage === true) was too strict: isDatasetImage is null while the dataset check is running, after dataset edits, and on import, so the user could save a config and lose the selection. Loosened to isDatasetImage !== false. Confirmed-text-only still drops; unknown preserves.
MLX rounding parity with the Torch collator. _mlx_vlm_max_resized_size was using Python round() which is banker's rounding and disagreed with UnslothVisionDataCollator's integer formula by 1 pixel on inputs like 333x1000 -> 500. Switched to the same (w * size + size_func // 2) // size_func formula. Added parity assertions.

Total surface area now: 8 files, +113/-22. Twenty-nine in-tree backend tests pass.

chatgpt-codex-connector · 2026-05-24T15:11:18Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

mapBackendModelConfigToTrainingPatch resets stale image size on the success path, but if the /api/models/config endpoint throws, training-config-store.ts falls through to checkVisionModel and only updates capability flags. Pre-fix that left a stale 2048 (or any prior selection) in the store, so once dataset detection marked the new dataset as image, the next training start would silently apply the previous model's size. The error branch now also resets to the DEFAULT_HYPERPARAMS.visionImageSize sentinel.

chatgpt-codex-connector · 2026-05-24T15:11:45Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Round 4 of the parallel-reviewer pass flagged that the Torch trainer exclusion I added did not have a matching MLX guard, and that the UI still offered the dropdown for DeepSeek OCR even though the backend ignores it. - worker.py: _run_mlx_training now mirrors the Torch exclusion. When the model name matches DeepSeek OCR, vision_image_size is forced back to None before _adapt_for_mlx_vlm sees it, so dataset images pass through unchanged just like the Torch path. Emits a clear status line when this happens. - params-section.tsx: the Image Size Row is now gated on showVisionImageSize (showVisionLora && !isDeepseekOcr) instead of showVisionLora alone, so DeepSeek OCR users no longer see a control that silently has no effect. - mappers.ts: buildTrainingStartPayload sends null for vision_image_size whenever the selected model is DeepSeek OCR, so the backend log line about ignoring the value never fires from a UI-driven start.

chatgpt-codex-connector · 2026-05-24T15:51:28Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

for more information, see https://pre-commit.ci

chatgpt-codex-connector · 2026-05-24T15:51:41Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen · 2026-05-24T15:52:01Z

Round 4 of the parallel-reviewer pass: an MLX/Torch symmetry gap. Pushed c797e571 with three matching fixes:

MLX path now skips the resize for DeepSeek OCR. _run_mlx_training mirrors the Torch trainer's exclusion: when the model name matches DeepSeek OCR, vision_image_size is forced back to None before _adapt_for_mlx_vlm runs, so MLX leaves DeepSeek OCR images untouched too. Emits a status line when this happens.
UI hides the Image Size dropdown for DeepSeek OCR. params-section.tsx gates the Row on showVisionImageSize = showVisionLora && !isDeepseekOcr. Users no longer see a control that has no effect.
API mapper drops the value for DeepSeek OCR. mappers.ts sends null for vision_image_size whenever the selected model is DeepSeek OCR, so the backend "ignored" log line never fires from a UI-driven start.

Nine files changed across all my follow-ups, +156/-31 net. 27 backend tests pass; frontend typecheck and vite build pass.

danielhanchen · 2026-05-24T16:07:49Z

Round 5 of the parallel-reviewer pass came back 12/12 APPROVE with zero findings. The PR has converged.

Full set of follow-ups pushed on top of the original commit:

Commit	What
`dd1a2097`	DeepSeek OCR initial threading + writable MLX ndarray
`bef7f540`	YAML export gate aligned to API mapper, 256 preset, custom SelectItem
`d0bfb11e`	YAML/model-default loader mirrors backend `[256, 2048]` integer validator
`df333a77`	Pydantic `mode="before"` validator for precise error messages, numpy Integral support
`08ba6ca0`	Regression test that pins the new error wording
`6644fb46`	Reset `vision_image_size` in the model-config error fallback path
`473eab87`	Round-2 loosen YAML save gate, integer rounding parity with Torch collator
`8b44ae3c`	Round-3 revert DeepSeek OCR knob (unsafe with collator preset tuple), move missing-key reset to `setSelectedModel`
`c797e571`	Round-4 extend DeepSeek OCR exclusion to MLX path + hide UI for it + drop in API mapper

Net: 9 commits, 9 files touched, +156/-31 lines. Twenty-seven backend tests pass; frontend typecheck and vite build pass; cross-browser smoke green on Chromium, Firefox, and WebKit.

Default behavior (vision_image_size = None) remains byte-identical to pre-PR on every code path.

Thanks again for the PR.

Two YAML-path asymmetries that could leak a stale image size into training: - parseYamlConfig now treats a missing training.vision_image_size as null. Without this, importing a YAML saved before this feature (or any config that omits the key) preserved whatever value the user had previously set on a different model. The model-defaults reload path still uses Object.hasOwn so same-model defaults reloads do not wipe a manual selection; only file import normalises the missing key. - handleSaveConfig now passes a DeepSeek-OCR-specific guard to serializeConfigToYaml so saved YAML matches what the API mapper actually sends. Previously a state with visionImageSize set could emit the key even though Studio ignored it at training time for DeepSeek OCR, and a later import for a non-DeepSeek vision model would activate the stale value. serializeConfigToYaml gains an optional third parameter includeVisionImageSize defaulting to includeVisionFields, preserving the existing 2-arg call signature for backwards compatibility.

danielhanchen · 2026-05-25T08:13:02Z

Round 9 of reviewer.py (12 parallel personas) on head eea71c9e came back 11 APPROVE + 1 REQUEST_CHANGES from the simulation persona, which surfaced two YAML-path asymmetries that earlier rounds did not exercise. Fixed in 89ae87c1:

parseYamlConfig: a YAML config that omits training.vision_image_size now injects null at the parser level so applyConfigPatch resets a stale in-memory value to Default on file import. The model-defaults reload path still uses Object.hasOwn in mapBackendModelConfigToTrainingPatch, so same-model defaults reloads do not wipe a manual selection. Only YAML file import normalises the missing key.
handleSaveConfig: passes a DeepSeek-OCR-specific guard to serializeConfigToYaml so the saved YAML matches what the API mapper actually sends. Previously a state where visionImageSize was set could emit training.vision_image_size even though Studio ignores it at training time for DeepSeek OCR, and a later import of that YAML against a non-DeepSeek vision model would activate the stale value.

serializeConfigToYaml gains an optional third parameter includeVisionImageSize defaulting to includeVisionFields, preserving the existing 2-arg call signature.

Verification on head 89ae87c1:

17/17 new YAML round-trip + DeepSeek-OCR-save sims pass (sim_yaml_round9.mjs).
Existing sim_yaml_roundtrip.js, sim_yaml_gate.js, sim_dropdown_custom_item.js, sim_model_switch_reset.js all still green.
tsc -b clean, vite build clean.
Backend pytest 27/27 (unchanged path).

Studio end-to-end Playwright smoke against the converged pre-round-9 head also captured: VLM model + image dataset shows the Image Size dropdown on the Memory tab, opening + selecting 512 + back to Default work, and switching to a text-only model hides the dropdown.

Round 9's parseYamlConfig normalization only fired when the YAML had a training mapping that omitted vision_image_size. A lora-only or logging-only YAML (or one with `training: null`) still left trainingObj unset, the mapper saw no vision_image_size key, and the previously selected store value persisted into the next training run. Now an absent or null training section is synthesised as { vision_image_size: null } so model-defaults.ts always patches visionImageSize back to Default on file import. Same-model defaults reloads still preserve manual choices via the existing Object.hasOwn gate in mapBackendModelConfigToTrainingPatch.

danielhanchen · 2026-05-25T08:34:33Z

Round 10 (12 reviewers on 89ae87c1) returned 11 APPROVE + 1 REQUEST_CHANGES, with three reviewers independently flagging the same hole in the round-9 P1 fix: parseYamlConfig only injected vision_image_size: null when the YAML had a training: mapping. A lora-only or logging-only YAML (or training: null) left trainingObj unset, the mapper saw no key, and the previously selected visionImageSize persisted.

Fixed in b90d1026: an absent or null training section is now synthesised as { vision_image_size: null } so model-defaults always patches visionImageSize back to Default on file import. Same-model defaults reloads still preserve manual choices via the existing Object.hasOwn gate in mapBackendModelConfigToTrainingPatch.

Extended sim_yaml_round9.mjs: 20/20 cases pass (added lora-only, logging-only, training: null). Typecheck and build clean.

danielhanchen · 2026-05-25T08:50:27Z

Round 11 (12 reviewers on b90d1026) returned 12/12 APPROVE, zero findings. Reconverged after the two round-9 P1/P2 fixes plus the round-10 lora-only follow-up.

CI on b90d1026: pre-commit.ci pending as usual.

Summary of the round-9 + round-10 follow-ups since the earlier convergence at eea71c9e:

SHA	Title
`89ae87c1`	Studio: tighten YAML import/save for vision_image_size
`b90d1026`	Studio: also reset vision_image_size when YAML lacks a training section

Net additional surface area: 2 files, +42/-3 lines on top of the prior converged state. Aggregate ~720+ simulation assertions and 27 backend pytest still green.

danielhanchen · 2026-05-25T11:51:31Z

While re-testing this PR across every preset image size × every plausible image shape (1x1 pixel, 14x14 Qwen-patch, 8192x1 extreme aspect, 4k/8k squares, prime sides, off-by-one from powers of 2, A4/HD/cinema/iPhone aspects, LaTeX_OCR-style bands, real Qwen3-VL + Gemma-3 processors), I found one latent bug in unsloth_zoo, not in this PR:

unsloth_zoo.UnslothVisionDataCollator._resize_images_inplace lacks a max(1, ...) clamp on the rounded new_w / new_h. With degenerate aspect ratios (e.g. 1024x1 + cap=256), the integer-rounding formula returns new_h=0 and PIL.resize raises height and width must be > 0. The PR's MLX path already has the clamp; only the Torch path is missing it.

Reachable from this PR's vision_image_size knob when a dataset contains a row-of-pixels image with one dimension small enough that the downscale ratio rounds it to zero. Realistic LaTeX_OCR (1500x60), banner (5000x600), and document (800x50) images are unaffected. Only inputs with short_side * cap < max_side / 2 crash, which means roughly short_side <= 3 with cap=256, or short_side <= 1 with cap=1024.

Submitted the one-line fix to unsloth-zoo as unslothai/unsloth-zoo#696. After that PR lands, the same matrix passes 180/180.

Aggregate green count on PR 5743 head b90d1026 + zoo PR #696:

Suite	Cases	Result
backend pytest	27	27/27 PASS
sim_backend_validator	42	OK
sim_kwargs_propagation	10	OK
sim_yaml_round9	20	ALL OK
sim_yaml_hostile	67	67/67 OK
sim_all_sizes_pydantic_mlx	137	137/137 OK
sim_studio_catalog_audit	78 catalog + 7 probes	OK
pr5743_collator_smoke	5	OK
pr5743_mlx_resize	9	OK
pr5743_matrix_all_sizes	92	92/92 OK
pr5743_matrix_real_processors	58	58/58 OK (Qwen3-VL + Gemma-3-4b)
pr5743_weird_image_sizes	180	180/180 OK (was 166/180 before zoo fix)
pr5743_weird_sizes_real_procs	510	393 PASS, 117 SKIP (Gemma processor rejects Nx1 shape), 0 FAIL
Playwright cross-browser	6	6/6 PASS (Chromium/FF/WebKit, Select + YAML)

The 117 SKIPs are downstream HF Gemma image-processor limitations on shape (N, 1) (channel detection ambiguity, "mean must have 1 elements"). Qwen3-VL's processor accepts all of them. These are not in this PR's surface area.

Net: zero regressions traceable to PR 5743 across ~1610 assertions covering every preset size, every shape, every cap × image-side boundary, every PIL mode, real Qwen3-VL and Gemma-3 processors, the full 78-entry Studio model catalog, and Chromium/Firefox/WebKit.

A fresh static review (Opus subagent) flagged P3-1: parseYamlConfig only synthesised vision_image_size: null when raw.training was either absent or a plain object missing the key. If raw.training is a scalar or an array (malformed but still parseable), the value was passed through unchanged, the mapper's Object.hasOwn returned false, and any previously selected visionImageSize persisted - the same stale-state leak the lora-only fallback was added to close. Treat any non-plain-object raw.training (null, array, scalar) as a malformed/missing section and reset to { vision_image_size: null }.

danielhanchen · 2026-05-25T12:21:37Z

Spawned two fresh Opus subagents to re-audit this PR from scratch in parallel.

Static + security audit (Opus, 17 files, no tool fast path)

Found 1 P1 (already fixed via unsloth-zoo#696), 2 P2 (one already empirically dismissed by the runtime audit's 2,000-case Torch/MLX equivalence sweep, one acknowledged UX trade-off documented in the existing code comment), and 4 P3 nits. Categories where the auditor explicitly confirmed "no findings": backwards compat with vision_image_size = None (byte-identical behavior verified end-to-end), Pydantic validator under numpy/Decimal/Fraction/bool/strict-mode/JSON-null-vs-missing-key edge cases, DeepSeek-OCR exclusion symmetry across all four sites (UI hide / payload drop / YAML save drop / MLX worker log+ignore), data flow (no silent mutation or shadowing), security / injection (no string interpolation, no YAML eval, no path use, no sandbox escape), model-switch + same-model-reload + error-fallback paths, race conditions across async ordering, frontend gate consistency, and scope creep (all 17 changed files relate to the feature).

Acted on P3-1 (cheap unification) in 14caf36a: parseYamlConfig now resets vision_image_size to null whenever raw.training is anything other than a plain object - arrays, scalars, numbers, booleans, in addition to the previously handled null / missing case. This closes the last variant of the stale-state leak that the lora-only fallback was originally added to address. Verdict from the static auditor: APPROVE conditionally on landing unsloth-zoo#696 first.

Runtime + integration audit (Opus, real GPU + real models + 3,200+ fresh assertions)

Designed and ran six brand-new test scripts in addition to re-validating the existing 21-harness battery:

Test	Coverage	Result
Pydantic boundary fuzz	1,105 randomized inputs (ints, floats, strings, containers, bytes, numpy scalars, dataclasses, int subclasses, IntLike duck types, complex, nan/+inf/subnormals, unicode digits, locale separators, sci-notation, hex strings, sign/whitespace coercion)	1,105/1,105 PASS, 0 contract violations
Real multi-image per turn × Qwen3-VL processor	5 presets × 5 mixed-size image sets per user turn against the actual `unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit` processor	20/25 PASS, 5 expected processor-side rejects on `aspect_ratio > 200` (pre-existing HF Qwen2-VL invariant, unrelated to this PR)
Real 5-step LoRA training	`unsloth/gemma-3-4b-it-unsloth-bnb-4bit` + 16 rows of `unsloth/LaTeX_OCR` + `UnslothVisionDataCollator(resize=PRESET, resize_dimension="max")` × presets 256 / 512 / 1024	15/15 finite training steps, losses descended `6.27 → 2.52` for preset 256; presets 512 and 1024 produced BIT-IDENTICAL losses for LaTeX_OCR's <512px images, empirically proving the "Default = pre-PR" contract
Full-schema YAML round-trip fuzz	50 randomized `TrainingConfigState` snapshots through `serializeConfigToYaml` -> bytes -> `parseYamlConfig` -> `mapBackendModelConfigToTrainingPatch` -> `buildTrainingStartPayload`, covering DeepSeek-OCR exclusion, text-only models, `isDatasetImage=null/true/false`, numeric/string/typed/whitespace/+-prefixed/float `visionImageSize` forms	50/50 PASS
Torch <-> MLX pixel-resize equivalence sweep	2,000 random `(w, h, target)` triples + known banker's-rounding edge cases like `(333, 1000, 500)`, plus 60 idempotence and 6 writability assertions, plus None-identity, plus non-PIL passthrough	2,000/2,000 pixel-equivalent + 60/60 idempotence + 6/6 writability
Unicode + homoglyph payload paths	CJK / Arabic / emoji / RTL-marker / Cyrillic-lookalike / supplementary-plane chars in `model_name` and `hf_dataset`, with JSON round-trip, plus a Cyrillic-`dееpseek` homoglyph-bypass check against the DeepSeek-OCR substring guard	6/6 PASS, Cyrillic `dееpseek` correctly does NOT bypass the case-insensitive `deepseek` match while all legitimate casings of `DeepSeek-OCR` do match

Runtime auditor verdict: APPROVE. Zero functional bugs across ~3,200 fresh assertions. The validator is robust against scale-1000 random fuzz with no uncaught exceptions and no out-of-contract acceptances. Torch and MLX resize formulas are pixel-identical for 2,000 random inputs including known banker's-rounding edge cases. End-to-end VLM training with the new knob completes with stable, finite losses on real data. The DeepSeek-OCR guard resists Cyrillic homoglyph spoofing while matching all legitimate casings.

Updated aggregate green count on head `14caf36a` (with unsloth-zoo PR #696 applied)

Backend pytest 27/27 + Pydantic fuzz 1,105/1,105 + 50-case YAML round-trip + 2,000-case Torch/MLX equivalence + 180-case weird-shapes + 92-case all-sizes synthetic + 58-case real-VLM-processor + 137-case Pydantic+MLX + 78-entry catalog audit + 70-case YAML hostile + 25-case multi-image + 15 real training steps + 6 unicode payload + 6 cross-browser + 5 collator smoke + 9 MLX resize + ... = ~4,800 assertions across 23 harnesses, all green.

Both audits APPROVE. PR continues to be safe to merge once unsloth-zoo#696 lands.

danielhanchen · 2026-05-25T13:17:03Z

Tightened code comments across all 10 touched Studio files. Net -34 comment lines (61 removed, 27 added). 5-9 line essays compressed to 1-3 lines each, keeping the WHY where non-obvious and dropping restated mechanical details (line numbers, paraphrases of the code beside them, history of prior reviewer rounds). Full battery still green: 27/27 pytest, all sims OK, 180/180 weird-shapes, 92/92 all-sizes synthetic matrix. Pushed as f9c39331. Matching tightening for unsloth-zoo PR #696 (2 lines instead of 3).

…ntext Two issues surfaced by a fresh adversarial review of the validator: 1. v.strip().lstrip("+-").isdigit() let "++512" / "--256" / "+-+512" slip past the gate, then int("++512") raised an uncaught ValueError and Pydantic surfaced "invalid literal for int() with base 10: '++512'" instead of the contracted "vision_image_size must be an integer or null". 2. str.isdigit() returns True for Unicode digit families (full-width '５１２', Arabic-Indic '٥١٢', Devanagari '१०२४'), and int() coerces them, so the value reaching the backend wasn't the ASCII the user typed. Replaced the lstrip+isdigit pair with re.fullmatch(r'[+-]?[0-9]+', stripped), which rejects both shapes with the precise error and accepts the documented ones ('256', '+512', ' 1024 '). Added 8 regression test cases covering multi-sign strings, lone sign, and the three Unicode digit families. Also restored comment context lost in f9c3933: - model-defaults.ts: name studio/backend/models/training.py:_check_vision_image_size as the spec the [256, 2048] range mirrors, so a maintainer changing the cap in one file can find the other. - training-section.tsx: enumerate the three windows in which isDatasetImage is null (before a check, after dataset edits, on import) so a future maintainer doesn't simplify the gate to `isCheckingDataset`. - worker.py: qualify the writable-ndarray comment with "when a resize is requested" so it doesn't misadvertise the resize=None early-return.

chatgpt-codex-connector · 2026-05-26T00:49:35Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen · 2026-05-26T00:49:45Z

Spawned two more fresh Opus subagents to re-audit this PR from scratch. Both treated it as a from-scratch read; one was instructed to be adversarial, the other to design new empirical stress tests.

Adversarial reviewer found a real validator bug I missed; pushed the fix as `ab06fa1b` (rebased onto upstream).

Bug found

`_check_vision_image_size` in `studio/backend/models/training.py` used `v.strip().lstrip("+-").isdigit()` to gate `int(v)`. `lstrip("+-")` strips arbitrarily many sign chars, so `"++512"` / `"--256"` / `"+-+512"` passed the gate, then `int("++512")` raised an uncaught ValueError and Pydantic surfaced:

`invalid literal for int() with base 10: '++512'`

instead of the contracted `vision_image_size must be an integer or null`. Reachable from any direct API POST. The regression test in this PR (`test_bool_error_says_integer_not_range`) explicitly enforces this contract for booleans; the sign-prefix case bypassed it.

A second related issue: `str.isdigit()` returns True for Unicode digit families. Full-width Japanese `'５１２'`, Arabic-Indic `'٥١٢'`, Devanagari `'१०२४'` all silently coerced to 512 / 1024, so the value reaching the backend wasn't the ASCII the user typed.

Fix

Replaced the `lstrip` + `isdigit` pair with `re.fullmatch(r'[+-]?[0-9]+', stripped)`, which rejects both shapes with the precise contracted error and accepts the documented ones (`'256'`, `'+512'`, `' 1024 '`). Added 8 regression test cases covering multi-sign strings (`'++512'`, `'--256'`, `'+-+512'`, `'+'`, `'-'`) and the three Unicode digit families.

Also restored 3 over-tightened comments

The adversarial reviewer flagged 3 cases where `f9c39331`'s comment shortening lost load-bearing context:

`model-defaults.ts`: now names `studio/backend/models/training.py:_check_vision_image_size` as the spec the `[256, 2048]` range mirrors, so a maintainer changing the cap in one file can find the other.
`training-section.tsx`: now enumerates the three windows in which `isDatasetImage` is null (before a check, after dataset edits, on import). A future maintainer could otherwise reasonably simplify the gate to `isCheckingDataset` and silently drop the user's choice in the other two windows.
`worker.py`: now qualifies the writable-ndarray comment with "when a resize is requested" so it doesn't misadvertise the `resize=None` early-return.

Empirical stress tester verdict: APPROVE

162 fresh probes (8 zustand-migration, 22 comment-regression, 27 degenerate-images across 11 PIL modes including animated GIF, 31 REST endpoint round-trips through FastAPI TestClient with monkey-patched auth, 35 YAML-hostile, 58 adversarial validator inputs including JSON-pointer/Decimal/Fraction/FakeInt). All real bugs 0; same Unicode-digit behaviour acknowledged as a "documentation-grade behavioural note" (now fixed).

Final aggregate on head `ab06fa1b` (with unsloth-zoo PR #696 applied)

backend pytest: 35/35 (was 27/27, +8 new regression tests for the sign-prefix and Unicode-digit cases)
All JS sims OK
180/180 weird-shapes
92/92 all-sizes synthetic matrix
58/58 real-VLM-processor matrix (Qwen3-VL + Gemma-3-4b)
137/137 Pydantic+MLX matrix
All cross-browser Playwright still green

PR continues to be safe to merge once unsloth-zoo#696 lands.

chatgpt-codex-connector · 2026-05-26T17:55:41Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen · 2026-05-27T03:28:16Z

On the "only MLX is changed" question: the PR actually changes both training paths. The Torch change is small because UnslothVisionDataCollator already supports resize upstream, so trainer.py just threads resize=N, resize_dimension="max" into the existing constructor (lines 3130-3146). The MLX change is larger because mlx-vlm has no equivalent kwarg, so the PR adds its own _mlx_vlm_max_resized_size + _resize_mlx_vlm_image helpers in worker.py plus the DeepSeek-OCR guard. Both paths end up max-side, aspect-preserving, no-upscale, with DeepSeek-OCR excluded; the visible diff just looks MLX-heavy because Torch was already plumbed.

End-to-end evidence on PR head ab06fa1b is up at huggingface.co/datasets/danielhanchen/pr5743-image-sizes:

15 Playwright screenshots of the live Studio UI driving the new control: Train page landing → Qwen3-VL selected → LaTeX_OCR dataset set → Memory tab → Image Size dropdown opened → 256 / 512 / 1024 / 2048 / back-to-Default selected in turn → switched to text-only Llama-3.2-3B (Image Size row gone) → switched to DeepSeek-OCR-2 (Image Size row also gone) → switched back to Qwen3-VL (row returns).
One full-flow video (pr5743_image_size_flow.webm + .mp4 transcode).
Real 20-step QLoRA training on a real B200 GPU of unsloth/gemma-3-4b-it-unsloth-bnb-4bit on 32 rows of unsloth/LaTeX_OCR, with the same seed/LR/batch_size across four presets:

preset	first_loss	last_loss	losses_hash
Default	5.1091	0.5245	`616447c04b89`
256	5.1091	0.6781	`bd382d513114`
512	5.1091	0.5245	`616447c04b89`
1024	5.1091	0.5245	`616447c04b89`

Same first_loss=5.1091 across all four runs (same seed). Default == 512 == 1024 produce bit-identical losses (hash 616447c04b89 for all three) because LaTeX_OCR's images all fit under 512px on the max side, so the resize is a no-op for those three presets. Only 256 diverges (0.6781) because the cap actually downscales the equations. This is the empirical proof that vision_image_size=None (Default) is byte-identical to pre-PR behavior, and that the cap is only active when an image actually exceeds it.

Drivers committed locally as tests/pr5743_studio_ui_flow.py (Playwright at /studio route) and tests/pr5743_train_matrix.py (real LoRA training matrix).

chatgpt-codex-connector · 2026-05-27T10:56:27Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen · 2026-05-27T12:01:34Z

Thanks so much for the PR - this is really good!

Dariton4000 and others added 3 commits May 23, 2026 20:48

Merge branch 'unslothai:main' into main

1a09e1b

[pre-commit.ci] auto fixes from pre-commit.com hooks

8b932c1

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed May 23, 2026

View reviewed changes

Comment thread studio/backend/core/training/worker.py Outdated

Dariton4000 marked this pull request as ready for review May 23, 2026 19:36

Dariton4000 requested review from danielhanchen and rolandtannous as code owners May 23, 2026 19:36

Dariton4000 and others added 4 commits May 24, 2026 13:26

Merge branch 'main' into main

24f42b2

Merge branch 'main' into main

62c221a

[pre-commit.ci] auto fixes from pre-commit.com hooks

64d7e78

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 24, 2026

View reviewed changes

danielhanchen added 2 commits May 24, 2026 13:52

[pre-commit.ci] auto fixes from pre-commit.com hooks

de13396

for more information, see https://pre-commit.ci

Merge branch 'main' into main

38e2908

[pre-commit.ci] auto fixes from pre-commit.com hooks

eea71c9

for more information, see https://pre-commit.ci

Merge branch 'main' into main

fe770f2

danielhanchen mentioned this pull request May 25, 2026

vision_utils: clamp resize new_w/new_h to >=1 unslothai/unsloth-zoo#696

Merged

4 tasks

Studio: tighten code comments for vision_image_size path

f9c3933

Dariton4000 and others added 2 commits May 25, 2026 15:37

Merge branch 'main' into main

8dcbd33

Merge branch 'main' into main

ae20acf

danielhanchen mentioned this pull request May 27, 2026

tests: unblock three stale assertions broken on main (MLX CI + Backend CI) #5803

Merged

3 tasks

Merge branch 'main' into main

ceda48e

danielhanchen merged commit dac2aed into unslothai:main May 27, 2026
30 checks passed

Uh oh!

Conversation

Dariton4000 commented May 23, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

danielhanchen commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

danielhanchen commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

danielhanchen commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

danielhanchen commented May 24, 2026

Uh oh!

danielhanchen commented May 24, 2026

Uh oh!

danielhanchen commented May 25, 2026

Uh oh!

danielhanchen commented May 25, 2026

Uh oh!

danielhanchen commented May 25, 2026

Uh oh!

danielhanchen commented May 25, 2026

Uh oh!

danielhanchen commented May 25, 2026

Static + security audit (Opus, 17 files, no tool fast path)

Runtime + integration audit (Opus, real GPU + real models + 3,200+ fresh assertions)

Updated aggregate green count on head 14caf36a (with unsloth-zoo PR #696 applied)

Uh oh!

danielhanchen commented May 25, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

danielhanchen commented May 26, 2026

Bug found

Fix

Also restored 3 over-tightened comments

Empirical stress tester verdict: APPROVE

Final aggregate on head `ab06fa1b` (with unsloth-zoo PR #696 applied)

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

danielhanchen commented May 27, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 27, 2026

Uh oh!

Uh oh!

danielhanchen commented May 27, 2026

Uh oh!

Reviewers

Assignees

danielhanchen commented May 24, 2026 •

edited

Loading

Updated aggregate green count on head `14caf36a` (with unsloth-zoo PR #696 applied)