MLX Training updates by mmathew23 · Pull Request #5656 · unslothai/unsloth

mmathew23 · 2026-05-20T22:00:46Z

Expose max grad values and set default random seeds in studio.

for more information, see https://pre-commit.ci

gemini-code-assist

Code Review

This pull request introduces support for elementwise gradient value clipping (max_grad_value) in MLX training and refines random state handling by separating model and LoRA initialization seeds. The changes span the backend training logic, API models, and frontend types. Feedback was provided regarding a discrepancy between the max_grad_value field description and its actual default implementation in the worker, which should be aligned to avoid confusion.

gemini-code-assist · 2026-05-20T22:02:19Z

+    max_grad_value: Optional[float] = Field(
+        None,
+        ge = 0,
+        description = (
+            "MLX-only elementwise gradient value clipping threshold. "
+            "If unset, MLX uses its runtime default."
+        ),
+    )


The description for max_grad_value states that MLX uses its runtime default if unset. However, the implementation in worker.py (line 1396) explicitly defaults it to 1.0 if it is None. To avoid confusion and ensure the API documentation matches the implementation, the description should be updated to reflect that it defaults to 1.0 in this environment.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b79ba4235

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-20T22:04:13Z

+    model_random_state = config.get("model_random_state", 3407)
+    lora_random_state = config.get("lora_random_state", 3407)


Fall back MLX init seeds to random_seed

_run_mlx_training now reads model_random_state/lora_random_state directly, defaulting both to 3407, but Studio never sets those keys (the request/config path still only forwards random_seed in routes/training.py and core/training/training.py). As a result, a user-supplied random_seed no longer controls model load or LoRA initialization RNG, so seed-based reproducibility is broken for any non-3407 seed (including the API default path where random_seed is 42).


chatgpt-codex-connector · 2026-05-20T22:04:13Z

+    max_grad_value = config.get("max_grad_value")
+    max_grad_value = 1.0 if max_grad_value is None else float(max_grad_value)


Honor null max_grad_value as MLX runtime default

The new request model documents that leaving max_grad_value unset should defer to MLX defaults, but this code forces None to 1.0 before building MLXTrainingConfig. Because the frontend now sends max_grad_value: null by default, callers cannot actually use the trainer/runtime default and the API behavior contradicts its contract.



chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e82926845d

chatgpt-codex-connector · 2026-05-21T04:09:28Z

+    model_random_state = config.get("model_random_state", 3407)
+    lora_random_state = config.get("lora_random_state", 3407)


Use request random_seed for MLX model/LoRA init

_run_mlx_training now reads model_random_state/lora_random_state, but the request path still only forwards random_seed (routes/training.py and core/training/training.py). In practice, user-provided seeds no longer control from_pretrained(... random_state=...) or get_peft_model(... random_state=...), so runs with non-3407 seeds are not reproducible on MLX even though seed is still set for the trainer loop.


chatgpt-codex-connector · 2026-05-21T04:09:28Z

+    max_grad_value = config.get("max_grad_value")
+    max_grad_value = 1.0 if max_grad_value is None else float(max_grad_value)


Preserve null max_grad_value instead of forcing 1.0

The new API contract says max_grad_value can be unset to let MLX use its runtime default (TrainingStartRequest.max_grad_value description), but this code rewrites None to 1.0 before building MLXTrainingConfig. That makes null behaviorally different from the documented contract and prevents callers from actually opting into the trainer/runtime default.


Datta0

LGTM

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bfb4203400

danielhanchen

Looks good, Matthew. Static review (Studio backend MLX worker is Apple Silicon only, so this is review only on the CUDA host I have).

End-to-end trace of max_grad_value:

studio/backend/models/training.py:267-281 accepts Optional[float] (None preserved by Pydantic).
studio/backend/routes/training.py:218 forwards into the worker dict unchanged.
studio/backend/core/training/training.py:218-225 forwards into the config dict unchanged.
studio/backend/core/training/worker.py:1389-1397 reads it, leaves it None, only coerces to float when a numeric value is present.
studio/frontend/src/features/training/api/mappers.ts:84 sends max_grad_value: null by default.

No "x or 1.0"-style fallback remains downstream, so the API contract (null in -> null out) is preserved end-to-end. The weight_decay = 0.001 if weight_decay is None else float(weight_decay) normalization at worker.py:1395-1396 is also cleaner than the previous float(config.get(..., 0.001) or 0.001), which had the well-known trap where a user who explicitly passes 0.0 gets it coerced to 0.001.
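The difference between the two weight_decay patterns quoted above, and the None-preserving handling of max_grad_value, can be seen in a small sketch. The function names here are illustrative only and are not taken from worker.py.

```python
def weight_decay_with_or(config: dict) -> float:
    # Older pattern: `or` treats every falsy value (None, 0, 0.0) as "missing".
    return float(config.get("weight_decay", 0.001) or 0.001)


def weight_decay_with_none_check(config: dict) -> float:
    # Newer pattern: only None (absent key or explicit null) uses the default.
    value = config.get("weight_decay")
    return 0.001 if value is None else float(value)


def max_grad_value_passthrough(config: dict):
    # None is preserved so the trainer can apply its own runtime default;
    # numeric input is coerced to float.
    value = config.get("max_grad_value")
    return None if value is None else float(value)


print(weight_decay_with_or({"weight_decay": 0.0}))          # 0.001
print(weight_decay_with_none_check({"weight_decay": 0.0}))  # 0.0
print(max_grad_value_passthrough({"max_grad_value": None}))  # None
```

The explicit None check keeps a deliberate 0.0 intact, which the `or` form cannot do.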

model_random_state / lora_random_state defaulting to random_seed when absent (worker.py:1156-1170) reads correctly. test_training_backend_forwards_random_seed_without_internal_mlx_seed_keys asserts the absent-key path; one thing the test suite does not assert is the present-but-None case (config["model_random_state"] = None would override the seed with None, because config.get("model_random_state", random_seed) only falls back when the key is missing, not when it's present-and-None). Probably not reachable from Studio today since the request schema doesn't expose those keys, but if anything downstream ever does, the semantics may surprise. Easy fix: config.get("model_random_state") or random_seed if you want explicit-null to mean "inherit", or document the present-vs-absent distinction.
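A minimal sketch of the present-but-None distinction described above. resolve_init_seed is a hypothetical helper written for illustration, not the code in worker.py. It also shows one caveat of the `or` form: it would override a deliberate seed of 0.

```python
def resolve_init_seed(config: dict, key: str, random_seed: int) -> int:
    """Hypothetical helper: absent and explicit-None both inherit random_seed."""
    value = config.get(key)
    return random_seed if value is None else int(value)


random_seed = 42
absent = {}
explicit_null = {"model_random_state": None}
explicit_zero = {"model_random_state": 0}

# dict.get's default applies only when the key is missing.
print(absent.get("model_random_state", random_seed))         # 42
print(explicit_null.get("model_random_state", random_seed))  # None

# `or` inherits on explicit None, but also replaces a deliberate seed of 0.
print(explicit_zero.get("model_random_state") or random_seed)  # 42

# An explicit None check treats absent and null alike and keeps 0.
print(resolve_init_seed(explicit_null, "model_random_state", random_seed))  # 42
print(resolve_init_seed(explicit_zero, "model_random_state", random_seed))  # 0
```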

tests/studio/run_real_mlx_smoke.py 30-step refresh matches the gate from #5537. Dropping the eos_id append in _compute_loss_and_grad_norm so the smoke loss probe matches Studio's text dataset path is the right move; the prior 7-step assertion was stale per #5622. I can't actually run the smoke from here (CUDA-only host), so this rides on macOS CI evidence.

This PR pairs with unsloth-zoo#684 - cast_norm_output_to_input_dtype and max_grad_value=None semantics only do useful work once MLXTrainingArguments accepts them. Worth a merge-order note in case #684 lands later.

Approving subject to MLX CI green on the smoke test refresh.

test_training_raw_support.py transitively imports the full studio backend (core.training.training -> matplotlib, etc.). Adding every transitive dep to the Windows install smoke is whack-a-mole and defeats the smoke's purpose. test_mlx_training_worker_config.py already covers PR unslothai#5656's wiring (model_random_state / lora_random_state fallback, max_grad_value None preservation, dataset_order=torch_randperm) via source-text assertions on worker.py. The test stubs out structlog/loggers/utils itself, so it works with just stdlib. Drop the broader test from the Windows job.

studio/backend/core/training/worker.py

`config.get("model_random_state", random_seed)` only fills the default when the key is absent. When a caller passes `config["model_random_state"] = None` explicitly (which happens any time a JSON payload sends an explicit `null`), the old code forwarded `None` to FastMLXModel and silently disabled deterministic init. Same for `lora_random_state`. Treat absent and explicit None the same way: fall back to random_seed.

studio/backend/tests/test_training_raw_support.py

Update the source-string assertions to match the new lines.

danielhanchen · 2026-05-24T13:57:08Z

Pushed one small follow-up on top of a404dfd3 (now bff5b443):

studio/backend/core/training/worker.py:1156-1170 was using config.get("model_random_state", random_seed) which only falls back when the key is absent. If a caller serializes {"model_random_state": null} (which Pydantic / JSON happily do for Optional fields), dict.get returns None instead of random_seed and that None reaches FastMLXModel.from_pretrained(random_state=None) and get_peft_model(random_state=None), silently disabling deterministic init. Same for lora_random_state. Reworked to explicit None-check so absent and explicit-null behave identically.

studio/backend/tests/test_training_raw_support.py::test_mlx_worker_falls_back_init_seeds_to_random_seed updated to match the new lines.

The Pydantic schema does not expose model_random_state / lora_random_state today so this is theoretical for the Studio HTTP path, but any non-Studio caller (CLI tests, future REST shape, downstream forks) that did set the keys to null would otherwise get non-reproducible runs. The CUDA workspace I have here cannot run the MLX smoke, but the change is structural and the source-string test pins the new shape.

The PR unslothai#684 and PR unslothai#5656 heads were just updated with maintainer fixes (restored compiler.py UNSLOTH_RETURN_LOGITS elif, GPT-2 ln_* matching, Qwen3-VL flag wiring, default-branch reseed; plus seed present-but-None fix). Bump the three workflow files (comment-only) so the paths filters re-fire and we get a fresh signal on all three runners against the updated PR heads.

Round 2 of reviewer-driven fixes landed on the PR heads:

zoo PR unslothai#684: 0753b115
- merged origin/main (restores unslothai#690 / unslothai#691 gpt-oss eager attn)
- cleaned up norm cast monkey patch in train() finally
- raise on streaming+dataset_order text combo
- VLM baseline CE full-sequence forward parity with CCE
- scheduler test now matches HF linear-no-warmup behavior

unsloth PR unslothai#5656: bff5b44 (unchanged since last run)

Re-fire all three workflows so we get a fresh signal.

… PR unslothai#5656

The MLX worker now passes `cast_norm_output_to_input_dtype` and `dataset_order` only when the linked unsloth-zoo dataclass actually declares them. Released zoo trees that predate the paired PR can still construct `MLXTrainingConfig` without raising `TypeError: unexpected keyword argument`. Once the dependency floor is bumped to a release that contains both fields, the feature-detect guards become no-ops.

`random_seed = config.get("random_seed", 3407)` was unguarded against explicit `None` from raw / backend callers. The same value seeded the trainer and was the fallback target for `model_random_state` / `lora_random_state`. Normalize once at the top of the function and use the normalized value everywhere so an explicit `None` cannot reach FastMLXModel / get_peft_model / MLXTrainingConfig.

Existing seed source-pattern test updated to match the new normalize helper. New test asserts the feature-detection guards exist and that the unconditional kwargs do not include the gated fields.

chatgpt-codex-connector · 2026-05-24T15:23:45Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen · 2026-05-24T15:24:07Z

Pushed 56e32b75 on top of bff5b443. Addresses the two P1 items that the round 2 review consensus flagged.

studio/backend/core/training/worker.py:1411 MLXTrainingConfig kwargs are no longer all-or-nothing. The previous version always passed cast_norm_output_to_input_dtype and dataset_order, which would raise TypeError: unexpected keyword argument against any released unsloth-zoo that predated the paired unsloth-zoo#684 change. Switched to building the kwargs dict, then gating the two new fields with getattr(MLXTrainingConfig, "__dataclass_fields__", {}). Released zoo trees that lack the fields keep working; zoo trees that have them get the same behavior as before. Once the dependency floor is bumped to a release containing both fields the guards become no-ops, with no behavior change.
studio/backend/core/training/worker.py:1159 random_seed is now normalized once at the top of the function. The previous code used config.get("random_seed", 3407) which only inserts the default for absent keys; an explicit None from raw / backend callers passed straight through to FastMLXModel, get_peft_model, and MLXTrainingConfig(seed=None). After this PR's earlier round 1 fix for model_random_state / lora_random_state, the random_seed source itself was the last leg that still leaked None. Now _raw_seed = config.get("random_seed", 3407) is followed by random_seed = 3407 if _raw_seed is None else int(_raw_seed), and the model / LoRA seed overrides also int() their value when not None. Trainer seed = random_seed reads the normalized value directly.
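The dataclass feature-detection guard described above can be sketched as follows. OldTrainingConfig and NewTrainingConfig are hypothetical stand-ins for two unsloth-zoo releases, and build_config is an illustrative helper. None of these names come from the PR, and the field defaults are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OldTrainingConfig:
    """Hypothetical stand-in for a release that predates the new fields."""
    seed: int = 3407
    max_grad_value: Optional[float] = None


@dataclass
class NewTrainingConfig(OldTrainingConfig):
    """Hypothetical stand-in for a release that declares the new fields."""
    cast_norm_output_to_input_dtype: bool = True
    dataset_order: Optional[str] = None


def build_config(config_cls, base_kwargs: dict, optional_kwargs: dict):
    # Dataclasses expose their declared fields in __dataclass_fields__;
    # the {} default keeps this safe for non-dataclass types as well.
    declared = getattr(config_cls, "__dataclass_fields__", {})
    kwargs = dict(base_kwargs)
    for name, value in optional_kwargs.items():
        if name in declared:  # drop kwargs the target class does not know
            kwargs[name] = value
    return config_cls(**kwargs)


optional = {
    "cast_norm_output_to_input_dtype": True,
    "dataset_order": "torch_randperm",
}
old_cfg = build_config(OldTrainingConfig, {"seed": 3407}, optional)
new_cfg = build_config(NewTrainingConfig, {"seed": 3407}, optional)
print(old_cfg)  # built without TypeError; new fields were dropped
print(new_cfg.dataset_order)  # torch_randperm
```

Once every supported release declares both fields, the membership check always passes and the guard has no effect.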

Tests:

PYTHONPATH=. python -m pytest studio/backend/tests/test_training_raw_support.py studio/backend/tests/test_mlx_training_worker_config.py -q
14 passed

Added one new assertion to test_mlx_worker_falls_back_init_seeds_to_random_seed for the seed normalize helper and one new test test_mlx_worker_feature_detects_optional_mlx_config_fields covering the dataclass field guard.

Not addressing in this PR:

Frontend Studio UI control for max_grad_value and cast_norm_output_to_input_dtype (one reviewer P2; the request mapper currently hardcodes max_grad_value: null and has no cast_norm_output_to_input_dtype field). Backend now accepts both via the new schema, and external callers can supply them. Wiring a Studio UI control is a separate frontend task; the API is exposed and validated, the runtime path on the worker side is now portable, so the runtime regression set is closed. Happy to do the UI control in a follow-up.

Yell if anything looks off.

…othai#5656

Round-3 review consensus: the per-field guards that landed in the MLX worker only protect the MLX path. The same `TrainingBackend.start_training` config still reaches the CUDA/text trainer at `worker.py:2267`, the embedding LoRA init at `worker.py:2450`, and embedding TrainingArguments at `worker.py:2624` with raw `None` values, so an explicit `random_seed=None` from a raw / backend caller still breaks non-MLX training even after the previous fix.

Move the normalization into `TrainingBackend.start_training` itself, where it runs once for every training mode:

- `_coerce_seed(value)`: explicit `None`, non-int, or absent all become 3407. Every downstream worker now sees an int.
- `_coerce_optional_bool(value, default)`: explicit `None` falls back to `default` instead of `bool(None) == False`. Also normalizes the common raw-config / YAML string aliases ("true" / "false" / "0" / "1"). Used for `cast_norm_output_to_input_dtype`.
- `_coerce_optional_nonneg_float(name, value)`: rejects negative numerics from raw / backend callers, matching the Pydantic `ge=0` constraint the HTTP route already enforces. Used for `max_grad_value`.

worker.py MLX path: the existing `bool(config.get(key, True))` for `cast_norm_output_to_input_dtype` was changed to also fall back on explicit `None`, so direct worker callers (bypassing `TrainingBackend.start_training`) are equally safe. `max_grad_value` also raises on negative values inside the worker for the same reason.

TrainingStartRequest.random_seed default bumped from 42 to 3407 so direct REST callers that omit the field receive the same default as the Studio frontend and the MLX worker.

New regression test exercises the three new helpers across explicit None, valid values, string aliases, and negative-value rejection.

danielhanchen · 2026-05-24T15:50:33Z

Pushed 1a026435 on top of 56e32b75. Addresses the consensus P1 / P2 items from a fresh 12-reviewer pass after round 2 landed.

studio/backend/core/training/training.py:225 (asymmetric seed normalization). The round-2 fix added the random_seed=None fallback only inside the MLX worker, but TrainingBackend.start_training still stored None in the shared config. That config is also consumed by the non-MLX trainer (worker.py:2267), embedding LoRA init (worker.py:2450), and embedding TrainingArguments (worker.py:2624). A raw / backend caller passing random_seed=None for any non-MLX run would still hit transformers.set_seed(None) -> TypeError. Moved the normalize into a small _coerce_seed helper that runs ONCE in start_training for every training mode.
studio/backend/core/training/training.py:222 (cast_norm_output_to_input_dtype=None becomes False). Round-2 added the field via kwargs.get(key, True) and the worker did bool(config.get(key, True)). Both have the same explicit-None blind spot: a raw / backend caller passing None rebinds the field to False and silently disables the MLX norm-output cast even though the documented and schema default is True. Added _coerce_optional_bool(value, default), which maps explicit None to the boolean default and normalizes common string aliases ("true" / "false" / "0" / "1" / "yes" / "no") to booleans, and applied it at both the backend boundary and the worker for direct callers.
studio/backend/core/training/training.py:221 (no validation on max_grad_value raw path). The Pydantic route model already rejects negative max_grad_value with ge=0, but TrainingBackend.start_training(**kwargs) accepts arbitrary kwargs without validation, so a raw / backend caller passing max_grad_value=-1 reached the MLX trainer as -1.0. unsloth-zoo treats non-positive elementwise clip as "off", silently disabling the new public knob. Added _coerce_optional_nonneg_float(name, value) which preserves None, coerces numerics, and raises ValueError on negatives. Worker mirrors the check for direct callers.
studio/backend/models/training.py:285 (REST schema default was random_seed=42). The Studio frontend, backend default, and worker fallback are all 3407; only the REST schema still defaulted to 42, so HTTP clients that omitted random_seed got a different seed than every other Studio entry point. Bumped to 3407 to match.
New test test_training_backend_normalizes_explicit_none_seed_and_dtypes exercises the three helpers across explicit None, valid values, string aliases, and negative-value rejection. Updated test_mlx_worker_falls_back_init_seeds_to_random_seed and test_mlx_worker_preserves_null_max_grad_value_for_trainer_default to match the new worker source.

Background context (already in earlier rounds):

Feature-detection on MLXTrainingConfig.__dataclass_fields__ so the worker still constructs MLXTrainingConfig against released unsloth-zoo (which predates cast_norm_output_to_input_dtype and dataset_order). Once the floor bumps, the guards become no-ops.
All staging CI legs (mlx-compiler-linux, mlx-smoke-macos, install-smoke-windows) ran green against the prior round-2 head; re-triggering against 1a026435 + 23751c84 now.

Not changed in this PR:

Studio frontend wiring for max_grad_value / cast_norm_output_to_input_dtype. The backend schema accepts both, raw / backend callers can supply them, and the worker side is now portable across unsloth-zoo releases. Adding the UI control is a separate frontend task.

Yell if anything looks off.

The block extraction used a pattern that stops at the first inner closing paren and would silently miss a gated field added unconditionally later in the same dict literal. Switched to proper paren-depth tracking so the unconditional block is checked end-to-end.

mmathew23 · 2026-05-26T17:05:44Z

Added commit 65cd01954 to wire the new zoo append_eos option from Studio by training mode.

Rationale / parity check:

unsloth-zoo now defaults MLXTrainingConfig.append_eos=True, which is correct for generic/direct raw text callers because it preserves the old mlx-lm-style EOS behavior.
Studio SFT formatting is different: for Alpaca/chat-template formatted rows, Studio has already rendered the final text example, and CUDA Studio/TRL does not add an extra EOS for the Qwen3 formatted SFT text path we are using for parity.
I verified on the CUDA side with unsloth/Qwen3-0.6B: the processed SFT row remains length 44 and does not end with eos_token_id.
On MLX with zoo default append_eos=True, the same fixture changes the data surface: 2-step smoke reports 88 tokens and first losses [4.8962, 4.8962].
With Studio passing append_eos=False for SFT formatted text, MLX returns to the parity surface: 2-step smoke reports 86 tokens and first losses [4.6446, 4.6446].

The patch is intentionally conditional:

raw_text_mode = training_type == "Continued Pretraining" or format_type == "raw"
mlx_config_kwargs["append_eos"] = bool(raw_text_mode)

So raw/CPT text still lets MLX append EOS, matching the CUDA raw-text path, while formatted SFT text does not get an extra EOS behind Studio's back.
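A small sketch of how the mode switch above changes the token surface. The helper names, the "SFT" / "alpaca" strings, the token ids, and the 43-token rows are made up for illustration. The rows are sized only so the totals line up with the 86 and 88 figures quoted above, and how the trainer actually counts tokens is not shown in this thread.

```python
def resolve_append_eos(training_type: str, format_type: str) -> bool:
    # Same condition as the patch: only raw / CPT text lets MLX append EOS.
    raw_text_mode = training_type == "Continued Pretraining" or format_type == "raw"
    return bool(raw_text_mode)


def maybe_append_eos(token_ids, eos_token_id: int, append_eos: bool):
    # Return a new list; add a trailing EOS only when requested.
    return list(token_ids) + [eos_token_id] if append_eos else list(token_ids)


EOS_ID = 0                    # hypothetical EOS token id
rows = [[1] * 43, [1] * 43]   # two illustrative pre-rendered rows


def total_tokens(training_type: str, format_type: str) -> int:
    flag = resolve_append_eos(training_type, format_type)
    return sum(len(maybe_append_eos(row, EOS_ID, flag)) for row in rows)


print(total_tokens("SFT", "alpaca"))                 # 86
print(total_tokens("Continued Pretraining", "raw"))  # 88
```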

Validation run locally:

pytest -q studio/backend/tests/test_training_raw_support.py
# 11 passed

Rationale / guardrails for the local Studio/vision push: When callers provide explicit VLM LoRA target_modules together with layer filters, FastVisionModel still needs to route the explicit targets through get_peft_regex. Otherwise the layer filters are ignored and adapters can be attached outside the requested language/vision scope. Do not revert this to plain list(target_modules) for explicit module lists. The CUDA/Studio-facing contract is that explicit targets and layer filters compose: target_modules selects module names, while finetune_language_layers / finetune_vision_layers / finetune_attention_modules / finetune_mlp_modules constrain where those targets are allowed. The regression test covers the language-only explicit q_proj case and source-checks that explicit targets are wrapped through get_peft_regex when filters are active.

mmathew23 · 2026-05-27T00:51:31Z

Reviewer / maintainer guardrail for the next Studio/vision push:

The local VLM LoRA targeting fix is intentional and should not be reverted to plain list(target_modules) handling.

When callers provide explicit VLM target_modules together with layer filters, FastVisionModel still needs to route those explicit targets through get_peft_regex. The intended contract is compositional:

target_modules selects module names, and
finetune_language_layers / finetune_vision_layers / finetune_attention_modules / finetune_mlp_modules constrain where those targets are allowed.

Without the regex wrapping, explicit target lists can ignore the language/vision layer filters and attach adapters outside the requested scope. The added test covers the language-only explicit q_proj case and source-checks that explicit targets are wrapped through get_peft_regex when filters are active.
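The compositional contract can be illustrated with a toy regex builder. This is a sketch under stated assumptions: build_target_regex is a hypothetical helper, the module paths are invented, and it does not reproduce the signature or the pattern of the real get_peft_regex.

```python
import re


def build_target_regex(target_modules, scope_prefixes):
    """Hypothetical: match a target module name only inside allowed scopes."""
    scopes = "|".join(re.escape(scope) for scope in scope_prefixes)
    names = "|".join(re.escape(name) for name in target_modules)
    return re.compile(rf"^.*(?:{scopes}).*\.(?:{names})$")


# Invented module paths for a vision-language model.
module_names = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.mlp.up_proj",
    "model.vision_tower.blocks.0.attn.q_proj",
]

# Plain list matching looks only at the leaf name, so scope is ignored.
plain = [n for n in module_names if n.split(".")[-1] in {"q_proj"}]

# Regex wrapping composes the explicit target with a language-only filter.
pattern = build_target_regex(["q_proj"], ["language_model"])
scoped = [n for n in module_names if pattern.match(n)]

print(len(plain))  # 2
print(scoped)      # ['model.language_model.layers.0.self_attn.q_proj']
```

With plain name matching the vision q_proj would also receive an adapter, which is the out-of-scope attachment the guardrail is meant to prevent.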

danielhanchen · 2026-05-27T11:01:21Z

Verified the Studio-side wiring for max_grad_leaf_norm (commit d66f4a7) against the underlying unsloth-zoo trainer change. Posting end-to-end evidence so the API contract is clear.

Plumbing path verified

request.max_grad_leaf_norm (Pydantic Optional[float], ge=0)
-> TrainingBackend.start_training (coerced via _coerce_optional_nonneg_float)
-> studio/backend/core/training/worker.py (re-validates non-negative, feature-detects __dataclass_fields__ on MLXTrainingConfig)
-> MLXTrainingConfig(max_grad_leaf_norm=...) on the worker side.

Feature-detect guard keeps backwards compat with older unsloth-zoo releases that predate the field: if the dataclass doesn't expose it, the kwarg is dropped and the worker falls through to the trainer's runtime default. So this PR is safe to land before or after PR unslothai/unsloth-zoo#684.

Tests passing (15)

studio/backend/tests/test_mlx_training_worker_config.py    4 passed
studio/backend/tests/test_training_raw_support.py         11 passed

Including the new test_training_backend_forwards_grad_clipping_controls which pins the kwarg surface against silent regression.

Why the new default matters in Studio

Studio users default to LoRA training on Apple Silicon where memory headroom is tight. The new MLX default (max_grad_leaf_norm=1.0 instead of max_grad_value=1.0):

Preserves each tensor's gradient direction (closer to CUDA max_grad_norm dynamics, the HF reference).
Pays no cross-tree reduction memory cost (verified on macos-14: max_grad_norm is +2.7 MB peak and +9-10% step time vs leaf_norm on gemma-3-270m, scales linearly with trainable params).
Doesn't break existing Studio runs: identical convergence step on a 30-step memorisation fixture (mean abs delta 0.01 loss vs the prior elementwise default).
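To make the three clipping modes concrete, here is a pure-Python sketch using the standard definitions. It assumes "leaf norm" means scaling each tensor by its own L2 norm, which is how the description above reads. The real trainer operates on MLX arrays and may differ in detail.

```python
import math


def clip_by_value(grads, limit):
    # Elementwise clamp: each entry is limited independently.
    return {k: [max(-limit, min(limit, g)) for g in v] for k, v in grads.items()}


def clip_by_leaf_norm(grads, limit):
    # Per-tensor scaling: each leaf uses only its own norm.
    out = {}
    for name, values in grads.items():
        norm = math.sqrt(sum(g * g for g in values))
        scale = min(1.0, limit / norm) if norm > 0 else 1.0
        out[name] = [g * scale for g in values]
    return out


def clip_by_global_norm(grads, limit):
    # One norm across the whole tree, which needs a cross-tree reduction.
    total = math.sqrt(sum(g * g for v in grads.values() for g in v))
    scale = min(1.0, limit / total) if total > 0 else 1.0
    return {k: [g * scale for g in v] for k, v in grads.items()}


grads = {"a": [3.0, 4.0], "b": [0.3, 0.4]}
by_value = clip_by_value(grads, 1.0)
by_leaf = clip_by_leaf_norm(grads, 1.0)
by_global = clip_by_global_norm(grads, 1.0)

print(by_value["a"])  # [1.0, 1.0]: the 3:4 direction is lost
print(by_leaf["a"])   # roughly [0.6, 0.8]: the 3:4 direction is kept
print(by_leaf["b"])   # unchanged, because its own norm is below the limit
```

In this toy case the small tensor "b" is left alone by leaf-norm clipping but is shrunk by global-norm clipping, because the global scale is dominated by "a".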

Detailed parity data in unslothai/unsloth-zoo#684 review comment.

Tested across precedence

| Studio request input | Resolved trainer mode |
| --- | --- |
| nothing set | `("leaf_norm", 1.0)` (new MLX default) |
| `max_grad_value=1.5` | `("value", 1.5)` (preserves API meaning) |
| `max_grad_leaf_norm=2.5` | `("leaf_norm", 2.5)` |
| `max_grad_norm=1.0` | `("global_norm", 1.0)` |
| both `max_grad_value=1.0` and `max_grad_leaf_norm=1.0` | `("value", 1.0)` (explicit value wins) |

Each resolution path covered by tests/test_mlx_max_grad_value_none.py in the zoo PR (13 tests).
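The precedence rows above can be expressed as a small resolver. resolve_grad_clip is a hypothetical function, not the zoo implementation. The thread does not say what happens when max_grad_leaf_norm and max_grad_norm are both set, so checking leaf_norm first is an assumption.

```python
from typing import Optional, Tuple


def resolve_grad_clip(
    max_grad_value: Optional[float] = None,
    max_grad_leaf_norm: Optional[float] = None,
    max_grad_norm: Optional[float] = None,
) -> Tuple[str, float]:
    # An explicit elementwise value wins over leaf_norm, per the rows above.
    if max_grad_value is not None:
        return ("value", float(max_grad_value))
    # Assumption: leaf_norm is checked before global_norm.
    if max_grad_leaf_norm is not None:
        return ("leaf_norm", float(max_grad_leaf_norm))
    if max_grad_norm is not None:
        return ("global_norm", float(max_grad_norm))
    # Nothing set: the new MLX default described above.
    return ("leaf_norm", 1.0)


print(resolve_grad_clip())                    # ('leaf_norm', 1.0)
print(resolve_grad_clip(max_grad_value=1.5))  # ('value', 1.5)
print(resolve_grad_clip(max_grad_value=1.0, max_grad_leaf_norm=1.0))  # ('value', 1.0)
```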

LGTM from my side on the Studio plumbing.

Trim the 11-line comment block to 5 lines and correct the stale claim that MLXTrainingConfig defaults to max_grad_value=1.0. The new default is max_grad_leaf_norm=1.0 (same memory profile as elementwise but direction-preserving). The smoke still pins max_grad_value=1.0 explicitly to keep the 13-seed pass-rate fixture stable.

Merges 116 main commits (gemini provider, oxc validator package-lock, uninstall script relocation, lockfile audit, etc).

Two content conflicts resolved:
- studio/backend/tests/test_mlx_training_worker_config.py: both branches appended a new test (HEAD's tokenizer dual-purpose check, main's VLM resize math). Both kept side-by-side; both pass.
- tests/studio/run_real_mlx_smoke.py: HEAD's stronger len + train_steps assertion kept; main's auto-following comment kept.

16 Studio backend tests pass post-merge.

Datta0 · 2026-06-02T09:30:14Z

                tuple,
                str,
            )
+            if type(target_modules) in (list, tuple) and (


NIT: Should we at least warn that both are specified and that we are choosing one over the other?

Datta0 · 2026-06-02T09:40:25Z

            "weight_decay": request.weight_decay,
            "max_grad_norm": request.max_grad_norm,
+            "max_grad_value": request.max_grad_value,
+            "cast_norm_output_to_input_dtype": request.cast_norm_output_to_input_dtype,


NIT: Shouldn't there be a max_grad_leaf_norm entry here?

mmathew23 added 2 commits May 19, 2026 03:30

Expose MLX grad value clipping in Studio

73f37c6

update test

e36b55e

mmathew23 requested review from danielhanchen and rolandtannous as code owners May 20, 2026 22:00

[pre-commit.ci] auto fixes from pre-commit.com hooks

8b79ba4


gemini-code-assist Bot reviewed May 20, 2026


chatgpt-codex-connector Bot reviewed May 20, 2026


mmathew23 and others added 3 commits May 20, 2026 22:56

dataset ordering + wd

e8c944f

fix mlx smoke step expectations

377fc67

[pre-commit.ci] auto fixes from pre-commit.com hooks

e829268


chatgpt-codex-connector Bot reviewed May 21, 2026


Datta0 approved these changes May 21, 2026


Datta0 mentioned this pull request May 21, 2026

fix(mlx-ci): align smoke step count assertion #5622

Closed

cast norm activation output back to original input dtype

bfb4203

chatgpt-codex-connector Bot reviewed May 21, 2026


Comment thread studio/backend/tests/test_training_raw_support.py

address mlx studio review feedback

a404dfd

danielhanchen reviewed May 24, 2026


danielhanchen mentioned this pull request May 24, 2026

Stage: real-runner validation for MLX Matthew PRs (zoo#684 + unsloth#5656) unslothai/unsloth-staging-1#87

Open

3 tasks

[pre-commit.ci] auto fixes from pre-commit.com hooks

29aa91a


danielhanchen pushed a commit to unslothai/unsloth-staging-1 that referenced this pull request May 24, 2026

Re-trigger staging CI on round-3 final heads (PR unslothai#684 23751c…

d889bb3

…84 / PR unslothai#5656 1a02643)

Daniel Han-Chen and others added 2 commits May 25, 2026 13:35

Shorten verbose comments in MLX Studio backend

962ca28

Handle MLX Studio EOS appending by mode

65cd019

Wire MLX leaf norm clipping through Studio

d66f4a7

[pre-commit.ci] auto fixes from pre-commit.com hooks

ad8bf14


[pre-commit.ci] auto fixes from pre-commit.com hooks

ae6c259


Datta0 reviewed Jun 2, 2026

