fix(mlx): persist LoRA adapter metadata on save#679
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates _enrich_mlx_adapter_config in unsloth_zoo/mlx/utils.py to capture and store LoRA parameters such as rank, scale, and dropout, ensuring that the adapter topology is correctly recreated upon reloading. Feedback was provided to synchronize top-level configuration fields with the nested lora_parameters dictionary to prevent potential inconsistencies and ensure compatibility across different MLX loaders.
There was a problem hiding this comment.
Pull request overview
This PR improves MLX LoRA adapter saving so adapter_config.json reliably contains the metadata needed to recreate LoRA adapters with the same topology and hyperparameters (rank/scale/dropout) on reload, especially when callers don’t provide an explicit adapter config.
Changes:
- Extend
_enrich_mlx_adapter_configto infer and persistlora_parameters(rank,scale,dropout) plus top-level compatibility fields. - Persist
unsloth_mlx_lora_module_pathsto enable re-attaching LoRA layers outside the language tower on reload. - Add dropout inference that handles MLX
nn.Dropoutkeep-probability storage (_p_1) when.pis not available.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
eb74b87 to
f0dd900
Compare
| lora_parameters = dict(adapter_config.get("lora_parameters") or {}) | ||
| inferred_lora_parameters = { | ||
| "rank": lora_rank, | ||
| "scale": lora_scale, | ||
| "dropout": lora_dropout, | ||
| } |
The seven upstream workflows (consolidated-tests-ci, lint-ci, mlx-ci, security-audit, stale, studio-export-fix-ci, wheel-smoke) would fire on every push and PR-event to this throwaway staging branch and burn runner minutes that have nothing to do with validating MLX PRs unslothai#679, unslothai#682, unslothai#692. Keep only the three mlx-pr-* workflows on this branch. They stay in upstream main / origin/main untouched -- this deletion is scoped to the staging branch only.
Reload + metadata follow-ups from review feedback: 1. loader._apply_lora_at_paths() now also recreates LoRASwitchLinear for SwitchLinear / QuantizedSwitchLinear and LoRAEmbedding for Embedding / QuantizedEmbedding. The previous version saved switch rank metadata but the reload helper only wrapped Linear, so MoE adapter weights were silently dropped at load time. Switch/embedding imports are wrapped in try/except for older mlx-lm. 2. utils._get_mlx_dropout_probability() now reads MLX _p_1 keep-prob first and falls back to .p. Compat shims that expose both (.p=0.0 default plus a real _p_1) previously wrote stale dropout=0.0. 3. utils._infer_mlx_lora_rank() returns None instead of falling back to lora_a.shape[-1] when lora_a and lora_b disagree on the rank dimension. The previous fallback wrote the input dim as rank for partially materialized or non-LoRA shapes. 4. utils._enrich_mlx_adapter_config() only infers rank/scale/dropout from modules that appear in the caller-provided unsloth_mlx_lora_module_paths set. Previously an unrelated earlier LoRA in named_modules() would write the wrong language-tower params when the caller selected a vision/projector path. Also distinguishes "caller passed nothing" from "caller passed [] or None" so explicit empty values are preserved. 5. utils._enrich_mlx_adapter_config() now fills num_layers (derived via _get_transformer_layers) and fine_tune_type when callers invoke save_lora_adapters() without a trainer-built config. mlx-lm load_adapters() dereferences config.num_layers and previously crashed with AttributeError on these direct-save artifacts.
The 3D rank-consistency check in _infer_mlx_lora_rank() used lora_b_shape[-2] which is the out_features axis. mlx-lm's LoRASwitchLinear stores lora_b as (num_experts, out_features, rank) so rank lives on lora_b_shape[-1]. Using [-2] caused valid MoE adapters to be rejected as 'contradicted' and rank to be left unrecorded in adapter_config.json.
The 3D rank-consistency check in _infer_mlx_lora_rank() used lora_b_shape[-2] which is the out_features axis. mlx-lm's LoRASwitchLinear stores lora_b as (num_experts, out_dims, rank) so rank lives on lora_b_shape[-1]. Using [-2] caused valid MoE adapters to be rejected as contradicted and rank to be left unrecorded in adapter_config.json.
2e94194 to
8fe738b
Compare
- _enrich_mlx_adapter_config: an explicit empty unsloth_mlx_lora_module_paths list now preserves the caller topology but no longer suppresses live rank/scale/dropout inference (treat empty as no filter, with a fallback module captured before filtering). - MLXTrainer.save_model: skip lora_a-bearing modules whose rank cannot be inferred instead of silently falling back to rank 8 while still copying the bad module scale and dropout. - _apply_lora_at_paths: when from_base does not accept scale/dropout kwargs on older mlx-lm, retry with r=rank alone and restore the saved scale on the wrapped layer (scale is a Python attribute, not loaded by load_weights).
- _enrich_mlx_adapter_config: capture the fallback rank/scale/dropout only from modules that pass the explicit_path_set filter, so an explicit caller-supplied path list whose modules are not inferable no longer borrows metadata from unrelated LoRA layers. Auto-discovery and empty explicit-list paths still benefit from the fallback unchanged. - _apply_lora_at_paths: when from_base() cannot accept the dropout kwarg on older mlx-lm, patch wrapped.dropout._p_1 (or .p on shims) after the fallback so the saved adapter dropout is faithfully restored alongside the scale that was already being patched.
- _infer_mlx_lora_rank: replace getattr(lora_a, 'shape', ()) with explicit None and hasattr guards so a None tensor attribute can no longer raise AttributeError on callers that share this helper outside the existing hasattr-gated sites. - _enrich_mlx_adapter_config: remove fallback_rank/scale/dropout variables and the post-loop rescue. After the explicit-path filter was moved ahead of fallback capture, both fallback_* and lora_rank are assigned from the same filtered module pool, making the rescue unreachable. The scoping decision is intentional: an explicit-filtered save must not borrow metadata from unselected modules.
Bring the consolidated MLX adapter metadata regression tests onto the code branch and keep the in-tree helpers and trainer metadata loop.
|
Auto-review verdict: Approved PR adds rank/scale/dropout/path/num_layers persistence and is_layer-aware rank inference for MLX LoRA adapter save+reload, plus reload-time wrapping for Switch/Embedding LoRA and a compat fallback for older mlx-lm from_base signatures; without it, reloaded adapters silently default to scale=1.0 and stale dropout, changing post-reload logits. Reason: Real metadata-persistence bug fixed, reviewer-flagged trainer asymmetry resolved, all loop 2 findings false-positive/hypothetical, 21 regression tests pass. |
…or PR unslothai#679 The earlier uppercase lora_A / lora_B detection got recorded as a reloadable MLX adapter path, but Unsloth and mlx-lm only ever recreate lowercase lora_a / lora_b wrappers on reload; uppercase weights would load with strict=False and silently disappear. mlx-lm itself uses lowercase exclusively, so dropping the uppercase fallback removes a save-reload asymmetry without losing any real caller. Also tighten three rough edges that came out of the same review pass: - _infer_mlx_lora_rank now rejects a Switch/MoE lora_a (..., rank, in) paired with a bare 2D lora_b, matching mlx-lm's LoRASwitchLinear shape contract instead of returning a plausible-looking rank. - _enrich_mlx_adapter_config normalizes a single-string unsloth_mlx_lora_module_paths value to a one-element list so the loader does not iterate the string character by character. - The trainer save_model metadata loop and test helper drop the matching uppercase pair branch in lockstep.
…othai#679 External adapter_config.json files can store unsloth_mlx_lora_module_paths as a bare string. The save side normalized that case in a prior commit, but the load side still iterated the raw string character-by-character, so no LoRA layer was actually wrapped and load_weights(..., strict=False) silently dropped the trained tensors. Add _normalize_mlx_lora_module_paths() and apply it at every reload entry point. Adjacent reload fixes from the same review pass: - Whenever unsloth_mlx_lora_module_paths is present, still call mlx_lm.tuner.utils.load_adapters() to rebuild the standard language tower. The previous flow took the auxiliary-paths-only branch and let strict=False ignore any language-tower LoRA tensors. - _apply_mlx_lora_initialization() now unwraps mlx-lm nn.Linear-wrapped lora_a/lora_b via .weight.shape, matching the unwrap that _infer_mlx_lora_rank() already does. get_peft_model(init_lora_weights= False) and "gaussian" no longer crash on the layer-wrapped form. - _infer_mlx_lora_rank() requires both halves to be at least 2-D, so a bare 1-D lora_b can no longer leak a plausible-but-wrong rank into adapter_config.json.
…ilures MLXTrainer.save_model previously initialized rank/scale/dropout to 8/1.0/0.0 and num_layers to -1 as fallbacks. When _infer_mlx_lora_rank failed for every module or _get_transformer_layers returned no layers, those placeholders silently landed in adapter_config.json. On reload the adapter was rebuilt at the wrong rank/scale and mlx-lm's load_adapters sliced range(-1) to wrap zero LoRA layers, so trained weights had no effect with no warning. Initialize the trainer locals to None and only include num_layers / rank / scale / dropout / lora_parameters in adapter_config when the source value is valid. Coerce per-expert mx.array scales from LoRASwitchLinear via .item() with a safe fallback so the trainer cannot crash building the config. In loader.py, replace the bare except: pass on _apply_lora_at_paths with warnings.warn so partial wrapping failures surface. After the load_weights(strict=False) fallback, diff the saved safetensors keys against the live module tree and warn on any tensor that cannot bind, so users see when trained weights are silently dropped. Update test_trainer_metadata_loop_no_lora_uses_defaults to track the new contract (None instead of 8/1.0/0.0) and add coverage for: rank-inference failure returning None, mx.array scale coercion (0-D succeeds, wider arrays fall back to 1.0), and the trainer's adapter_config dict gating both the num_layers sentinel and the placeholder rank/scale/dropout trio.
…ta fix) into staging/mlx-prs-679-682-692-final Resolved trainer.py conflict by combining PR unslothai#679's None-default initialization (no rank=8/scale=1.0/dropout=0.0 placeholders, no num_layers=-1 sentinel) with PR unslothai#692's iter_mlx_lora_modules iteration and external-non-LoRA trainable routing into save_trainable_adapters vs save_lora_adapters.
… for PR unslothai#679 _enrich_mlx_adapter_config previously did a raw float(getattr(module, "scale", 1.0)). When a LoRA module exposes scale as a multi-element mx.array (LoRASwitchLinear per-expert scale), float() raises and the outer try/except: pass swallowed the entire enrichment block, including unsloth_mlx_lora_module_paths. On reload, the load-side _apply_lora_at_paths short-circuits without those paths, so vision/projector/MoE LoRA tensors are silently dropped via load_weights(strict=False) without firing the new missing-tensor warning. Mirror the trainer-side .item()/ fallback-1.0 coercion in enrich so the path list survives. Narrow the new safetensors-vs-live-tree diff in loader.py to keys ending in .lora_a / .lora_b / .lora_A / .lora_B / .lora_embedding_a / .lora_embedding_b. Before, comparing all keys could fire false-positive "missing" warnings for VLM adapters where mlx-lm rebinds tensors under a different namespace prefix (model.layers... vs language_model.model.layers...). The warning's purpose is to catch silently-dropped LoRA training weights, so filtering to LoRA-suffix keys keeps the signal and drops the noise. Add tests: enrich-side mx.array scale coverage with both 0-D (item() returns) and wider arrays (fallback to 1.0), asserting unsloth_mlx_lora_module_paths survives in both cases.
…thai#679 _normalize_mlx_lora_module_paths now accepts dict-grouped (e.g. {"language": [...], "vision": [...]}) and pathlib.Path-typed entries instead of silently returning []. Older or hand-authored adapter_config.json layouts no longer drop their module paths. Narrow the safetensors-vs-live-tree diff except block to (ImportError, OSError) so unrelated bugs in the diff code (e.g. accidental NameError after a refactor) still surface. When the diff itself is skipped, emit a warning so the user knows the silent-drop diagnostic was bypassed; the load_weights call still runs afterwards either way. Add tests covering dict + pathlib normalization.
…e for PR unslothai#679 The prior commit narrowed the safetensors-vs-live-tree diff except to (ImportError, OSError) but left model.load_weights(adapter_weights_file, strict=False) OUTSIDE the except. Any other exception from safe_open or tree_flatten (notably SafetensorError on a header issue, or ValueError / AttributeError on unusual model objects) now escapes the narrowed except and crashes the entire from_pretrained, even though the actual load was never attempted. Pre-narrow behavior would have still reached the load. Widen the diff except back to bare Exception so the diagnostic can never block the actual load. The skip-warning still fires so a real bug in the diff code is visible. The narrowing intent (catch unrelated bugs) is preserved by the warning message. _normalize_mlx_lora_module_paths now also handles a bare os.PathLike passed at top level (not wrapped in a list). The dict / list / pathlib- in-list arms were added previously; this rounds out the input shapes. _enrich_mlx_adapter_config's outer try/except: pass was masking legitimate caller-input bugs (e.g. lora_parameters="garbage"). Narrow to (TypeError, ValueError, AttributeError) and warn so partial-metadata writes surface instead of silently producing an under-specified adapter_config.json.
1) Save-side normalize asymmetry: _enrich_mlx_adapter_config now reuses the
loader-side _normalize_mlx_lora_module_paths so save and load accept the
same shapes (str / list / tuple / set / dict / pathlib.Path). Before,
passing {"vision": ["vision.proj"]} or [Path("vision.proj")] erased
topology before it reached adapter_config.json.
2) Reload nested-wrap / always-fallback: Reorder so mlx-lm's load_adapters
runs FIRST and rebuilds the language tower via its canonical walk; then
_apply_lora_at_paths re-attaches only the auxiliary paths (vision /
projector / MoE / embedding) that mlx-lm does not handle. Added a
skip-if-already-wrapped guard inside _apply_lora_at_paths so language
paths land as no-ops when load_adapters has already wrapped them.
Previously the prewrap forced load_adapters to either fail or produce
nested LoRALinear(LoRALinear(Linear)).
3) Silent base-model fallback: when load_adapters raises, the fallback
no longer leaves load_weights unrun nor returns the base model;
the original load_adapters exception is preserved and re-raised if
adapters.safetensors is missing or there is nothing to bind against.
4) Reload rank=8 default: _apply_lora_at_paths now raises ValueError when
rank metadata is missing from adapter_config, matching the save-side
None-gating contract. No more silent rebuild at rank=8.
5) _apply_mlx_lora_initialization clobbering layers: when lora_a / lora_b
are nn.Linear-wrapped, the previous `module.lora_a = mx.zeros(...)`
replaced the layer with a raw array, destroying the wrapper and its
methods. New _assign_lora_tensor helper writes to .weight when the
slot is layer-backed and falls back to setattr for bare arrays.
Also expand the saved-vs-live-tree diff suffix tuple to include
.lora_a.weight / .lora_b.weight / .lora_A.weight etc., so the missing-key
warning fires for layer-wrapped saves too. Without this it underreported
drops by half.
Add tests: dict / pathlike / bare-pathlike normalization on both save and
load sides; _enrich preserves dict-grouped and pathlib explicit paths.
Tests now call _load_utils() at the top so they pass under selected /
sharded runs that lack the implicit sys.modules side effects from earlier
tests in the file.
1) Full-finetune checkpoints now stamp fine_tune_type="full". When
_enrich_mlx_adapter_config runs on a no-LoRA model, it previously
left fine_tune_type unset. mlx-lm's load_adapters() defaults missing
fine_tune_type to "lora" and then reads num_layers / lora_parameters,
so the saved full-precision tensors failed to reload. The else branch
now setdefault("fine_tune_type", "full") so reload routes correctly.
2) DoRA exports now stamp fine_tune_type="dora". The collector already
includes the q_proj.m magnitude tensor for DoRA classes, but the
adapter_config still said "lora". mlx-lm only recreates DoRA wrappers
when use_dora=(fine_tune_type=="dora"), so the saved m tensor was
silently dropped on reload via strict=False. Detect DoRA modules via
type(module).__name__.startswith("DoRA") and override the fine_tune_type
to "dora" before setdefault picks the lora default.
3) _is_lm_head_trainable now filters base weights inside LoRA-wrapped
modules using the same lora_module_prefixes pattern that
save_trainable_adapters uses. A LoRA-wrapped lm_head with a
reload-leaked lm_head.weight previously made the function return True,
defeating the CCE memory guard and computing a full V x H weight
gradient per chunk. Now reload-leaked base tensors under LoRA modules
are correctly treated as non-trainable.
4) _push_lora_adapters_to_hub now uploads with allow_patterns scoped to
adapter / tokenizer / config / readme files. Without this filter, a
save_directory that already contained stale model-*.safetensors,
model.safetensors.index.json, or *.gguf files from a prior merged save
would silently push them to the LoRA adapter repo. Public-by-default
repos would have leaked merged weights under a "LoRA adapter" repo;
the allow-list rules out catch-alls like "*.safetensors" / "*" so a
future allow_patterns expansion cannot re-introduce the regression.
Also unblock the lora_parameters / top-level rank-scale-dropout backfill:
the top-level keys are now backfilled whenever lora_parameters is present,
not only when lora_parameters was computed by this function. mlx-vlm reads
the top-level keys directly; without this fix a caller-supplied
lora_parameters dict left rank/scale/dropout absent at the top level.
Also drop the num_layers=-1 sentinel here too (mirroring the trainer side
None-gating in PR unslothai#679): if _get_transformer_layers returns nothing,
num_layers is simply omitted so mlx-lm's loader raises a clear
AttributeError instead of slicing range(-1) and applying zero LoRA layers.
Tests: enrich stamps "full" on no-LoRA models, "dora" on DoRA models;
_is_lm_head_trainable returns False when only LoRA-wrapped lm_head plus
reload-leaked weight are trainable; _push_lora_adapters_to_hub passes a
restrictive allow_patterns list that excludes stale full-model artifacts.
…i#679 Three real bugs surfaced by the reviewer.py round-8 sweep against head fc89ba6, plus a follow-on DoRA reload-side hook needed for PR unslothai#692's new fine_tune_type='dora' stamp. (1) Auxiliary LoRA tensors never loaded after load_adapters success. The post-R7c reordering ran load_adapters first (binds the language tower), then _apply_lora_at_paths attached auxiliary LoRA wrappers (vision / projector / MoE / embedding). But the follow-up load_weights call only existed in the load_adapters-failed fallback. When the common case succeeded, the new aux wrappers were left init-only and the saved auxiliary tensors were silently discarded. Add a second load_weights(adapter_weights_file, strict=False) after the success branch when _aux_attached > 0; strict=False is safe because the language tower is already bound and the aux paths now exist. (2) Silent base-model fallback when no wrappers were attached. When load_adapters failed AND _apply_lora_at_paths attached zero wrappers, the strict=False fallback was still calling load_weights on a bare model, which silently dropped every LoRA tensor and returned the base. The caller never knew the adapter failed. Re-raise the original load_adapters exception in that case. (3) Missing-rank validation ran before the already-wrapped skip. Legacy adapters that stored unsloth_mlx_lora_module_paths but no rank/lora_parameters would crash here even when load_adapters had already rebuilt the language tower (so the loop would skip every path). Defer the rank check to the moment we actually need to wrap; loops over already-wrapped paths now return 0 cleanly. (4) DoRA reload support: when adapter_config.json carries fine_tune_type='dora' (the new save-side stamp from PR unslothai#692), _apply_lora_at_paths now selects mlx_lm.tuner.dora.DoRALinear / DoRAEmbedding instead of plain LoRALinear / LoRAEmbedding. Without this, the saved q_proj.m magnitude tensor silently dropped via strict=False and the reloaded model lost DoRA semantics. Also: _apply_lora_at_paths now returns the number of wrappers attached so the caller can decide whether the post-success load_weights is needed.
…t for PR unslothai#679 Round-9 review caught several asymmetric / silent-fallback bugs along the adapter reload + save paths. This commit: * Adds a shared `_warn_missing_adapter_keys` diagnostic that diffs saved vs live LoRA keys (covers raw, layer-backed, embedding, and DoRA `.m` variants), and routes both the success and fallback `load_weights` branches through it so silent drops surface consistently. * Refactors the fallback branch's previously inline diagnostic into a single call to the shared helper, eliminating duplicated suffix lists and ensuring DoRA `.m` tensors are included in the diff. * Adds `_coerce_mlx_lora_scale` so LoRASwitchLinear's per-expert `mx.array` scale no longer silently degrades to `1.0` when `.item()` raises. Uses it in `MLXTrainer.save_model` and `_enrich_mlx_adapter_config` so both save sides agree. * Stops `_enrich_mlx_adapter_config` from overwriting caller-provided global LoRA metadata when the caller pinned `unsloth_mlx_lora_module_paths` to an aux subset; the language tower's global rank/scale/dropout is now preserved when the explicit-path filter narrows inference. * Adds `_patch_mlx_lora_from_base_compat` that monkey-patches mlx-lm's `LoRALinear` / `LoRASwitchLinear` / `LoRAEmbedding` `from_base` so the upstream `linear_to_lora_layers` walk accepts `scale=` / `dropout=` on older mlx-lm wheels (mirrors the aux-path compatibility shim already in `_lora_from_base_compat`). Called before each language-LoRA wrap site; idempotent; skipped when the class has no `from_base` attribute.
… loud on switch/embedding for PR unslothai#679 Round-10 review caught several asymmetric / silent-fallback bugs along the adapter reload + auxiliary wrapping path: * `load_adapters()` is now preceded by `_patch_mlx_lora_from_base_compat()` so older `mlx-lm` wheels that reject `scale=` / `dropout=` kwargs on `from_base()` succeed during reload the same way the training path already does. Previously a clean reload could hard-fail on those wheels and force the fallback even when adapter metadata was correct. * The outer adapter-detection catch now preserves namespaced `RuntimeError("Unsloth MLX: ...")` in addition to `ValueError` and `ImportError`, so the new "adapter load failed and adapters.safetensors is missing" runtime error no longer falls through to a silent base- model load. * `_ensure_metadata()` wraps the `int(raw_rank)` / `float(...)` coercions in a namespaced `Unsloth MLX:` `ValueError` so malformed configs (`"rank": "not-an-int"`) raise the same prefix the outer catch preserves. Previously the plain conversion error was swallowed. * The switch and embedding aux-wrap branches in `_apply_lora_at_paths` now raise `ImportError` when `LoRASwitchLinear` / `LoRAEmbedding` is unavailable, mirroring the DoRA hard-fail rule. Saved switch / embedding LoRA tensors no longer silently drop via `strict=False` on an older mlx-lm. * Legacy direct-save adapters that wrote `unsloth_mlx_lora_module_paths` but did not persist `rank` / `scale` / `dropout` in `adapter_config.json` now have their rank inferred from `adapters.safetensors` tensor shapes via `_infer_rank_from_saved_adapter`. Previously these adapters regressed from "load with placeholder rank=8" to a hard reload failure.
…path metadata + fail-loud partial fallback for PR unslothai#679 Round-11 review caught four remaining gaps in the adapter reload + save metadata paths: * `_patch_mlx_lora_from_base_compat` now also patches `mlx_lm.tuner.dora.{DoRALinear,DoRAEmbedding}`. mlx-lm's load_adapters reaches into DoRA's `from_base(..., scale=..., dropout=...)` for `fine_tune_type="dora"` and older wheels reject those kwargs the same way LoRA does. Setattr is wrapped in try/except so simulation stubs with __slots__ or no __dict__ degrade gracefully. * `_infer_rank_from_saved_adapter` now classifies suffixes by tensor orientation (`rank_first_2d_suffixes`) rather than presence of `.weight`. PEFT-style uppercase `lora_A` (no `.weight`) is saved as `(rank, in_features)` like `lora_a.weight`, not `(in_features, rank)` like the raw lowercase `lora_a`. Previously `(4, 512)` was inferred as rank 512, building an incompatible wrapper that mis-binds the saved rank-4 tensor on strict=False. * `_enrich_mlx_adapter_config`'s explicit-path + caller-metadata elif branch previously skipped backfilling `num_layers` and treated caller-supplied partial metadata (e.g. `{"rank": 4}` only) as complete. Now mirrors the main branch's `num_layers` backfill and fills `scale` / `dropout` from inferred values (or `1.0` / `0.0` defaults), so the LoRA dict mlx-lm.load_adapters reads is always complete. * The manual fallback after `load_adapters()` fails now consults the saved-vs-live key diff returned by `_warn_missing_adapter_keys()` and re-raises a chained `RuntimeError` (preserving the original load_adapters exception via __cause__) when language-tower LoRA keys remain unbound. Previously `strict=False` silently dropped the language LoRA tensors and returned an aux-only adapter that trained as base + auxiliary noise.
…+ log per-path aux skips for PR unslothai#679 Round-12 Opus review caught three remaining silent-data-loss surfaces in the adapter reload path: * The success branch only ran `_warn_missing_adapter_keys` when `_aux_attached > 0`. If a caller declared `unsloth_mlx_lora_module_paths` but every aux path was unreachable (live module renamed, removed, or an unhandled type the wrap loop skipped), `load_adapters()` would succeed for the language tower, `_apply_lora_at_paths` would skip every aux path, and the saved aux tensors stayed in adapters.safetensors with no live module to bind into; the user got a silently incomplete VLM/MoE adapter. Now always diff saved-vs-live keys when the saved config declared aux paths, and raise the same shape of `RuntimeError` the fallback branch already raises when no aux wrapper attached AND saved tensors remain unbound. * `_apply_lora_at_paths` previously `continue`d silently on three classes of per-path failure (`module is None`, `lora_cls is None`, `parent is None / not hasattr(parent, leaf)`). The new `_skipped_paths` list now tracks each skip with a reason (`module_missing`, `unhandled_type:<typename>`, `parent_unreachable`) and emits a single consolidated `warnings.warn` at the end of the wrap loop so the user can tell WHICH live module is missing or unsupported (the safetensors-level diagnostic only reports tensor keys, not module reasons). * `_lora_from_base_compat` and the monkey-patched `_compat_from_base` previously caught EVERY `TypeError` from `from_base(..., scale=..., dropout=...)` and retried with progressively fewer kwargs, ending at the upstream default `r=8`. If the underlying TypeError came from an unrelated source (e.g. a future mlx-lm wrapper validating that `SwitchLinear` cannot be wrapped by `LoRALinear` and raising `TypeError`), the fallback would silently bind a rank=8 wrapper to a saved different-rank tensor that `strict=False` then drops. Introduce `_is_from_base_kwarg_typeerror` heuristic and only retry on signature-mismatch messages; unrelated TypeErrors propagate.
…or caller LoRA metadata for PR unslothai#679 Round-13 Opus review caught three follow-on bugs in the R12 fixes: * Success branch only raised when `_aux_attached == 0` AND saved tensors remained unbound. The "mixed partial" case (3 of 5 declared aux paths attached, 2 skipped) silently called load_weights(strict=False) and returned an adapter with the skipped paths' saved tensors dropped. Move the saved-vs-live diagnostic raise outside the `_aux_attached == 0` guard so it fires for any non-empty `_missing_after_success`, mirroring the fallback branch's semantics. * `_FROM_BASE_KWARG_TYPEERROR_NEEDLES` previously included bare `scale` and `dropout` substrings, which made the heuristic mis-classify any unrelated TypeError that mentions those words (e.g. an mlx-lm internal `TypeError("scale must be a finite float")` or a `Dropout(p=...)` validator) as a signature mismatch. That defeated the R12 narrowing intent: such errors would still silently fall through to `from_base(module, r=r)` and `_apply_lora_metadata_to_wrapper`, reaching the same wrong-rank wrapper R12 was supposed to prevent. Drop the bare substrings; keep only the single-quoted CPython forms and explicit "not accepted" / "not supported" rejection phrasings. * `_enrich_mlx_adapter_config` first branch required `explicit_filter_narrowed AND has_caller_lora_metadata` to skip the overwrite, so a direct `save_lora_adapters(model, path, adapter_config={"rank": 4, "scale": 8.0})` call without `unsloth_mlx_lora_module_paths` silently replaced the caller's rank/scale with whatever the first LoRA module walk happened to infer. Honour caller-supplied metadata symmetrically: drop the explicit-paths requirement from the skip guard so any caller- provided `rank`/`scale`/`dropout` wins over inferred values.
…axis + drop dead local for PR unslothai#679 Round-15 review caught a regression and three follow-on bugs in the recent metadata + reload helpers: * `_enrich_mlx_adapter_config` elif branch's inferred_fallbacks only filled `scale` and `dropout`; if a direct save_lora_adapters caller passed `{"scale": 9.0}` (no rank), the elif gate `if "rank" in lora_parameters` was always False and the saved config dropped LoRA metadata entirely. Add `rank` to inferred_fallbacks (with `None` as the no-default sentinel so the gate still skips when neither caller nor inference produced a rank). * `MLXTrainer.save_model()` omitted `num_layers` when `_get_transformer_layers()` returned None / empty. mlx-lm's load_adapters does `config.num_layers` attribute access on the SimpleNamespace built from adapter_config.json, raising AttributeError when the key is missing. Pre-PR wrote `-1` as the legacy "all layers" sentinel; restore that fallback so reload no longer crashes when layer detection fails. * `_infer_rank_from_saved_adapter` had `.lora_embedding_a.weight` in `_rank_first_2d_suffixes`. mlx-lm's LoRAEmbedding saves the A tensor as `(num_embeddings, rank)`, not `(rank, in_dims)`, so taking `shape[0]` returned `num_embeddings` (e.g. 32000) as the inferred rank. Remove the suffix so it falls through to the default `shape[-1]` branch, returning the actual rank. * The R13 `explicit_filter_narrowed` local in `_enrich_mlx_adapter_config` was unused after the R13 guard simplification. Drop it to clear the Ruff F841 11/12 reviewers flagged.
… positional order + narrow TypeError classifier for PR unslothai#679 Fixes three regressions / asymmetric-fix gaps from round 16 review: 1. unsloth_zoo/mlx/utils.py: _enrich_mlx_adapter_config previously treated caller-supplied rank/scale/dropout as canonical even when the live LoRA modules proved different values. Saving a live rank-4 adapter with a stale `{"rank": 8, "scale": 1.0, "dropout": 0.0}` caller config wrote rank=8 / scale=1.0 to adapter_config.json and produced reload-time shape mismatches. Live module state describes the tensors being saved now, so it must override stale scalar metadata; caller metadata is now only used as fallback when no trustworthy live rank can be inferred. Also tighten num_layers fallback to write -1 (the legacy "all layers" sentinel) instead of omitting the key, matching the trainer save path so both halves keep mlx-lm load_adapters' attribute access on `config.num_layers` working uniformly. 2. unsloth_zoo/mlx/loader.py: _patch_mlx_lora_from_base_compat installed `(cls, module, r=8, scale=20.0, dropout=0.0)` while upstream mlx-lm `LoRALinear.from_base` is `(linear, r=8, dropout=0.0, scale=20.0)`. After the monkey-patch, any external positional caller writing `LoRALinear.from_base(linear, 4, 0.1, 2.0)` silently received `scale=0.1` / `dropout=2.0`. Restore upstream positional order. 3. unsloth_zoo/mlx/loader.py: _is_from_base_kwarg_typeerror used to treat ANY `"not supported"` / `"not accepted"` substring as a from_base signature mismatch, so a wrapper raising `TypeError("rank 4 not supported by this wrapper")` would trigger the fallback path that retries without rank and returns a default rank-8 wrapper. Require the message to also name one of the kwargs we are sending (as either a CPython-quoted name or a standalone word) before treating broad needles as signature mismatches. Use regex word boundaries so single-char kwargs like "r" do not match substrings of "rank" / "wrapper". Updated the retry sites to use the new `kwargs=` API.
…-loud rank fallback for PR unslothai#679 Fixes three asymmetric-fix gaps from round 17 review: 1. unsloth_zoo/mlx/utils.py: the caller-metadata elif branch of `_enrich_mlx_adapter_config` only wrote `num_layers` when the layer count was positive, while the inferred-rank branch and trainer already write `-1` as the legacy "all layers" sentinel when discovery fails. A direct `save_lora_adapters()` caller that supplied valid rank/scale/dropout but no num_layers would then produce a config that mlx-lm.load_adapters() could not read (it does `config.num_layers` attr access on a SimpleNamespace). Mirror the `-1` fallback in the elif branch so both halves agree. 2. unsloth_zoo/mlx/loader.py: `_warn_missing_adapter_keys()` only diffed the saved vs live LoRA key sets. A wrong-rank live wrapper at a matching key name (e.g. mlx-lm.load_adapters wrapped the language tower at the upstream default rank=8 while saved tensors are rank-4) was reported as "fully bound" even though the downstream `load_weights(strict=False)` would silently drop the shape-incompatible tensors. Compare tensor shapes too and surface shape mismatches in the warning preview. 3. unsloth_zoo/mlx/loader.py: both `_lora_from_base_compat()` and the monkey-patched `_compat_from_base()` ended their compat ladder with a bare `from_base(module)` call. On an older shim that rejects `r=` for ANY value, that silently produced a rank-8 wrapper even when the caller requested rank-4, and `load_weights(strict=False)` then dropped the saved rank-4 tensors. Add `_no_rank_fallback_or_fail()` which raises a namespaced ValueError when the requested rank is not the upstream default (8); only rank=8 callers may keep the no-rank fallback.
…aise on success branch + guard caller-fallback on explicit-path saves for PR unslothai#679 Fixes three asymmetric-fix gaps from round 18 review: 1. unsloth_zoo/mlx/utils.py: `_enrich_mlx_adapter_config` recorded `unsloth_mlx_lora_module_paths` but did not persist a `lora_parameters["keys"]` entry. On reload, upstream `mlx_lm.tuner.utils.load_adapters` treats missing `keys` as "scan every Linear / Embedding / Switch layer" and creates extra zero-init LoRA wrappers outside the saved topology. Write the saved paths into `lora_parameters["keys"]` in both the inferred-rank branch and the elif caller-metadata branch so the upstream walker stays constrained to exactly the modules Unsloth recorded. 2. unsloth_zoo/mlx/loader.py: the success branch of the reload path only raised on `_missing_after_success` when an explicit path list was present. A stale language-only adapter (config says rank=8 but `adapters.safetensors` carries rank-4 tensors) was caught by the shape-aware diagnostic but then ignored because no aux paths were declared. Drop the `_saved_lora_paths` guard so the success branch raises on any missing-or-shape-incompatible saved tensor, matching the fallback branch's behaviour. 3. unsloth_zoo/mlx/utils.py: when the caller pinned explicit module paths AND those selected real live LoRA modules that all failed trustworthy rank inference (e.g. malformed half-built wrappers), the elif caller-metadata branch still persisted the caller's stale top-level rank/scale/dropout into `lora_parameters`. Track "selected lora seen but inference failed" via a new `allow_caller_metadata_fallback` flag and skip the caller-write in exactly that case so placeholder metadata can not survive against the actual saved tensor shapes.
…er for PR unslothai#679 Two asymmetric-fix gaps caught by reviewers. 1) lora_parameters[keys] drifts from the authoritative unsloth_mlx_lora_module_paths in two cases. An explicit empty pin (unsloth_mlx_lora_module_paths=[]) skipped the keys write because the gate was `if saved_paths`, so the saved config still let mlx-lm scan every Linear/Embedding/Switch and reinstate ghost wrappers. A stale caller-supplied lora_parameters[keys] also survived the live override that overwrites rank/scale/dropout and the path list, so mlx-lm reloaded wrappers for paths that no longer existed. Extract the sync into _sync_mlx_lora_keys() so both branches mirror the authoritative path list (including the empty-pin case) and drop any stale caller keys when no path list is present. 2) The adapter-detection fallback in FastMLXModel.from_pretrained only re-raised Unsloth-tagged ValueError / ImportError / RuntimeError, so an ordinary AttributeError on a malformed lora_parameters block (e.g. SimpleNamespace built without the field) fell through and silently base-loaded the model with no adapter weights. When the adapter_config.json explicitly declares peft_type=LORA or fine_tune_type in {lora, dora}, refuse to swallow the error and raise a namespaced RuntimeError instead.
…er for PR unslothai#679 Two asymmetric-fix gaps caught by reviewers. 1) lora_parameters[keys] drifts from the authoritative unsloth_mlx_lora_module_paths in two cases. An explicit empty pin (unsloth_mlx_lora_module_paths=[]) skipped the keys write because the gate was `if saved_paths`, so the saved config still let mlx-lm scan every Linear/Embedding/Switch and reinstate ghost wrappers. A stale caller-supplied lora_parameters[keys] also survived the live override that overwrites rank/scale/dropout and the path list, so mlx-lm reloaded wrappers for paths that no longer existed. Extract the sync into _sync_mlx_lora_keys() so both branches mirror the authoritative path list (including the empty-pin case) and drop any stale caller keys when no path list is present. 2) The adapter-detection fallback in FastMLXModel.from_pretrained only re-raised Unsloth-tagged ValueError / ImportError / RuntimeError, so an ordinary AttributeError on a malformed lora_parameters block (e.g. SimpleNamespace built without the field) fell through and silently base-loaded the model with no adapter weights. When the adapter_config.json explicitly declares peft_type=LORA or fine_tune_type in {lora, dora}, refuse to swallow the error and raise a namespaced RuntimeError instead.
Comments-only refactor: drop pure narration and compress multi-line WHY explanations to a single line per intent. No behavior change.
…-export-adapters # Conflicts: # unsloth_zoo/mlx/loader.py # unsloth_zoo/mlx/trainer.py # unsloth_zoo/mlx/utils.py
* fix(mlx): save only adapter tensors * Anchor LoRA filter to lora_a/lora_b modules and fix trainer detection for PR #692 Three follow-ups from review feedback: 1. save_lora_adapters now filters via a new collect_mlx_lora_adapter_tensors() helper that walks named_modules() and keeps only tensors belonging to modules that actually expose lora_a / lora_b. The previous "lora_" substring filter falsely included any path containing the prefix, e.g. router.lora_gate.weight or scalar_lora_alpha. Tolerates lora_A / lora_B casing too. 2. MLXTrainer.save_model() now uses the same helper to decide whether to call save_lora_adapters. The previous trainable_parameters() scan missed adapter tensors after a reload/freeze (LoRA lives in parameters() but is not always marked trainable), which caused final export to fall through to save_merged_model() instead of writing an adapter file. 3. tests/test_mlx_save_lora_adapters_filter.py adds five regressions over the split-save semantics: lora-only filter, brittleness demonstration (lora_router), no-LoRA ValueError, trainable checkpoint preserves everything, post-reload adapter detection. Also widens the ValueError message to point users at save_trainable_adapters() for non-LoRA checkpoint use. adapter_config.json write also now pins encoding="utf-8" for cross-platform parity. * Scrub .github/workflows for staging push (matches staging base) * Split: keep only 1 file(s) * mlx: align LoRA detection across save and norm-freeze paths _ensure_lora_frozen now drives LoRA detection from collect_mlx_lora_adapter_tensors instead of an "lora" substring scan over trainable_parameters(). The norm-freezing intent from blame commit 82d75e0 ("Improve compiled training loop, CCE memory optimizations, and LoRA stability" - "Add _ensure_lora_frozen() to prevent NaN from unfrozen LayerNorm in adaptive optimizers") is preserved verbatim; only the LoRA presence predicate is swapped so that reloaded/frozen LoRA models, whose adapter tensors live in parameters() but are not listed as trainable, still trigger the safeguard. The old predicate silently returned False in that state and let the LayerNorm NaN bug recur, which is the exact failure mode the original commit was added to prevent. save_pretrained_merged now uses the same collect_mlx_lora_adapter_tensors predicate, so the outer LoRA-vs-merged gate and the inner save_lora_adapters helper agree on what counts as a LoRA model and the error path stays consistent. * mlx: scope norm-freeze guard to active LoRA and hoist merged save check _ensure_lora_frozen now requires at least one module-anchored LoRA tensor to be in trainable_parameters() before freezing accidentally trainable norms. The previous full-tree check would silently freeze a user's trainable norm when a reloaded model still carried frozen LoRA tensors in parameters() but the user was running a non-LoRA fine-tune. The module-anchored predicate is kept (no more "lora" substring false positives), only the trainability gate is restored, preserving the blame-cited NaN-protection intent for active LoRA training without disturbing the non-LoRA reload case. save_pretrained_merged no longer collects LoRA adapter tensors when save_method is merged_16bit or merged_4bit, since those branches do not read has_lora. The call is hoisted into the lora branch so merged saves skip the full-parameter walk. * mlx: tighten save_model LoRA detection comment Trim the four-line rationale block to the single load-bearing fact: reloaded LoRA can sit in parameters() without trainable_parameters(), so the old substring check fell through to save_merged_model(). * Cover MLX LoRA detection edges in adapter-filter test module Extend the existing regression module with five behavior-named assertions: - norm-freeze runs when LoRA is actively trained alongside an accidentally trainable norm - norm-freeze skips when adapter tensors are reloaded but not trainable (the user is doing a non-LoRA fine-tune) - norm-freeze skips for non-LoRA models - save_pretrained_merged raises the user-facing "no LoRA layers" message at the gate when adapter tensors are absent - save_pretrained_merged skips the LoRA collector for merged_16bit and merged_4bit save paths Also drop the unused json + pathlib imports, the unused tmp_path fixture in test_collect_lora_helper_finds_adapters_after_reload, and tighten the module docstring. * Preserve non-LoRA trainables and fix uppercase/SwitchLinear paths for PR #692 - collect_mlx_lora_adapter_tensors() now requires a complete attribute pair (lora_a + lora_b, or lora_A + lora_B). Half-adapter modules no longer slip through into adapters.safetensors as unreloadable artifacts. - New iter_mlx_lora_modules() helper feeds the collector, _enrich_mlx_adapter_config(), and MLXTrainer.save_model() so uppercase tensors get matching module paths, rank, scale, and dropout metadata instead of falling back to defaults. - MLXTrainer.save_model() routes mixed LoRA + non-LoRA trainables to save_trainable_adapters() so intentionally trainable embeddings, projector, vision, or norm weights are not silently dropped from the final artifact. Frozen-LoRA + non-LoRA-trainable runs now correctly fall through to save_merged_model(). - LoRASwitchLinear rank reads shape[-2] for (num_experts, rank, in_dims) instead of writing in_dims into adapter_config["rank"]. Test cases: - test_collect_lora_helper_accepts_uppercase_pair - test_collect_lora_helper_drops_half_adapter_module - test_iter_mlx_lora_modules_reports_attr_pair - test_enrich_adapter_config_records_uppercase_lora_paths * Scrub .github/workflows for staging push (matches staging base) * Split: keep only 1 file(s) * mlx: tighten LoRA save routing and guard frozen-trainable checkpoint - trainer.save_model now calls the unsloth_zoo save_lora_adapters utility directly, matching the calling convention used for save_trainable_adapters on the sibling branch and removing the dependence on the loader-side monkeypatch when the trainer is constructed without _patch_mlx_saving. - _ensure_lora_frozen reuses one collect_mlx_lora_adapter_tensors pass and uses the resulting adapter key set for the norm safeguard so paths that merely contain "lora" (e.g. router.lora_gate.weight) no longer disable the freeze. - save_trainable_adapters now raises when trainable_parameters() is empty so checkpoint directories never carry adapter_config.json without an adapters.safetensors next to it. - Drop unused unpack names (_b_attr, _module) in iter_mlx_lora_modules loops. * mlx: route save_pretrained_merged through trainable-aware adapter writer - save_pretrained_merged(save_method='lora') now mirrors MLXTrainer.save_model: when the trainable set contains keys outside the module-anchored adapter set (intentionally trainable embeddings/projector/vision), it routes through save_trainable_adapters so the public save API does not silently lose the trained non-LoRA state. Pure LoRA runs still take the lean save_lora_adapters path. - Periodic in-loop checkpoint save now wraps save_trainable_adapters in try/except so a fully-frozen state at a checkpoint step degrades to a printed skip rather than tearing down the training loop on the empty-trainable ValueError introduced in the previous commit. FA3 note: the original "Saved checkpoint to ..." print line (blame e6d8f7f "fix mlx: Adds the MLX training path used by Studio on Apple Silicon (#634)") is preserved as the try/except else arm; it still runs on every successful checkpoint write and only the failure path is added. * mlx: guard norm-freeze on empty trainable + tighten LoRA save path - _ensure_lora_frozen now early-returns when trainable_parameters() is empty so the helper does not require model.parameters() on the stub models used by tests like test_adam_optimizers_enable_bias_correction. why safe: with zero trainable tensors there is nothing for the norm safeguard to flag and no LoRA detection is required. - save_model uses _extract_mlx_lora_parameters() to derive the rank / scale / dropout fields and computes the trainable split inside the LoRA branch so the writer routing is driven by the same module-anchored adapter set regardless of which tensors happen to be trainable. - save_trainable_adapters now unions the trainable tree with the full module-anchored adapter set so reloaded LoRA tensors that were frozen before a mixed fine-tune are still emitted into the artifact. - _enrich_mlx_adapter_config now backfills lora_parameters / rank / scale / dropout from the model when the caller did not supply them so save_pretrained_merged ships a complete adapter_config.json. FA3 note: the original `from .utils import save_merged_model` and `from .utils import _get_transformer_layers` lines (blame e6d8f7f "fix mlx: Adds the MLX training path used by Studio on Apple Silicon (#634)") are preserved at their original locations in save_model; the new `_extract_mlx_lora_parameters` import is added as a separate statement next to save_merged_model so neither existing import is deleted or consolidated. * mlx: trim verbose adapter-save commentary Drop multi-line restatements of routing intent and the test-name back reference; keep only the load-bearing why for the empty-trainable guard and the structural-detect comment in save_model. No behavior change. * mlx: extend adapter-filter coverage with mixed and frozen-LoRA edges Adds regression coverage for: - save_trainable_adapters raises when nothing trainable AND no LoRA - save_trainable_adapters preserves frozen LoRA alongside trainable norms - _ensure_lora_frozen freezes norms whose path contains literal "lora" - save_pretrained_merged(save_method='lora') routes mixed vs pure cases - save_pretrained_merged(save_method='lora') writes complete adapter_config Guards the autouse _install_shim against double-injecting the torch shim on hosts where real mlx is already importable. * Make adapter export strictly LoRA-only and drop uppercase paths for PR #692 The earlier mixed-trainable routing in save_pretrained_merged() and MLXTrainer.save_model() let base tensors like q_proj.weight leak into adapters.safetensors whenever a reloaded checkpoint had base weights marked trainable. save_method='lora' now always emits a clean LoRA-only artifact that mlx-lm.load_adapters() can read. Callers that intentionally ship mixed LoRA + embedding / projector / vision trainables use the explicit save_trainable_adapters() API; in-loop checkpoints still cover the full trainable tree. Same review pass also exposed three smaller asymmetries; fix all three: - iter_mlx_lora_modules / collect_mlx_lora_adapter_tensors no longer detect uppercase lora_A / lora_B. mlx-lm reload only ever recreates lowercase wrappers, so accepting uppercase made save_lora_adapters emit weights that load_weights(..., strict=False) silently dropped. - _is_lm_head_trainable() now consults collect_mlx_lora_adapter_tensors() instead of an "lora" substring test, so trainable keys whose names merely contain "lora" (lora_router.weight or base.lora_special.lm_head .weight) are no longer misclassified as adapter state. - _enrich_mlx_adapter_config() backfills num_layers + fine_tune_type + peft_type so save_pretrained_merged(save_method='lora') ships a config mlx-lm.load_adapters() can consume, matching the trainer's save path. - _extract_mlx_lora_parameters() reads MLX Dropout._p_1 (keep prob) with a .p shim fallback so nonzero dropout no longer serializes as 0.0. Drop the now-unused iter_mlx_lora_modules import from trainer.py to clear the F401 added by the previous round. * Preserve external trainables, gate LoRA metadata, fix LoRA+ scope for PR #692 The earlier strictly-LoRA-only save dropped intentionally trained tensors that live outside any LoRA-wrapped module (embed_tokens, lm_head, projector, vision, norm) from the final artifact. Use a module-prefix filter so trainables OUTSIDE LoRA modules survive while base weights INSIDE a LoRA module (which a reload may mark trainable) are still excluded, keeping the original Studio reload leak shut. Adjacent review pass fixes: - save_pretrained_merged(save_method='lora', push_to_hub=True) now uploads the adapter directory via HfApi.upload_folder instead of re-routing through push_to_hub_merged, which would call save_merged_model() and overwrite the adapter-only artifact. - _enrich_mlx_adapter_config() only writes lora_parameters / num_layers / fine_tune_type='lora' / peft_type='LORA' when the model actually has LoRA modules (or the caller declared a lora/dora artifact). Full fine-tune checkpoints saved via save_trainable_adapters() no longer carry fake LoRA metadata that mlx-lm.load_adapters() would treat as a LoRA reload. - LoRA+ gradient scaling in _grad_leaf_scale() now anchors on the parameter suffix (.lora_b) rather than substring "lora_b", so an unrelated trainable like lora_b_router.weight cannot receive the LoRA+ multiplier. - collect_mlx_lora_adapter_tensors() also picks up DoRA modules' trained magnitude vector m alongside lora_a / lora_b so DoRA reload recovers the learned magnitudes. Tests: route preserve external trainables; full-checkpoint config has no LoRA metadata. * Stop leaking base weights under LoRA and restore push-to-hub LoRA API for PR #692 save_trainable_adapters() previously dumped every tensor returned by model.trainable_parameters(). After a checkpoint reload, base tensors inside LoRA-wrapped modules (e.g. q_proj.weight under a wrapped q_proj) can end up marked trainable, so the public mixed-fine-tune route in save_pretrained_merged(save_method='lora') and MLXTrainer.save_model would silently include those base weights in adapters.safetensors. Apply the same lora_module_prefixes filter inside save_trainable_adapters itself: keep external trainables (embed_tokens, lm_head, projector, vision, norm) and the module-anchored LoRA tensors, but drop base weights that live INSIDE a LoRA module. Regression test: test_save_pretrained_merged_lora_mixed_external_drops_inside_lora_base. Adjacent push-to-hub fix: The earlier LoRA push branch used a bare HfApi.upload_folder call that silently dropped four push_to_hub_merged behaviours: caller-supplied repo_id, private-flag updates on re-push, ModelCard tag merging, and the "Trained with Unsloth" commit message + description convention. It also used upload_folder instead of upload_large_folder, so large adapter exports (e.g. save_trainable_adapters dumps with embeddings) could not resume on a flaky connection. Factor _push_lora_adapters_to_hub() that mirrors push_to_hub_merged() end-to-end without re-saving a full merged model on top of the adapter directory, and forward repo_id / commit_message / commit_description / create_pr / revision through save_pretrained_merged so both save methods share the same Hub-push surface. * Honor commit_message / commit_description / create_pr on LoRA push for PR #692 _push_lora_adapters_to_hub previously called HfApi.upload_large_folder which, on huggingface_hub>=0.34 (this repo's floor), silently drops commit_message, commit_description, create_pr, and revision. Callers passing save_pretrained_merged(save_method='lora', push_to_hub=True, commit_message=..., create_pr=True) got a commit titled "Upload N LFS files" on main with no PR opened. The kwargs were only honored by the upload_folder fallback, which huggingface_hub>=0.34 never triggers. Route LoRA pushes through upload_folder primarily (LoRA adapter directories are small, typically <500MB and ~5 files, so chunking buys nothing) and keep upload_large_folder as the fallback for environments where upload_folder is unavailable or has an incompatible signature. Add tests: one asserts that custom commit_message / commit_description / create_pr / revision actually reach the upload call and that upload_large_folder is not hit on the happy path; the other asserts the fallback to upload_large_folder still runs when upload_folder raises TypeError or AttributeError. * Honor custom commit metadata on push_to_hub_merged for PR #692 push_to_hub_merged had the same upload_large_folder kwarg-drop defect as the LoRA helper that was fixed in 50c1db3: on huggingface_hub>=0.34, calling upload_large_folder silently ignores commit_message, commit_description, and create_pr. Callers passing push_to_hub_merged(..., commit_message="Release v2", create_pr=True) got a commit titled "Upload N LFS files" on main with no PR opened. Unlike LoRA adapters, merged saves can be multi-GB so upload_large_folder's chunked-resume behavior is still the right default for the common case. Gate the route choice on whether the caller passed custom commit metadata (commit_message / commit_description differ from the function defaults, or create_pr=True): use upload_folder when the caller explicitly cares about commit semantics, else default to upload_large_folder for resume-on-multi-GB behavior. Keep the symmetric AttributeError/TypeError fallback either way. Hoist the defaults into module-level _PUSH_MERGED_DEFAULT_* sentinels so the "did the caller customize this" check is robust to future default-string edits. Add tests asserting custom create_pr / commit_message routes through upload_folder, and that the no-custom-metadata case still uses upload_large_folder for the large-merge resume path. * Tighten DoRA m gate + reject empty save_adapter_artifacts for PR #692 `m` is a generic 1-letter attribute name. If a future LoRA wrapper exposes self.m as a learned mixing scalar (not a DoRA magnitude vector), the prior hasattr(module, "m") branch would ship it under DoRA semantics. Gate the collection on type(module).__name__.startswith("DoRA") so only real DoRA modules contribute the magnitude tensor; today's real DoRALinear / QDoRALinear both match. _save_adapter_artifacts previously emitted adapter_config.json with no adapters.safetensors next to it when tensors={}. Every public caller already raises a clear ValueError before getting here, but enforce the invariant locally so any future direct call cannot produce a half-written artifact that mlx-lm reload chokes on. Add tests: collect_mlx_lora_adapter_tensors skips an unrelated `m` attr on a non-DoRA module; _save_adapter_artifacts raises ValueError on empty input. * Treat custom revision as commit-metadata for push_to_hub_merged + DoRA test for PR #692 push_to_hub_merged previously dropped a custom `revision=` kwarg on the default route: when commit_message/commit_description/create_pr were all defaults but revision was non-None, _caller_wants_commit_metadata was False and the upload routed through upload_large_folder, which silently lands on main regardless of the revision arg. Add `revision is not None` to the predicate so a custom target branch also forces upload_folder. Add positive DoRA collect test: a class whose name starts with "DoRA" must have its `m` magnitude tensor included in the collected adapter keys. Without this lock, a future typo (DORA, Dora, wrong attribute name) would silently strip DoRA magnitudes from every export and the existing negative test would still pass. Add push_to_hub_merged revision-routing test asserting the new predicate. * Fix 4 P1s from reviewer.py round on PR #692 1) Full-finetune checkpoints now stamp fine_tune_type="full". When _enrich_mlx_adapter_config runs on a no-LoRA model, it previously left fine_tune_type unset. mlx-lm's load_adapters() defaults missing fine_tune_type to "lora" and then reads num_layers / lora_parameters, so the saved full-precision tensors failed to reload. The else branch now setdefault("fine_tune_type", "full") so reload routes correctly. 2) DoRA exports now stamp fine_tune_type="dora". The collector already includes the q_proj.m magnitude tensor for DoRA classes, but the adapter_config still said "lora". mlx-lm only recreates DoRA wrappers when use_dora=(fine_tune_type=="dora"), so the saved m tensor was silently dropped on reload via strict=False. Detect DoRA modules via type(module).__name__.startswith("DoRA") and override the fine_tune_type to "dora" before setdefault picks the lora default. 3) _is_lm_head_trainable now filters base weights inside LoRA-wrapped modules using the same lora_module_prefixes pattern that save_trainable_adapters uses. A LoRA-wrapped lm_head with a reload-leaked lm_head.weight previously made the function return True, defeating the CCE memory guard and computing a full V x H weight gradient per chunk. Now reload-leaked base tensors under LoRA modules are correctly treated as non-trainable. 4) _push_lora_adapters_to_hub now uploads with allow_patterns scoped to adapter / tokenizer / config / readme files. Without this filter, a save_directory that already contained stale model-*.safetensors, model.safetensors.index.json, or *.gguf files from a prior merged save would silently push them to the LoRA adapter repo. Public-by-default repos would have leaked merged weights under a "LoRA adapter" repo; the allow-list rules out catch-alls like "*.safetensors" / "*" so a future allow_patterns expansion cannot re-introduce the regression. Also unblock the lora_parameters / top-level rank-scale-dropout backfill: the top-level keys are now backfilled whenever lora_parameters is present, not only when lora_parameters was computed by this function. mlx-vlm reads the top-level keys directly; without this fix a caller-supplied lora_parameters dict left rank/scale/dropout absent at the top level. Also drop the num_layers=-1 sentinel here too (mirroring the trainer side None-gating in PR #679): if _get_transformer_layers returns nothing, num_layers is simply omitted so mlx-lm's loader raises a clear AttributeError instead of slicing range(-1) and applying zero LoRA layers. Tests: enrich stamps "full" on no-LoRA models, "dora" on DoRA models; _is_lm_head_trainable returns False when only LoRA-wrapped lm_head plus reload-leaked weight are trainable; _push_lora_adapters_to_hub passes a restrictive allow_patterns list that excludes stale full-model artifacts. * Preserve trainable .bias under LoRA-wrapped modules for PR #692 The reload-leak guard previously dropped EVERY trainable key under a LoRA module prefix. That correctly excludes the wrapped base .weight (the V x H matmul gradient that defeats the memory guard), but also dropped other trainable params at the same path: notably q_proj.bias when the user explicitly trained bias=True on a LoRA-wrapped Linear. Refine save_trainable_adapters's filter to drop only `.weight` keys under LoRA prefixes. .bias and other params under the same prefix now survive, so a checkpoint+reload roundtrip preserves them. The routing decision in save_pretrained_merged and MLXTrainer.save_model follows the same refinement: previously routed to save_trainable_adapters only when there was a trainable OUTSIDE any LoRA module. A trainable .bias INSIDE a LoRA module (with no external trainables) would have routed to save_lora_adapters and been dropped. Now routing treats inside-LoRA `.weight` as the only reload-leak risk; any other trainable (bias / external / etc.) triggers the trainable-aware writer. Add a test that exercises a LoRA-wrapped Linear with bias=True and an external trainable, asserting q_proj.bias survives to the safetensors output while q_proj.weight is correctly dropped. * Tighten LoRA filter + push routing + fine_tune_type stamps for PR #692 Several R9 reviewer.py findings, batched. (1) Quantized base tensors leaked through the adapter writer. The .weight-only filter missed .scales, .biases, and their .linear.* variants on mlx-lm QuantizedLinear, so QLoRA reload-trainable layers' quantization tensors slipped into adapters.safetensors. Hoist the filter into a shared _is_base_tensor_inside_lora_module helper with an extended suffix tuple (weight / scales / biases / linear.weight / linear.scales / linear.biases). save_trainable_adapters, save_pretrained_merged routing, MLXTrainer.save_model routing, and _is_lm_head_trainable all use the same predicate now. (2) Root-level LoRA wrappers (module_name == "") leaked bare `weight` / `scales` / `biases`. The empty-name prefix is intentionally excluded from lora_module_prefixes (otherwise it swallows the whole tree), so the filter never matched. Add a has_root_lora_module flag and special-case the bare-key check when one exists. (3) Caller fine_tune_type="full" while the model has LoRA modules produced an internally-inconsistent artifact: saved lora_a/lora_b but config said full, so mlx-lm reload skipped LoRA wrapping and silently dropped the adapter. Override the caller to "lora" in that case. (4) Full-finetune saves carried stale LoRA fields (peft_type, lora_parameters, rank, scale, dropout, num_layers, unsloth_mlx_lora_module_paths) from re-used config dicts. setdefault did not strip them. Now the no-LoRA branch unconditionally pops stale LoRA keys so reload sees a clean full-finetune dict. (5) revision is not None forced upload_folder on merged pushes, losing the resumable/chunked path for multi-GB merged uploads. upload_large_folder accepts revision natively; only commit_message / commit_description / create_pr force upload_folder. Drop revision from _caller_wants_commit_metadata. (6) create_pr=True silently fell back to upload_large_folder on TypeError/AttributeError from upload_folder, landing on main with no PR. Refuse to silently lose the PR boundary; raise a clear RuntimeError telling the user to upgrade huggingface_hub or call create_pull_request() + pass revision themselves. Applied to both the LoRA helper and the merged push fallback. (7) Embedding/lm-head LR multiplier still used substring matching, so names like decoder.not_lm_head_router.weight or foo.embed_tokens_aux.weight could pick up the embedding LR. Anchor on path segments via name.split(".") so the multiplier only fires on a real `embed_tokens` or `lm_head` segment. Update the prior revision-routing test to assert the new behavior (upload_large_folder honors revision alone). Add tests for: quantized base scales/biases dropped, caller fine_tune_type="full" override on LoRA-bearing models, stale LoRA fields stripped on full-finetune, create_pr=True failure raises instead of silent main push. * Write minimal README before ModelCard.load so tags propagate for PR #692 Round-9 review caught that `_push_lora_adapters_to_hub` (and the merged push) only applied caller-provided `tags=[...]` when a README.md already existed in `save_directory`. Fresh LoRA adapter directories carry the adapter and tokenizer files but no model card, so `ModelCard.load()` raised `FileNotFoundError` and the requested tags silently never reached the Hub. Seed a minimal YAML front-matter `README.md` before loading the card when no card exists yet, then merge the requested tags with the existing ones. Applied to both the LoRA push and the merged push (merged saves usually have a card from `save_merged_model()`'s `create_model_card` fallback, but seed defensively so an upstream card-fallback failure does not quietly drop tags either). * Segment-match lm_head + harden commit-metadata fallback + cross-check fine_tune_type for PR #692 Round-10 review caught several asymmetric / silent-fallback bugs in the adapter save and Hub push paths: * `_is_lm_head_trainable` matched `'lm_head'` / `'embed_tokens.weight'` as substrings, so unrelated trainables like `decoder.not_lm_head_router.weight` or `foo.embed_tokens_aux.weight` were misclassified as the real LM head. Switch to segment matching to mirror the trainer-side LR-multiplier fix already in place. * Both `_push_lora_adapters_to_hub` and `push_to_hub_merged` fallback paths previously raised only when `create_pr=True`, silently dropping caller-provided `commit_message` / `commit_description` when upload_folder() was unavailable. Refuse the fallback whenever any of the three commit-metadata fields was set by the caller; capture the intent BEFORE the default backfill so `None` vs. user-string is still distinguishable. * `push_to_hub_merged(..., commit_message=None, commit_description=None)` forced the non-resumable upload_folder route because `None` did not equal the default constant. Treat explicit `None` as equivalent to the default so wrapper layers that forward optional kwargs do not accidentally lose large-folder resume semantics. * `_enrich_mlx_adapter_config` trusted caller-supplied `fine_tune_type='lora'` even when the live model had no LoRA modules, writing LoRA fields to a full-finetune artifact and breaking reload. Cross-check the declared artifact against the live model so stale caller config always yields to ground truth. * `_is_base_tensor_inside_lora_module` only caught bare `weight` / `scales` / `biases` at the root LoRA wrapper, missing the QuantizedLinear-wrapped variants (`linear.weight`, `linear.scales`, `linear.biases`) that live there too. Introduce a small `_ROOT_LORA_WRAPPED_BASE_KEYS` whitelist covering both shapes. * Fail loud on private=True + derive fine_tune_type from live model + mirror lora_parameters top-level for PR #692 Round-11 review caught three remaining symmetric-fix gaps in the adapter save and Hub push paths: * `update_repo_settings` failure used to print-and-continue, which means an existing public repo stays public when the caller asked for `private=True`. Refuse to upload in that case so adapters never get published to a public Hub URL the caller explicitly tried to make private. Public-by-request failures still print and continue (their intent is unchanged). * `_enrich_mlx_adapter_config` previously preserved caller-provided `fine_tune_type` when the value was `"dora"` but the live model had no DoRA modules (or vice versa). mlx-lm would then rebuild DoRA wrappers and look for a `.m` magnitude tensor the saved adapters.safetensors does not contain, dropping every adapter via strict=False. Derive `fine_tune_type` strictly from the live model presence of `DoRA*` modules; the caller's stale value yields to ground truth. * Top-level `rank` / `scale` / `dropout` were only backfilled when absent, so a caller-supplied stale `rank=99` could shadow the real `lora_parameters.rank=4`. mirror `lora_parameters` to the top level verbatim so the two shapes mlx-vlm reload checks always agree. * Harden push_to_hub_gguf private=True to match LoRA/merged push for PR #692 Round-13 Opus review caught that `push_to_hub_gguf` missed the same visibility hardening applied to `_push_lora_adapters_to_hub` and `push_to_hub_merged` in R11. `HfApi.create_repo(exist_ok=True)` is a no-op for the visibility flag on existing repos, so `model.push_to_hub_gguf(repo_id="me/existing-public-repo", private=True)` would silently upload the GGUF shards (often multi-GB merged weights) to a public Hub URL the caller explicitly tried to make private. Apply the same pattern: paired `update_repo_settings(private=...)` + hard-fail RuntimeError when `private=True` was requested and the visibility update fails. Public-by-request failures continue to print-and-continue. This is the highest-impact of the three push paths because GGUF files are typically the ones users distribute most widely, and they contain full merged weights rather than adapters. * Honour omitted private=None on create_repo + cover root + non-root LoRAEmbedding for PR #692 Round-15 review caught two regressions I introduced plus one asymmetric-fix gap: * All three create_repo sites (LoRA push, merged push, GGUF push) used `private=bool(private) if private is not None else False`, which silently forces a brand-new repo to `private=False` whenever the caller did not pass the kwarg. That overrides Hugging Face Hub organization-level default-private policies: a user inside a default-private org would get a public repo on first push. Build the create_repo kwargs conditionally so `private` is omitted unless the caller actually set it; the Hub then applies the account/org policy for initial visibility. The existing update_repo_settings + R11 hard-fail-on-failure logic still enforces explicit `private=True`. * `_LORA_WRAPPED_BASE_SUFFIXES` only covered `.linear.*` wrapped-base state. LoRAEmbedding and DoRAEmbedding wrap an inner nn.Embedding at `.embedding`, so non-root embedding adapters could leak `embed_tokens.embedding.weight` through `save_trainable_adapters` and `save_pretrained_merged(save_method="lora")`. Add the `.embedding.weight` / `.embedding.scales` / `.embedding.biases` suffixes alongside the existing `.linear.*` variants. * `_ROOT_LORA_WRAPPED_BASE_KEYS` had the same gap for root-level LoRAEmbedding wrappers. Extend the frozenset to include `embedding.weight` / `embedding.scales` / `embedding.biases`. * Drop wrapped linear.bias + reload DoRA paths with DoRA wrappers for PR #692 Fixes two asymmetric-fix gaps from round 16 review: 1. unsloth_zoo/mlx/utils.py: `_LORA_WRAPPED_BASE_SUFFIXES` and `_ROOT_LORA_WRAPPED_BASE_KEYS` already filtered the wrapped base weight / scales / biases of an mlx-lm `LoRALinear`, but missed the inner `linear.bias` of a bias-bearing wrapped `nn.Linear`. Without it, `q_proj.linear.bias` and `lm_head.linear.bias` leaked into adapter saves and `_is_lm_head_trainable()` reported True for adapter-only training. A bare `.bias` is deliberately NOT in the suffix tuple because `q_proj.bias` at the LoRA-module level is user-trained bias state that an earlier round explicitly preserved. 2. unsloth_zoo/mlx/loader.py: `_apply_lora_at_paths()` only imported `LoRALinear` and wrapped every saved module path as plain LoRA. After the PR added DoRA export (fine_tune_type='dora' + saving the `.m` magnitude tensor), the path-aware reload branch silently downgraded DoRA back to LoRA because the recreated wrapper had no `.m` parameter, and the downstream `model.load_weights( strict=False)` dropped `q_proj.m`. Honour the saved `fine_tune_type`: import `DoRALinear` / `DoRAEmbedding` and wrap with DoRA classes when the adapter declares DoRA. Cover `nn.Embedding` and `SwitchLinear` paths too (DoRA falls back to LoRA on switch modules since mlx-lm has no DoRA switch wrapper). * Optional newer LoRA classes + verify-before-fail private guard + switch rank fix for PR #692 Fixes three asymmetric-fix / regression issues from round 17 review: 1. unsloth_zoo/mlx/loader.py: `_apply_lora_at_paths` unconditionally imported `LoRAEmbedding` and `LoRASwitchLinear` from `mlx_lm.tuner.lora`. Older mlx-lm wheels that only ship `LoRALinear` raised an ImportError before any wrapper was attached, so linear-only adapters could not reload. Wrap both imports in try/except and treat missing classes as "skip this module type" instead of failing the whole adapter load path; LoRALinear (which has shipped since the first mlx-lm release we support) is still required. 2. unsloth_zoo/mlx/utils.py: the round-16 `update_repo_settings` block always raised on private=True when the call failed, even though `create_repo(private=True)` had already set visibility on freshly- created repos. A token without `write:repo_settings` then blocked the upload even though the repo was already private. Refactor the three duplicated blocks into `_ensure_hub_repo_visibility(api, repo_id, private)` which (a) skips the update when private is None, (b) prints-and-continues on private=False, (c) on private=True verifies via `repo_info` whether the repo is already private after the failed update; only raises when the visibility is confirmed non-private. 3. unsloth_zoo/mlx/utils.py: `_extract_mlx_lora_parameters` inferred switch LoRA rank from `lora_a.shape[-1]` for 2-D layouts, but `mlx-lm==0.22.x` stores switch `lora_a` as `(rank * num_experts, in_dims)`, returning ranks like 64 or 4096 instead of 4 or 8. Prefer `lora_b.shape[-1]` when the module declares `num_experts` since both layouts (newer mlx-lm and 0.22.x) keep rank as the trailing axis of `lora_b`. Plain LoRALinear keeps the existing `shape[-1]` fallback. * Re-raise DoRA-unavailable + restore .linear.bias as trainable + drop stale adapter_model.safetensors + overwrite stale lora_parameters for PR #692 Fixes four round 18 findings: 1. unsloth_zoo/mlx/loader.py: the path-aware reload site caught every Exception from `_apply_lora_at_paths()` and fell back to upstream `load_adapters()`. That fallback rebuilds plain LoRALinear wrappers at every saved path, so the explicit `RuntimeError` raised when `fine_tune_type='dora'` but `mlx_lm.tuner.dora` is unavailable was silently swallowed and DoRA `*.m` magnitude tensors dropped via `load_weights(strict=False)`. Catch RuntimeError separately and re-raise when the message contains the DoRA-unavailable signal, then continue catching every other Exception. 2. unsloth_zoo/mlx/utils.py: round 16 added `.linear.bias` to `_LORA_WRAPPED_BASE_SUFFIXES` and `_ROOT_LORA_WRAPPED_BASE_KEYS` to drop the wrapped base bias of LoRALinear. That was wrong: upstream mlx-lm LoRALinear also stores the legitimate user-trained bias at `q_proj.linear.bias`, and dropping it silently loses training state when the user unfreezes / trains bias. Remove `.linear.bias` from both filter sets and document why bare `.bias` and `.linear.bias` must survive trainable checkpoints. 3. unsloth_zoo/mlx/utils.py: the LoRA upload allow-list still permitted `adapter_model.safetensors`. The MLX adapter save path writes `adapters.safetensors`, never `adapter_model.safetensors`; a matching file in the directory is therefore stale by definition (PEFT/HF leftover) and would re-upload that stale PEFT adapter alongside the current MLX adapter. Drop it from the allow-list. 4. unsloth_zoo/mlx/utils.py: `_enrich_mlx_adapter_config` only wrote `lora_parameters` from `_extract_mlx_lora_parameters` when the caller dict did not already have it. A reused config with stale `lora_parameters={"rank": 99}` then survived against actual rank-4 saved tensors, causing reload wrapper mismatch. When the live model has any LoRA modules, derive rank/scale/dropout from the live walker and overwrite caller-supplied stale values; only preserve caller `lora_parameters` when the live model has no LoRA modules (declared_lora_artifact-only path). * Let namespaced Unsloth MLX RuntimeError propagate past the outer adapter-detection except for PR #692 Round 18 added a DoRA-unavailable RuntimeError re-raise inside the inner _apply_lora_at_paths handler so the saved DoRA `*.m` magnitude tensors would not silently drop into a plain-LoRA reload. The re-raise worked, but the outer adapter-detection `except Exception as e:` block at loader.py only re-raised ValueError("Unsloth:") and swallowed every other exception into a print + standard-load fallback. The DoRA RuntimeError was therefore re-raised by the inner handler and then immediately caught and silenced by the outer handler, falling back to the base-only load path with no warning that fine_tune_type='dora' was unsupported. Extend the outer guard to also re-raise namespaced `RuntimeError("Unsloth MLX:")` so the inner handler's intentional re-raise reaches the user; unrelated runtime errors keep falling through to the soft standard-load fallback. * Navigate list-indexed module paths in _apply_lora_at_paths for PR #692 `_apply_lora_at_paths` re-attached LoRA wrappers via `setattr(parent, leaf, wrapped)` and `parent = by_name.get(parent_path, model)`. That silently no-ops when the saved path ends in a numeric segment such as `vision_tower.merger.layers.0`, which the training-time `_lora_walk_module` deliberately produces for Qwen2.5-VL's vision merger and any projector whose Linears live inside a Python list: - `name.rpartition(".")` -> `parent_path="vision_tower.merger.layers"`, `leaf="0"`. - `by_name.get(parent_path, model)` returns `model` (the fallback) because `named_modules()` does not emit list containers themselves. - `hasattr(model, "0")` is False, so the `setattr` branch is skipped silently. The base nn.Linear stays in place. - The follow-up `load_weights(strict=False)` then silently drops the saved `...layers.0.lora_{a,b}` tensors and the user gets a "successful" reload with a fully reverted vision/projector LoRA and no warning. Mirror the navigation pattern from `_lora_walk_module`: split the saved path, walk segments trying `parent[int(seg)]` first then `getattr` for attribute access, and apply the same `parent[int(leaf)] = wrapped` fallback to `setattr` for the final segment. List-indexed wrappers now install correctly and the subsequent `load_weights(strict=False)` binds the saved tensors instead of dropping them. * Guard DoRA fallback + fail-loud LoRAEmbedding + treat default commit string as default for PR #692 Three asymmetric-fix gaps caught by reviewers. 1) FastMLXModel.from_pretrained's adapter-detection inner block had a DoRA capability check inside the saved-paths branch (via _apply_lora_at_paths raising RuntimeError when mlx_lm.tuner.dora is missing), but the no- saved-paths fallback fell straight through to load_adapters(model, local_path) without that check. A DoRA adapter without unsloth_mlx_lora_module_paths would silently rebuild plain LoRA wrappers, then drop the .m magnitude tensors via the strict=False reload. Add the same capability check before the fallback load_adapters call so the user gets the namespaced "install a DoRA-capable mlx-lm" error. 2) _apply_lora_at_paths silently continued past embedding LoRA paths when the installed mlx-lm predated LoRAEmbedding. The saved embed_tokens adapter tensors then dropped via the downstream strict=False reload with no warning, leaving the user with a partially loaded adapter. Raise a namespaced RuntimeError instead, parallel to the DoRA-unavailable guard. 3) _push_lora_adapters_to_hub treated commit_message is not None as "caller wants custom metadata", which made the upload_large_folder fallback refuse uploads even when the caller forwarded the Unsloth default string verbatim (push_to_hub_lora -> _push_lora_adapters_to_hub pattern). Compare against the default strings the same way _push_merged_model_to_hub already does, so default-string forwarding no longer blocks the fallback. * Broaden inner-handler re-raise to match LoRAEmbedding fail-loud The fail-loud RuntimeError raised inside _apply_lora_at_paths when mlx_lm.tuner.lora.LoRAEmbedding is unavailable was being swallowed by the inner except clause's narrow DoRA-only re-raise filter, and falling through to load_adapters() silently dropped the saved embed_tokens.lora_a / lora_b tensors via the strict=False reload. Broaden the inner filter to re-raise any "Unsloth MLX:" namespaced RuntimeError so the embedding-LoRA capability gap surfaces to the caller the same way the DoRA capability gap does. Mirrors the outer handler's broader check directly below. * Tighten verbose comments in mlx adapter export code Comments-only refactor: drop pure narration and compress multi-line WHY explanations to a single line per intent. No behavior change. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Merges 5 main-side mlx fixes (unslothai#673 zero-token CCE, unslothai#679 + unslothai#692 LoRA save metadata, unslothai#682 invalid label NaN-poisoning, unslothai#688 tool mask). All 13 conflict regions in unsloth_zoo/mlx/utils.py resolved to keep PR unslothai#684's behavior where it conflicts on semantics: - half-open `<` length mask (PR unslothai#684 fix) wins over main's inclusive `<=` - `if labels is None` branch preserved (PR unslothai#684 generality) alongside main's `_normalize_cce_label_dtype` dtype widening - `_get_image_token_ids` legacy wrapper kept alongside main's new `_normalize_cce_label_dtype` / `_normalize_numpy_cce_labels` - `_mask_label_token_ids` calls `_normalize_cce_label_dtype` first so image masking honors main's uint-widening contract - HEAD's `_expand_token_replacements` dropped; main's three-function split (`_normalize_numpy_cce_labels` + `_expand_image_token_sequences` + `_expand_token_runs`) is canonical; duplicate HEAD wrappers removed - `_collate_vlm_prompt_completion_batch` reads back the masked labels in int64 so image + attention masking survives without narrowing - prompt-completion VLM collator routes through `_apply_vlm_label_masks` after dtype normalisation so ignore_token_ids and wide invalid ids both reach runtime CCE intact - `_to_mx_vlm_batch` uses main's `_normalize_cce_label_dtype` for labels while keeping PR unslothai#684's token_type_ids / mm_token_type_ids handling - `_unsloth_*` prefix filter preserved so the new collated_position_ids flag and main's raw-input-ids carrier both get stripped 152 MLX tests pass post-merge.
Summary
Fix MLX LoRA adapter saves so
adapter_config.jsoncontains the metadata needed to recreate adapters faithfully on reload.rank,scale, anddropoutinto bothlora_parametersand top-level compatibility fields.por MLX's_p_1keep-probability storage(num_experts, r, input_dims)unsloth_mlx_lora_module_pathsinstead of overwriting caller-provided topology metadataMLXTrainer.save_modelWhy
Several MLX save paths call
save_lora_adapters()without a complete explicit adapter config. Missing or stale LoRA metadata can make reload reconstruct adapters with different rank, scale, or dropout settings, which changes post-reload behavior.This also fixes two edge cases:
Dropoutstores keep probability as_p_1, so reading only.pwrites staledropout=0.0.LoRASwitchLinearstores rank onlora_a.shape[-2], notshape[-1].