Fix MLX save and GGUF export parity issues by Lyxot · Pull Request #697 · unslothai/unsloth-zoo

Lyxot · 2026-05-26T16:07:20Z

This PR fixes several issues around MLX merged saves and GGUF export parity.

Bug list:

VLM merged/full saves were missing VLM config fidelity
Root cause: MLX merged saves used the text-only mlx_lm.utils.save_config path for VLM checkpoints, so VLM-specific config structure, normalization, and fields were not preserved in config.json.
Commit: 615aff8
QLoRA merged 16-bit saves were not fully dequantized
Root cause: The merged 16-bit path treated LoRA fusion with fuse(dequantize=True) as sufficient dequantization, but quantized non-LoRA modules could remain packed.
Commit: 55fa173
VLM GGUF mmproj output was metadata-only / missing real projector tensors
Root cause: The temporary GGUF staging directory contained mlx-vlm sanitized tensor names/layouts, while llama.cpp MMPROJ conversion expects HF/llama.cpp projector names/layouts; in-place rewriting also risked corrupting file-backed tensors.
Commit: 6fcaa19
MLX GGUF export options like first_conversion were dropped by the bound save method
Root cause: The monkey-patched MLX save_pretrained_gguf wrapper accepted CUDA-compatible **kwargs but discarded them when calling the utility implementation, so caller-specified GGUF options like first_conversion never reached export.
Commit: 0029c54
VLM GGUF exports could leave same-name vision tensors in MLX layout, e.g. Gemma3 patch embeddings stayed OHWI instead of OIHW
Root cause: VLM sanitizer inversion only considered renamed candidates; for models whose sanitizer leaves a tensor name unchanged but changes layout, replay could accept the unchanged name/tensor and stop before applying the needed layout transform.
Commit: 97c980f
VLM GGUF sanitizer replay failed for models whose sanitizers need real submodules, e.g. GLM-OCR
Root cause: Sanitizer replay used a fake proxy object with only config/args, but model sanitizers such as GLM-OCR call submodule sanitizers through real attributes like self.vision_tower, so replay failed and no tensor rewrite happened.
Commit: c7002e7
VLM GGUF export could fail with modern package-layout llama.cpp converters due missing conversion/ sibling package
Root cause: Modern llama.cpp conversion scripts are package entrypoints that import sibling conversion.* modules; writing/running the patched script from a detached cache directory broke that package layout, and the MLX path could also use a different llama.cpp tree than the shared patcher root.
Commit: 4546225
MLX VLM loading could degrade to tokenizer-only processors, so GGUF staging omitted image processor metadata needed by mmproj conversion
Root cause: GLM-OCR custom processor construction could fail through Transformers AutoImageProcessor without torchvision, and mlx-vlm fell back to tokenizer-only metadata.
Commit: 398cb9c
VLM GGUF staging could omit image processor sidecars like preprocessor_config.json, breaking mmproj conversion
Root cause: MLX merged saves trusted tokenizer/model saves to emit every non-weight sidecar needed by downstream converters, but degraded or tokenizer-only processors can omit source files such as preprocessor_config.json, processor_config.json, video_preprocessor_config.json, tokenizer model files, or custom processing files.
Commit: df5deb9
GLM-OCR GGUF exports could advertise a NextN/MTP layer that the MLX model did not export, making llama.cpp load fail on missing blk.16.* tensors
Root cause: Source config metadata could advertise speculative/NextN/MTP layers through fields such as num_nextn_predict_layers, nextn_predict_layers, or mtp_num_hidden_layers, but the loaded/exported MLX language model did not contain those extra blocks.
Commit: d6d5d39
Raw mlx-vlm model saves could miss config.json because model.config dataclass configs were not extracted
Root cause: _get_model_config only handled _config dictionaries and model.args, while raw mlx-vlm models can expose config as a dataclass-like model.config.
Commit: 962dec8

Validation:

Ran python -m pytest tests/test_mlx_save_export_regressions.py -q.
Result: 23 passed.

chatgpt-codex-connector · 2026-05-26T16:07:30Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

gemini-code-assist

Code Review

This pull request introduces comprehensive regression tests and fixes for MLX save and GGUF export parity, including VLM processor repairs, tied embedding materialization, and VLM weight sanitization. The review feedback suggests improving robustness by safely checking if thinker_config is a dictionary before accessing its attributes to prevent potential AttributeErrors, and expanding the saved weights check to support single model.safetensors files in addition to sharded indexes.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves MLX model save/export parity (especially for VLMs) to better match the CUDA path, including GGUF export behavior and preserved sidecar metadata, and adds regression tests for these scenarios.

Changes:

Enhance config extraction/saving for MLX-LM vs MLX-VLM, including dataclass handling and backend-aware save_config.
Add export-time fixes for GGUF/VLM parity (tensor rewrites, tied lm_head materialization, and llama.cpp path anchoring).
Repair degraded MLX-VLM processors by rebuilding image processor + processor class from sidecar configs; add a dedicated regression test suite.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`unsloth_zoo/mlx/utils.py`	Adds backend-aware config saving, tied-embedding materialization, VLM GGUF rewrite pipeline, sidecar copying, and GGUF export adjustments.
`unsloth_zoo/mlx/loader.py`	Repairs degraded VLM processors from sidecar configs and filters supported kwargs for GGUF save/push bindings.
`unsloth_zoo/llama_cpp.py`	Ensures patched converter scripts are written in the correct directory for package layouts.
`tests/test_mlx_save_export_regressions.py`	Adds regression tests covering the new MLX save/export parity behaviors without heavy model downloads.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Prefer HF vision aliases when inverting mlx-vlm sanitizers, while keeping same-name tensor rewrites as a fallback. This preserves Gemma3 patch embedding layout fixes and avoids stopping early on Qwen-family MLX vision_tower names.

Use loaded VLM model instances when replaying mlx-vlm sanitizers for GGUF tensor rewrites, while keeping the config-derived class pipeline as a fallback. This covers models whose top-level sanitizer delegates through submodules, such as GLM-OCR.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

This reverts commit d8cdf89. Also removes the related regression coverage and later helper hardening now that bug-4 is out of scope.

danielhanchen · 2026-05-27T14:28:43Z

Triple-confirmed this PR end-to-end:

1. Local Linux, four Python versions

47 tests pass on each of 3.10 / 3.11 / 3.12 / 3.13: the 23 regressions from tests/test_mlx_save_export_regressions.py plus 24 additional edge-case probes I wrote to cover backward-compat negatives and the Copilot review comments (_copy_source_sidecars with non-directory src, _has_vision_config with malformed thinker_config, _read_json_file with binary garbage and PermissionError, _rewrite_mlx_vlm_tensor_for_gguf empty pipeline, _save_mlx_config text-only routing, _is_vlm_model on primitives, etc.). All green on every interpreter.

2. Cross-OS shim path

tests/test_mlx_save_export_regressions.py + tests/test_mlx_torch_shim_smoke.py green on macos-14 (Apple Silicon), ubuntu-latest, and windows-latest CI runners. The shim path catches Linux / Windows pathlib regressions that real Apple Silicon hardware would not surface.

3. Real Apple Silicon mlx wheels plus Studio boot

Drove bash install.sh --local --no-torch, force-reinstalled this PR's unsloth-zoo head into the Studio venv, installed real mlx 0.x / mlx-lm / mlx-vlm 0.5.0 wheels, and ran 18 contract probes against the live Apple Silicon kernels (one per fix). All 18 PASS. Then booted unsloth studio (API-only) with the PR's unsloth-zoo overlaid: /api/health returned {\"status\":\"healthy\", ...} in 23 seconds. No Studio boot regression.

Pre-PR vs post-PR diff

Copied this PR's test file onto unslothai/unsloth-zoo:main and re-ran it: 22 failed, 1 passed. Same suite on the PR head: 23 passed. The single pre-PR passer is test_lora_push_uses_lora_adapter_hub_path, a defensive guard the author added to confirm the unchanged LoRA-only push path stays intact -- it correctly passes on both sides. Every other test detects a real, observable failure on main. Sample failure on main: TypeError: push_to_hub_gguf() got an unexpected keyword argument 'first_conversion', which is exactly the bug fix #4 addresses.

Coverage per fix

#	Commit	Confirmed
1	`615aff8e`	VLM config save uses `mlx_vlm.utils.save_config`; `quantization_config` preserved. Text-only still routes through `mlx_lm.utils.save_config` (backward-compat negative)
2	`55fa1737`	QLoRA merged 16-bit fully dequantizes non-LoRA quantized modules
3	`6fcaa19a`	VLM GGUF mmproj tensor names/layouts normalized
4	`0029c545`	`save_pretrained_gguf` / `push_to_hub_gguf` accept and forward `first_conversion` (was the smoking gun on main)
5	`97c980fc`	Same-name layout transforms applied; `_mlx_arrays_match` value-checks real rank-2 mlx arrays
6	`c7002e78`	`_MlxVlmSanitizeProxy` constructable; replay uses real submodules
7	`4546225d`	Patched converter co-located with `conversion/` sibling
8	`398cb9ca`	`_repair_degraded_vlm_processor` rebuilds from sidecars; `_read_json_file` returns `{}` for missing / binary; propagates unexpected errors
9	`df5deb90`	`_copy_source_sidecars` copies non-weight sidecars, skips weights, handles non-directory src cleanly
10	`d6d5d39b`	NextN / MTP metadata stripped when MLX language model doesn't export those layers
11	`962dec85`	`_get_model_config` extracts dataclass-shaped `model.config`

Notes for the maintainer

The branch is currently based on 606d2b2d, which is one commit behind main's bea8b483 tool mask support (#688). A two-way git diff main..HEAD renders unsloth_zoo/rl_replacements.py as removed; a real three-way merge preserves those additions because this branch never touches the file. Worth a rebase before merge to make the diff easier to read.
Unrelated to this PR: tests/test_mlx_finetune_last_n_layers.py::test_get_peft_model_passes_finetune_last_n_layers_through fails identically on main and on this branch with AttributeError: 'FakeModel' object has no attribute 'trainable_parameters'. Separate cleanup.
Also unrelated to this PR: .github/workflows/mlx-ci.yml on main still imports the pre-subpackage flat paths (unsloth_zoo.mlx_loader etc.). It will fail the moment this PR's subpackage layout becomes the default. Should be refreshed to unsloth_zoo.mlx.{loader,compile,utils,trainer,cce}.
Copilot-flagged minor concerns the PR has not yet addressed (none of these block merge): _mlx_arrays_match value-blind on ranks 3 and 6 plus; os.environ mutation in _download_convert_hf_to_gguf_cached is process-global / not thread-safe; isinstance(value, mx.array) -- works fine on the shim and on real mlx 0.x, just flagging for future-proofing.

LGTM.

Copilot AI review requested due to automatic review settings May 26, 2026 16:07

gemini-code-assist Bot reviewed May 26, 2026

View reviewed changes

Comment thread unsloth_zoo/mlx/utils.py

Comment thread unsloth_zoo/mlx/utils.py Outdated

Comment thread unsloth_zoo/mlx/utils.py

Copilot AI reviewed May 26, 2026

View reviewed changes

Comment thread unsloth_zoo/mlx/utils.py Outdated

Comment thread unsloth_zoo/mlx/utils.py Outdated

Comment thread unsloth_zoo/mlx/utils.py Outdated

Comment thread unsloth_zoo/mlx/utils.py

Comment thread unsloth_zoo/mlx/loader.py

Lyxot added 14 commits May 27, 2026 15:26

fix(mlx): preserve vlm merged config saves

615aff8

fix(mlx): dequantize qlora merged saves

55fa173

fix(mlx): materialize tied lm head exports

d8cdf89

fix(mlx): forward gguf export options

0029c54

fix(mlx): normalize vlm mmproj export tensors

6fcaa19

fix(mlx): rewrite VLM GGUF tensor candidates

97c980f

Prefer HF vision aliases when inverting mlx-vlm sanitizers, while keeping same-name tensor rewrites as a fallback. This preserves Gemma3 patch embedding layout fixes and avoids stopping early on Qwen-family MLX vision_tower names.

fix(mlx): extract vlm config objects for saves

962dec8

fix(gguf): run package converters beside conversion

4546225

fix(mlx): align gguf nextn metadata with tensors

d6d5d39

fix(mlx): repair degraded VLM processors

398cb9c

fix(mlx): preserve source save sidecars

df5deb9

test(mlx): cover save export regressions

e530537

fix(mlx): harden save export review cases

0336fc3

Lyxot force-pushed the fix/mlx-save-gguf-export-parity branch from 444b88b to 0336fc3 Compare May 27, 2026 07:47

Lyxot requested a review from Copilot May 27, 2026 08:34

Copilot AI reviewed May 27, 2026

View reviewed changes

Comment thread unsloth_zoo/mlx/utils.py

Comment thread unsloth_zoo/mlx/utils.py

Comment thread unsloth_zoo/mlx/utils.py

Comment thread unsloth_zoo/mlx/utils.py

Lyxot added 2 commits May 27, 2026 16:48

fix(mlx): address save export review cases

16ca0d1

revert(mlx): drop tied lm head materialization

f73f432

This reverts commit d8cdf89. Also removes the related regression coverage and later helper hardening now that bug-4 is out of scope.

This was referenced May 27, 2026

PR #697 cross-OS validation (do not merge) danielhanchen/unsloth-staging-2#156

Closed

PR #697 Studio probe on macos-14 (do not merge) danielhanchen/unsloth-staging-2#157

Closed

danielhanchen mentioned this pull request May 27, 2026

Fix mlx-ci.yml torch install gap + finetune-last-n-layers FakeModel stubs #700

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix MLX save and GGUF export parity issues#697

Fix MLX save and GGUF export parity issues#697
Lyxot wants to merge 16 commits into
unslothai:mainfrom
Lyxot:fix/mlx-save-gguf-export-parity

Lyxot commented May 26, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danielhanchen commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Lyxot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danielhanchen commented May 27, 2026

1. Local Linux, four Python versions

2. Cross-OS shim path

3. Real Apple Silicon mlx wheels plus Studio boot

Pre-PR vs post-PR diff

Coverage per fix

Notes for the maintainer

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Lyxot commented May 26, 2026 •

edited

Loading