[tests] feat(studio): MLX training tab on Apple Silicon (LoRA / full FT, VLM, export) by danielhanchen · Pull Request #36 · Datta0/unsloth-staging-3

danielhanchen · 2026-05-05T11:41:34Z

Staging mirror of unslothai/unsloth#5265

Original PR: unslothai/unsloth#5265
Author: Manan17

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.

Original description

Summary

Routes Studio's training pipeline through MLXTrainer on Apple Silicon, replacing the torch / SFTTrainer path that doesn't run on Mac. Same UI, same one-click flow, same export. Studio now trains models on M1-M5 Macs with the memory wins from unsloth-zoo's MLX integration.

Depends on unslothai/unsloth-zoo#XXX (Apple Silicon training PR).

What's included

Backend (`studio/backend/`)

utils/hardware/hardware.py: detect MLX on Apple Silicon, set CHAT_ONLY = False.
core/training/worker.py: MLX fast-path that bypasses torch / SFTTrainer entirely. Builds FastMLXModel + MLXTrainer, hooks progress / loss / memory events into the existing event_queue. Supports LoRA + full FT, gradient checkpointing, CCE, train_on_responses_only, and the finetune_language / attention / mlp / vision flags with auto-imply guardrails (e.g. picking attention without picking a scope auto-implies language). Elementwise grad clip is on by default.
core/training/training.py: skip CUDA / GPU validation on MLX.
core/inference/mlx_inference.py: text + VLM streaming inference. Wires top_k, top_p, repetition_penalty, temperature through to mlx-lm / mlx-vlm samplers (fixes silent VLM temperature drop).
core/export/export.py: maps Studio's format dropdown to save_method (LoRA-only / merged 16-bit / merged 4-bit / GGUF variants); passes save_directory to push_to_hub_merged; private/public toggle.

This PR contains test changes only (2 files). Code changes are in the head PR.

Test files:

studio/backend/tests/test_mlx_inference_backend.py
studio/backend/tests/test_mlx_training_worker_config.py

…wide

Studio UI was showing ~95 GB during MLX training because get_gpu_utilization read "In use system memory" from IORegistry's AGXAccelerator, which is system-wide GPU memory across all processes (training + backend + browser + Display). Now the trainer's mx.get_peak_memory value is forwarded through the progress event and surfaced via /api/train/hardware while training is active. Falls back to the system-wide reading when training is not running.
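The selection between the two readings can be sketched as below. This is a minimal illustration, not Studio's code: `reported_gpu_memory_gb` and its parameters are hypothetical names, and the byte-to-GB conversion is an assumption. Only the rule itself (prefer the trainer's peak value while training is active, otherwise fall back to the system-wide reading) comes from the description above.

```python
from typing import Optional

BYTES_PER_GB = 1024 ** 3


def reported_gpu_memory_gb(
    training_active: bool,
    trainer_peak_bytes: Optional[int],
    system_wide_bytes: int,
) -> float:
    """Pick which memory figure to show (hypothetical helper).

    While training is active and the trainer has forwarded a peak value,
    report that per-process figure. Otherwise fall back to the
    system-wide reading.
    """
    if training_active and trainer_peak_bytes is not None:
        used = trainer_peak_bytes
    else:
        used = system_wide_bytes
    return round(used / BYTES_PER_GB, 2)
```

With a 6 GB trainer peak and a 95 GB system-wide reading, the helper reports 6.0 while training is active and 95.0 once it stops.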

M1 and M2 chips emulate bf16 in software on the GPU, causing 40-70% slower prefill compared to native fp16. M3+ have native bf16 (macOS Sonoma+ MPSGraph). Replaces the always-True stub with chip-aware detection via mx.device_info.
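A chip-aware check along these lines might look like the sketch below. It deliberately works on a plain device-name string rather than calling MLX, so the exact field returned by mx.device_info is not assumed here. The helper names, the regex, and the conservative "unknown chip means no native bf16" default are all assumptions made for illustration.

```python
import re
from typing import Optional


def apple_chip_generation(device_name: str) -> Optional[int]:
    """Extract the M-series generation from a name such as 'Apple M2 Pro'."""
    match = re.search(r"\bM(\d+)\b", device_name)
    return int(match.group(1)) if match else None


def has_native_bf16(device_name: str) -> bool:
    """True for M3 and later; False for M1/M2 and unrecognised names (assumed default)."""
    generation = apple_chip_generation(device_name)
    return generation is not None and generation >= 3


def preferred_dtype_name(device_name: str) -> str:
    """Hypothetical policy: use bf16 only where it is native, else fp16."""
    return "bfloat16" if has_native_bf16(device_name) else "float16"
```

Treating an unrecognised name as "no native bf16" trades a possible missed optimisation for never hitting the emulated path by accident.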

Compute use_lora from the UI's training_type before loading the model, pass full_finetuning=not use_lora to FastMLXModel.from_pretrained, and let the existing 'if use_lora' branch skip get_peft_model. Matches the GPU worker's flow.

Previously the MLX path called save_pretrained_merged with no save_method, which fell through to a no-op that didn't actually fuse LoRA into the base. Now Studio's "Merged Model" export properly fuses LoRA + dequantizes any 4-bit base to bf16, matching the GPU behavior for the same UI option.

MLX push_to_hub branch now forwards private=private (matches GPU). Existing 2-tuple early-returns ('repo_id+token required', 'PEFT model needed') were tripping the route's 3-tuple unpack. Added a None output_path so the unpack always succeeds.
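The unpack failure and its fix can be illustrated with a small stand-in. `export_to_hub` and its parameters are hypothetical, not the actual Studio function. The point is only that every return path, including the early error returns, yields the same (success, message, output_path) shape.

```python
from typing import Optional, Tuple

ExportResult = Tuple[bool, str, Optional[str]]


def export_to_hub(repo_id: str, token: str, is_peft: bool) -> ExportResult:
    """Hypothetical export stub with a uniform 3-tuple return."""
    if not repo_id or not token:
        # Previously a 2-tuple here would break `a, b, c = export_to_hub(...)`.
        return False, "repo_id+token required", None
    if not is_peft:
        return False, "PEFT model needed", None
    # Hub-only push: nothing was written locally, so output_path stays None.
    return True, f"Pushed to {repo_id}", None


# The route-side unpack now succeeds on every path.
success, message, output_path = export_to_hub("", "", is_peft=True)
```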

Mirror the GPU worker: stop excluding VLMs and stop hardcoding template detection. Look up the model in MODEL_TO_TEMPLATE_MAPPER and fetch the per-template instruction/response markers from TEMPLATE_TO_RESPONSES_MAPPER. The frontend already force-disables train_on_completions for vision+image and audio cases, so backend just trusts the flag.

…LXModel

Studio's four UI checkboxes now actually flow through to MLX get_peft_model (which was just updated in unsloth-zoo to honor them). Also drops the incorrect train_projector wiring that tied projector LoRA to the attn/mlp flags: those are language-side toggles, not projector toggles.

Co-Authored-By: Manan17 <shahmanan170602@gmail.com>

…n/mlp UI guardrail.

The four checkboxes (vision/language/attention/MLP) carry "scope × module-type" semantics that aren't obvious: picking just "Attention modules" + "MLP modules" without "Language layers" naturally reads as "fine-tune attn/mlp", but our backend reads it as "fine-tune attn/mlp modules in *no* tower" → empty target_modules → zero trainable params → crash inside value_and_grad.

If the user selected attn or mlp module types but no layer scope, default to language scope. Power users can still explicitly choose language=False, vision=True if they want vision-only fine-tuning of attn/mlp.

Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
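The guardrail described above amounts to a small normalisation step on the four flags. The sketch below is illustrative only: `resolve_finetune_scopes` is a hypothetical name, and the real worker may structure this differently.

```python
from typing import Tuple


def resolve_finetune_scopes(
    vision: bool, language: bool, attention: bool, mlp: bool
) -> Tuple[bool, bool, bool, bool]:
    """Default to the language scope when module types are picked without a scope."""
    picked_module_type = attention or mlp
    picked_scope = vision or language
    if picked_module_type and not picked_scope:
        # Avoids empty target_modules, and with it zero trainable parameters.
        language = True
    return vision, language, attention, mlp
```

An explicit vision-only choice (vision=True, language=False) passes through unchanged, so the power-user path stays available.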

…x-lm/mlx-vlm

Inference UI sliders for top_k and repetition_penalty had no effect on MLX, and VLM top_p was also silently dropped. Plus a latent pre-existing bug: mlx_vlm.generate_step expects temperature= (long form), but we were passing temp=, which silently fell into `**kwargs`, so every VLM chat was effectively greedy regardless of the temperature slider.

Text path (_generate_text):
* make_sampler now receives top_k in addition to temp/top_p
* make_logits_processors built and forwarded when repetition_penalty is non-trivial (skip when 0.0/1.0 to avoid pointless overhead)

VLM path (_generate_vlm):
* Pass top_p, top_k, repetition_penalty as kwargs (mlx_vlm.stream_generate forwards them to generate_step's sampler/logits_processor builders)
* Rename temp= → temperature= so it's actually consumed

Verified end-to-end with a smoke test on Qwen2.5-0.5B-Instruct (text) and Qwen2.5-VL-3B-Instruct (VLM): each of {greedy, top_p=0.5, top_k=10, rep_pen=1.5} now produces a distinct output, proving the parameters reach the sampler.

Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
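The temp= versus temperature= bug is a general hazard of catch-all keyword arguments, and can be reproduced without MLX. In the sketch below, `generate_step` is a stand-in written for this example and is not the mlx_vlm implementation. `wants_repetition_penalty` is likewise a hypothetical helper that mirrors the "skip when 0.0/1.0" rule.

```python
from typing import Optional


def generate_step(prompt: str, temperature: float = 0.0, **kwargs) -> dict:
    """Stand-in with a catch-all **kwargs; unknown names are silently accepted."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "ignored": sorted(kwargs),
    }


def wants_repetition_penalty(value: Optional[float]) -> bool:
    """Only build a penalty processor when the value would change the logits."""
    return value is not None and value not in (0.0, 1.0)


# Misspelled keyword: accepted, but the default (greedy) temperature is used.
old_call = generate_step("hi", temp=0.7)
# Correct keyword: the value is actually consumed.
new_call = generate_step("hi", temperature=0.7)
```

Because the misspelled call raises no error, a smoke test that compares outputs across parameter settings, like the one described above, is what actually catches this class of bug.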

…or hub push

* export_merged_model: format_type="4-bit (FP4)" → save_method="merged_4bit" (was hardcoded merged_16bit, ignoring the UI choice).
* Both export_merged_model and export_base_model now pass save_directory= to push_to_hub_merged so it reuses the just-written local folder instead of re-saving under a relative "username/model" directory.

Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
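The format-to-save_method mapping can be sketched as a lookup with a default. Only the "4-bit (FP4)" label and the two save_method strings appear in the commit message. The function name and the choice to default every other label to merged_16bit are assumptions for illustration.

```python
# Only this one label is taken from the commit message above.
SAVE_METHOD_BY_FORMAT = {
    "4-bit (FP4)": "merged_4bit",
}


def save_method_for_format(format_type: str) -> str:
    """Map the UI format label to a save_method (hypothetical helper).

    Unlisted labels fall back to merged_16bit, which was the previous
    hardcoded behaviour.
    """
    return SAVE_METHOD_BY_FORMAT.get(format_type, "merged_16bit")
```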

unsloth/__init__.py was assigning `FastVisionModel = FastLanguageModel` right after defining `class FastVisionModel(FastLanguageModel)` with a `for_training` static method. The alias erased the class binding, so the documented `FastVisionModel.for_training(model)` call from upstream Unsloth's VLM notebooks raised `AttributeError` on MLX.

Remove the offending alias. `FastVisionModel` is now a real subclass of `FastLanguageModel` again: it inherits `from_pretrained` / `get_peft_model` / `for_inference` and exposes `for_training` as a no-op pass-through (no-op because MLX doesn't have a train/eval mode flag; the call exists purely for GPU/MLX notebook parity).

Verified end-to-end: Qwen3-VL-2B + LaTeX_OCR LoRA + vision LoRA via FastVisionModel.from_pretrained → get_peft_model → for_training → MLXTrainer.train runs 10 steps cleanly (loss 1.10 → 0.12, no NaNs, peak 5.89 GB). Studio's path (FastLanguageModel.from_pretrained for any repo, auto-detect VLM in the loader) is unaffected.

Tier-1 review finding #8.
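The rebinding problem is easy to reproduce in isolation. The classes below are toy stand-ins that reuse the names from the commit message; they are not the unsloth implementations.

```python
class FastLanguageModel:
    @staticmethod
    def from_pretrained(name: str) -> str:
        return f"loaded:{name}"


class FastVisionModel(FastLanguageModel):
    @staticmethod
    def for_training(model):
        # No-op pass-through, kept only for notebook parity.
        return model


_real_vision_class = FastVisionModel  # keep a handle for the demo

# The offending alias: the name now points at the parent class.
FastVisionModel = FastLanguageModel
alias_has_for_training = hasattr(FastVisionModel, "for_training")

# The fix is simply not to rebind the name.
FastVisionModel = _real_vision_class
```

While the alias is in place, `FastVisionModel.for_training(model)` raises `AttributeError`, because the parent class defines no such method.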

Studio export

* Restore Tuple[bool, str, Optional[str]] contract on export_merged_model, export_base_model, export_gguf, and export_lora_adapter, populating output_path on successful local saves so routes/worker/CLI/frontend details.output_path is non-empty again.
* Lift the GPU save_method assignment out of the local-save branch so Hub-only merged exports (save_directory='', push_to_hub=True) no longer hit UnboundLocalError on the push branch.
* For MLX merged and base hub-only export, stage to a tempfile.TemporaryDirectory before push_to_hub_merged instead of passing save_directory=''.
* Source _IS_MLX from unsloth instead of recomputing the platform check (single source of truth, also enforces mlx-package availability).

Studio MLX training/inference

* Pass token=hf_token into FastMLXModel.from_pretrained for gated/private models, matching the inference path.
* Strip hf_token and wandb_token from wandb.init(config=...) so secrets do not leak into the W&B run config.
* Replace load_from_disk(local_datasets[0]) with the existing UnslothTrainer._resolve_local_files / _loader_for_files helpers so uploaded JSON/JSONL/CSV/Parquet files train through the normal datasets loader (load_from_disk still used for HF save_to_disk directories).
* Make the dataset slice helper inclusive at the end and treat 0 as a real index instead of "unset", matching the GPU and embedding paths.
* Add a status_message -> message alias inside _send so the existing parent pump (training.py) renders MLX status updates instead of blanks.
* Forward min_p through generate_chat_response into _generate_text / _generate_vlm and into make_sampler / vlm_kwargs so the sampling control is no longer a no-op on MLX.
* Wrap unsloth_zoo.mlx_loader / mlx_trainer imports with a clearer ImportError pointing users at install.sh for Apple Silicon.
* Exit the MLX stop-polling thread on EOFError/OSError instead of busy-looping when the queue/pipe is permanently closed (one-line why-safe rationale inline).

Studio frontend

* ParamsSection subscribes to platform deviceType via the Zustand hook so the gradient checkpointing dropdown re-renders after the async device fetch completes.

Studio hardware

* get_gpu_utilization MLX branch now reads _read_apple_gpu_stats once and derives VRAM totals from psutil, removing the second ioreg subprocess per utilization poll.

Unsloth core

* Restore the os.geteuid == 0 guard around the CUDA ldconfig recovery that was lost when GPU initialization moved into _gpu_init.py, plus the non-root manual-fix warning branch. Non-root CUDA users no longer shell out to ldconfig at import time.
* Load dataprep/raw_text via importlib so the MLX import path no longer pulls torch in through dataprep/__init__.py -> synthetic.py.
* FastVisionModel.from_pretrained overrides the inherited delegator only to inject text_only=False; this is an extension, not a duplication, and is needed so VLM checkpoint loads keep the vision tower.
* Wrap the MLX-branch unsloth_zoo import with a clearer ImportError.
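Of the changes listed above, the dataset slice semantics are the easiest to get subtly wrong, so a sketch may help. `slice_rows` is a hypothetical helper, not the Studio function, and it assumes non-negative indices. It shows an inclusive end and a `None` check, so that 0 counts as a real index rather than "unset" (a plain truthiness test such as `if end:` would discard it).

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def slice_rows(
    rows: Sequence[T],
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[T]:
    """Return rows[start..end] with an inclusive end (hypothetical helper)."""
    # Compare against None explicitly: 0 is a valid index, not "unset".
    first = 0 if start is None else start
    last = len(rows) - 1 if end is None else min(end, len(rows) - 1)
    return list(rows[first : last + 1])
```

Under these rules `slice_rows(rows, 0, 0)` selects exactly the first row, which a truthiness-based check would have turned into the whole dataset.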

…g guard

* tests/python/test_gpu_init_ldconfig_guard.py asserts the geteuid root check still wraps the ldconfig recovery and the non-root branch warns bnb users; AST + source-text inspection so the test runs without torch.
* tests/studio/test_export_output_path_contract.py covers the Tuple[bool, str, Optional[str]] return contract on every export method, the output_path assignment after successful local save, the Hub-only GPU save_method binding fix, the MLX hub-only TemporaryDirectory staging, and the single-source `_IS_MLX` import from unsloth.
* tests/studio/test_mlx_training_worker_behaviors.py covers token forwarding to FastMLXModel.from_pretrained, wandb config secret stripping, file-aware local dataset loading, status_message -> message aliasing, inclusive slice semantics, EOFError/OSError stop thread exit, and the friendly mlx_loader / mlx_trainer ImportError.
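An AST-based check of the kind the first test file describes could be structured roughly as follows. This is a sketch of the approach, not the contents of test_gpu_init_ldconfig_guard.py: the helper name, the "string constant containing ldconfig" heuristic, and the sample sources are assumptions. It needs Python 3.9+ for `ast.unparse` and never imports the module under test, so torch is not required.

```python
import ast


def ldconfig_is_root_guarded(source: str, needle: str = "ldconfig") -> bool:
    """True if every string literal containing `needle` sits in the body of
    an `if` whose condition mentions geteuid (hypothetical heuristic)."""
    tree = ast.parse(source)
    guarded_ids = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.If) and "geteuid" in ast.unparse(node.test):
            for child in node.body:  # only the root branch, not the else
                for sub in ast.walk(child):
                    guarded_ids.add(id(sub))
    hits = [
        n
        for n in ast.walk(tree)
        if isinstance(n, ast.Constant)
        and isinstance(n.value, str)
        and needle in n.value
    ]
    return bool(hits) and all(id(n) in guarded_ids for n in hits)


GUARDED = '''
import os
if os.geteuid() == 0:
    os.system("ldconfig /usr/lib64-nvidia")
else:
    print("please fix the library path manually")
'''

UNGUARDED = '''
import os
os.system("ldconfig /usr/lib64-nvidia")
'''
```

Inspecting source text this way keeps the test runnable on machines without a GPU stack, at the cost of being tied to how the guard happens to be written.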

danielhanchen · 2026-05-05T12:37:23Z

Fixes pushed to unslothai/unsloth#5265.

Manan17 and others added 30 commits April 13, 2026 20:02

mlx with studio

c588fe6

updating temporary install.sh

48632f8

adding t_v5 path

1bf5853

fixing vision training

728d08c

adding chat

297adec

minor

e654ac1

Adding export and fixing training issues, inference with lora adaptors

77f53af

Merge remote-tracking branch 'origin/fix/mlxvlmcompile' into mlx-appl…

f1673fc

…e-silicon

fix: MLX worker pass load_in_4bit, override is_vlm based on dataset, …

f08953d

…streaming for VLM

Merge mlx-apple-silicon into main

e1f096f

update install.sh to point to main branch

de85036

fix: export returns 3 values (success, message, output_path) matching…

be99b87

… upstream worker

studio wirings

dfdcf5d

Merge pull request #5 from Manan17/feat/quant_config

b42426e

studio wirings

wire in lora rslora, init lora weights, random_state

2f4e038

loftq studio error message fix

a9edbfa

handle unknown optim and lr scheduler

4d58b95

Merge pull request #6 from Manan17/update/peftkwargs

f08b021

Update/peftkwargs

Merge remote-tracking branch 'upstream/main'

b511646

# Conflicts: # install.sh # studio/frontend/src/config/env.ts # unsloth/__init__.py

Merge branch 'unslothai:main' into main

d799e2e

pre-commit-ci Bot and others added 10 commits May 4, 2026 06:34

[pre-commit.ci] auto fixes from pre-commit.com hooks

8c51668

for more information, see https://pre-commit.ci

restore install

760714c

Merge branch 'main' into main

b1d5215

Merge remote-tracking branch 'origin/main' into

0b1baa1

Update pyproject.toml

be874c7

Update _utils.py

2fba3b6

fix: developer to api (#5281)

d741cc9

Studio: help svg replacement and Unsloth sidebar text (#5282)

d8a0beb

* fix: developer to api
* fix: help svg and Unsloth text

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>

danielhanchen force-pushed the pr-5265-tests branch from 81f35a1 to da591a0 Compare May 5, 2026 12:20

Imagineer99 and others added 3 commits May 5, 2026 05:22

Chore/help svg (#5283)

832f48c

* fix: developer to api
* fix: help svg and Unsloth text
* svg fix

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>

Add Apple Silicon MLX routing

6c9345a

* Rewrite __init__.py: detect MLX on macOS arm64 before any torch imports
* Extract original GPU init to _gpu_init.py (unchanged)
* MLX path imports FastMLXModel from unsloth_zoo, skips all GPU code
* GPU path unchanged: from ._gpu_init import *
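The routing decision can be sketched with the standard library alone, which is what allows it to run before any torch import. `should_use_mlx` is a hypothetical function and not the actual `_IS_MLX` computation. The injectable parameters exist only to make the example testable on any machine.

```python
import importlib.util
import platform
from typing import Optional


def should_use_mlx(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    mlx_installed: Optional[bool] = None,
) -> bool:
    """Decide whether to take the MLX path (hypothetical helper).

    Requires macOS on arm64 and an importable `mlx` package; nothing
    here imports torch or mlx itself.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if mlx_installed is None:
        # find_spec only locates the package; it does not import it.
        mlx_installed = importlib.util.find_spec("mlx") is not None
    return system == "Darwin" and machine == "arm64" and mlx_installed
```

A package `__init__` could branch on this result once and import either the MLX or the GPU entry points, so the platform check lives in one place.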

danielhanchen force-pushed the pr-5265-tests branch from 3b31223 to 7b3b20d Compare May 5, 2026 12:32

danielhanchen closed this May 5, 2026

danielhanchen wants to merge 43 commits into main from pr-5265-tests
