[CI-validation] MLX PRs #679 + #682 + #692 #19

Open
danielhanchen wants to merge 19 commits into main from staging/mlx-prs-679-682-692

@danielhanchen

Owner

Summary

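Throwaway staging-fork CI validation for three open MLX-only PRs against unslothai/unsloth-zoo, all by @Lyxot:

  • #679 fix(mlx): persist LoRA adapter metadata on save
  • #682 fix(mlx): handle zero-token and invalid labels in CCE
  • #692 fix(mlx): save only LoRA adapter tensors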

Do not merge. This PR exists only to surface the three CI runs (mac / linux / windows) in the GitHub UI.

What this branch contains

  1. Upstream unslothai/unsloth-zoo main merged in (origin/main was behind).
  2. The three PR heads merged in the order #682, #679, #692. Git's ort strategy auto-resolved the #679 + #692 collision in unsloth_zoo/mlx/{utils,trainer}.py.
  3. New tests/test_mlx_save_lora_adapters_filter.py — 4 tests over the combined #692 + #679 surface. Closes the coverage gap Copilot flagged on #692 (no test) and exercises #679's _p_1 keep-prob dropout fallback. Runs on every platform via the mlx_simulation shim.
  4. tests/_zoo_aggressive_cuda_spoof.py — deeper torch.cuda spoof from danielhanchen/unsloth-staging-2, kept in reserve for stricter import paths.
  5. Three workflow files: mlx-pr-mac.yml, mlx-pr-linux.yml, mlx-pr-windows.yml.
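Item 3 mentions #679's _p_1 keep-prob dropout fallback. As a rough illustration only, and assuming the dropout layer stores its keep probability (1 - p) in a `_p_1` attribute, the fallback might look like the sketch below. `dropout_probability` and `FakeDropout` are hypothetical names for this example, not the helpers from the PR.

```python
class FakeDropout:
    """Hypothetical stand-in for a dropout layer that only stores keep-prob."""

    def __init__(self, p: float) -> None:
        # Assumption: the layer keeps 1 - p, not p itself.
        self._p_1 = 1.0 - p


def dropout_probability(layer: object, default: float = 0.0) -> float:
    """Return the drop probability, preferring `p`, else deriving it from `_p_1`."""
    p = getattr(layer, "p", None)
    if p is not None:
        return float(p)
    keep = getattr(layer, "_p_1", None)
    if keep is not None:
        # Round to hide floating-point noise from computing 1 - (1 - p).
        return round(1.0 - float(keep), 6)
    # Neither attribute is present: treat the layer as having no dropout.
    return default
```

Rounding is a design choice for this sketch so that a value written to an adapter config reads as 0.1 rather than 0.09999999999999998.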

CI design

| Workflow | Runner | Role |
| --- | --- | --- |
| mlx-pr-mac | macos-14 | Primary signal. Real MLX install + real PR tests (CCE compile-mode, adapter save/reload). |
| mlx-pr-linux | ubuntu-latest | Import regression guardrail + the shim-backed save_lora_adapters test. MLX wheels do not install on Linux, so this cannot exercise Metal. |
| mlx-pr-windows | windows-latest | Same as Linux, pinned to shell: bash everywhere, no triton. |

All three workflows:

  • Trigger only on push to staging/mlx-prs-679-682-692 (so they iterate on force-pushes without burning upstream CI minutes).
  • Have concurrency.cancel-in-progress: true so a new push supersedes a still-running iteration.
  • Have paths: filters scoped to MLX surfaces + their own workflow file.

Test plan

  • All three workflows trigger on the first push.
  • mlx-pr-mac passes: MLX install OK, all 7 MLX submodule imports OK, test_mlx_save_lora_adapters_filter.py (4 tests) green, test_mlx_runtime_cce_compile.py green.
  • mlx-pr-linux passes: import unsloth_zoo OK, is_mlx_available() returns False, shim-backed test_mlx_save_lora_adapters_filter.py green.
  • mlx-pr-windows passes: same as Linux.
  • Force-pushing cancels the prior in-flight run (cancel-in-progress check).

danielhanchen and others added 19 commits April 19, 2026 07:52
Combines three open MLX-only PRs against unslothai/unsloth-zoo into one
staging branch and wires up real ubuntu-latest / macos-14 / windows-latest
GitHub Actions runs to validate them together:

  - unslothai#679 fix(mlx): persist LoRA adapter metadata on save
  - unslothai#682 fix(mlx): handle zero-token and invalid labels in CCE
  - unslothai#692 fix(mlx): save only LoRA adapter tensors

This branch is intentionally throwaway; do not merge into staging/main.
It is the iteration unit for the three workflow files below.

PR unslothai#679 + PR unslothai#692 both touch unsloth_zoo/mlx/{utils,trainer}.py. Git's
ort strategy auto-resolved cleanly:
  - utils.py keeps PR unslothai#692's _save_adapter_artifacts helper + lora_-name
    filter in save_lora_adapters, PLUS PR unslothai#679's _get_mlx_dropout_probability,
    _infer_mlx_lora_rank, and the expanded _enrich_mlx_adapter_config that
    writes lora_parameters / rank / scale / dropout / peft_type=LORA.
  - trainer.py imports both save_trainable_adapters (PR unslothai#692) and the
    helpers from PR unslothai#679, with checkpoint saves switched to the new
    save_trainable_adapters and final adapter export still calling
    save_lora_adapters.
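The lora_-name filter described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `filter_lora_tensors` is a hypothetical name, and matching on the substring "lora_" anywhere in the parameter name is an assumed rule. The ValueError mirrors the no-adapter case that the new tests cover.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def filter_lora_tensors(params: Dict[str, T], marker: str = "lora_") -> Dict[str, T]:
    """Keep only entries whose name contains the LoRA marker (assumed rule)."""
    kept = {name: value for name, value in params.items() if marker in name}
    if not kept:
        # Refuse to write an empty adapter file.
        raise ValueError("no LoRA adapter tensors found; nothing to save")
    return kept
```

For example, given a flat parameter dict with q_proj.lora_a, q_proj.lora_b and q_proj.weight, only the two lora_ entries survive, so base weights never land in the adapter file.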

New scaffolding:
  - tests/test_mlx_save_lora_adapters_filter.py: four tests over the
    combined PR unslothai#692 + PR unslothai#679 surface (LoRA-only filter, metadata fields,
    no-adapter ValueError, trainable-checkpoint preserves everything).
    Closes the PR unslothai#692 coverage gap Copilot flagged. Uses mlx_simulation
    so it runs on Linux + Windows too.
  - tests/_zoo_aggressive_cuda_spoof.py: deeper torch.cuda spoof copied
    from danielhanchen/unsloth-staging-2, kept available for harder
    import paths that escape tests/conftest.py's device-type preload.
  - .github/workflows/mlx-pr-mac.yml: macos-14, real MLX install,
    PR-specific pytest set. Primary green signal.
  - .github/workflows/mlx-pr-linux.yml: ubuntu-latest, CPU torch + no-MLX
    install, import smoke + the new save_lora_adapters_filter shim test.
  - .github/workflows/mlx-pr-windows.yml: same as Linux but pinned to
    shell: bash everywhere; no triton.

All three workflows trigger only on push to this staging branch with
paths: filters and cancel-in-progress so force-pushes during iteration
do not queue.

The seven upstream workflows (consolidated-tests-ci, lint-ci, mlx-ci,
security-audit, stale, studio-export-fix-ci, wheel-smoke) would fire on
every push and PR-event to this throwaway staging branch and burn runner
minutes that have nothing to do with validating MLX PRs unslothai#679, unslothai#682, unslothai#692.

Keep only the three mlx-pr-* workflows on this branch. The seven upstream
workflows stay untouched in upstream main / origin/main -- this deletion is
scoped to the staging branch only.

macOS: pip install -e .[mlx] does not pull torch (correct in production
since MLX replaces torch on Apple Silicon), but the new
test_mlx_save_lora_adapters_filter.py uses torch via the shim. Add an
explicit torch==2.10.0 install from the PyTorch CPU index (same pattern
as danielhanchen/unsloth-staging-2/.github/workflows/mlx-ci.yml).

Linux + Windows: unsloth_zoo/__init__.py:198 has a find_spec("unsloth")
hard gate that fires before UNSLOTH_IS_PRESENT is read. Install
unsloth --no-deps from git main so the import survives without dragging
in unsloth's heavy CUDA-only deps. Mirror of upstream
consolidated-tests-ci.yml.
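
The ordering problem on Linux + Windows can be pictured with a small sketch. It assumes a gate shaped like the one described: a find_spec check that raises before any environment flag is consulted. `import_gate` is a hypothetical function, and comparing the flag to "1" is an assumption about how such a flag might be parsed.

```python
import importlib.util
import os


def import_gate(package: str, env_flag: str) -> bool:
    """Raise if `package` is not importable; otherwise report the env flag."""
    # Step 1: hard gate. A missing package fails here, whatever the flag says.
    if importlib.util.find_spec(package) is None:
        raise ImportError(f"{package} must be installed")
    # Step 2: only reached once the package is discoverable.
    return os.environ.get(env_flag, "0") == "1"
```

Because step 1 runs first, setting the flag alone cannot rescue the import, which is why the workflows install the package with --no-deps instead.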