
[staging] e2e Studio q2_k_l GGUF export validation (Linux / macOS / Windows)#127

Closed. danielhanchen wants to merge 3 commits into main from studio-export-q2kl-fix

@danielhanchen

Throwaway staging PR validating the Studio q2_k_l GGUF export fix end-to-end on real Linux / macOS / Windows runners. Do not merge — close after CI confirms green.

Pairs with unsloth-zoo-staging-1#18 (already green on all 3 OS for the unit-test layer); this PR validates the binary-toolchain layer:

  1. Build llama.cpp from source on each OS via the patched install_llama_cpp().
  2. Confirm the freshly-built llama-quantize --help advertises q2_k AND the two preset flags (--output-tensor-type, --token-embedding-type) — proving the CLI surface still matches what unsloth_zoo emits.
  3. Import the patched unsloth_zoo from the staging-1 branch, capture the command that quantize_gguf(quant_type='q2_k_l') would emit, and assert that the post-expansion tokens are present and the literal preset name does not leak through.
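
Step 2 amounts to a substring check over the help text. A minimal sketch, assuming the caller supplies the binary path and that the help text may arrive on either stdout or stderr; `missing_help_tokens` and `check_quantize_help` are illustrative names, not functions from unsloth or unsloth_zoo, and this is not the workflow's actual script:

```python
import subprocess

# Tokens named in this PR: the base quant type and the two preset flags.
REQUIRED_TOKENS = ("q2_k", "--output-tensor-type", "--token-embedding-type")


def missing_help_tokens(help_text, required=REQUIRED_TOKENS):
    """Return the required tokens that do not appear in the help text."""
    lowered = help_text.lower()  # case-insensitive, so "Q2_K" also matches
    return [tok for tok in required if tok.lower() not in lowered]


def check_quantize_help(binary_path):
    """Run `<binary> --help` and fail if any required token is missing."""
    proc = subprocess.run(
        [str(binary_path), "--help"],
        capture_output=True,
        text=True,
        check=False,  # the exit code is ignored; only the text is inspected
    )
    missing = missing_help_tokens(proc.stdout + proc.stderr)
    if missing:
        raise AssertionError(f"llama-quantize --help is missing: {missing}")
```

Keeping the text check separate from the subprocess call means the matching logic can be exercised without a built binary.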

Diagnostic regression check for: main: invalid ftype 'q2_k_l' on Apple Silicon.

Changes on this branch

  • + .github/workflows/studio-export-q2kl-e2e.yml — 3-OS matrix, max-parallel: 3 to stay under the 5-Windows-runner cap, concurrency.cancel-in-progress: true, paths: filter so unrelated commits don't re-fire it.
  • − 22 unrelated workflow files removed (this branch only) to keep push fan-out bounded. The branch retains lint-ci.yml, wheel-smoke.yml, and the new e2e workflow.

Test plan

  • ubuntu-latest job green
  • macos-14 job green
  • windows-latest job green

Adds .github/workflows/studio-export-q2kl-e2e.yml that:

  1. Builds llama.cpp from source on each OS via install_llama_cpp()
     (apt-installed build deps on Linux; cmake + system toolchain on
     macOS / Windows). CUDA spoof preamble matches the existing
     consolidated-tests-ci.yml llama-cpp-smoke job.
  2. Locates the freshly-built llama-quantize binary (build/bin/Release/
     on Windows, top-level on macOS / Linux).
  3. Asserts llama-quantize --help advertises q2_k AND the two preset
     flags (--output-tensor-type, --token-embedding-type) so the
     CLI surface still matches what unsloth_zoo emits on this OS.
  4. Imports the patched unsloth_zoo from the staging fork branch
     (danielhanchen/unsloth-zoo-staging-1@studio-export-q2kl-fix),
     captures the command quantize_gguf emits for quant_type='q2_k_l',
     and asserts the post-expansion tokens are present and the literal
     preset name does NOT leak through.
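
The per-OS lookup in step 2 could be sketched as below. This is an illustration under stated assumptions rather than the workflow's actual code: `quantize_candidates` and `find_quantize` are hypothetical helpers, the `.exe` suffix on Windows is assumed, and "top-level" is read as the root of the llama.cpp directory that the caller passes in.

```python
import sys
from pathlib import Path


def quantize_candidates(llama_cpp_dir, platform=sys.platform):
    """List the places to look for llama-quantize on the given platform."""
    root = Path(llama_cpp_dir)
    if platform.startswith("win"):
        # Windows: build/bin/Release/ (the ".exe" suffix is an assumption).
        return [root / "build" / "bin" / "Release" / "llama-quantize.exe"]
    # macOS / Linux: top level of the llama.cpp directory.
    return [root / "llama-quantize"]


def find_quantize(llama_cpp_dir, platform=sys.platform):
    """Return the first existing candidate, or raise if none exists."""
    for candidate in quantize_candidates(llama_cpp_dir, platform):
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"llama-quantize not found under {llama_cpp_dir}")
```

Passing `platform` explicitly keeps the path logic testable on a single machine.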

This is the diagnostic regression check for the user-reported error
   main: invalid ftype 'q2_k_l'
on Apple Silicon Studio export.
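
The regression guard itself reduces to a check over the emitted argument list. A hedged sketch: `check_expanded_command` is a hypothetical helper, and it assumes the command is available as a list of separate tokens (a flag written as `--flag=value` would not match). It deliberately does not check which tensor types follow the two flags, since this PR does not state them.

```python
PRESET = "q2_k_l"   # Studio-level preset name; must not reach llama-quantize
BASE_TYPE = "q2_k"  # ftype that the preset is expected to expand to
PRESET_FLAGS = ("--output-tensor-type", "--token-embedding-type")


def check_expanded_command(argv):
    """Return a list of problems found in an emitted llama-quantize argv."""
    tokens = [str(tok).lower() for tok in argv]
    problems = []
    if PRESET in tokens:
        problems.append(f"preset name {PRESET!r} leaked into the command")
    if BASE_TYPE not in tokens:
        problems.append(f"base type {BASE_TYPE!r} is missing")
    for flag in PRESET_FLAGS:
        if flag not in tokens:
            problems.append(f"flag {flag} is missing")
    return problems
```

An empty return value means the command passed; a command that still carries the literal `q2_k_l` token is the case that produces the `invalid ftype` error quoted above.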

Trims 22 unrelated workflow files on this branch only (leaving
lint-ci.yml + wheel-smoke.yml + the new e2e workflow). Drop is staging-
only hygiene -- not intended for upstream unslothai/unsloth.
@gemini-code-assist

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

@danielhanchen

Throwaway dry-run, validated end-to-end on Linux / macOS / Windows runners. Real fix is at unslothai/unsloth-zoo#667. Closing.

@danielhanchen deleted the studio-export-q2kl-fix branch May 19, 2026 08:00