Skip to content

Fix Studio q2_k_l GGUF export and new llama.cpp converter package layout#667

Merged
danielhanchen merged 4 commits into
mainfrom
fix-studio-q2-k-l-and-converter-package
May 19, 2026
Merged

Fix Studio q2_k_l GGUF export and new llama.cpp converter package layout#667
danielhanchen merged 4 commits into
mainfrom
fix-studio-q2-k-l-and-converter-package

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

Two distinct Studio export bugs surfaced during macOS / Apple Silicon training support testing. Both land in unsloth_zoo.llama_cpp so every caller (CUDA save path, MLX / Studio export, notebooks) shares one path.

Bug 1: q2_k_l preset crashes the MLX export path

q2_k_l is an Unsloth preset, not a native llama.cpp ftype. It maps to:

llama-quantize --output-tensor-type q8_0 --token-embedding-type q8_0 IN OUT q2_k NTHREADS

The CUDA path expanded it via _quantize_q2_k_l in unsloth.save. The MLX path in unsloth_zoo.mlx.utils.save_pretrained_gguf forwarded the literal string and llama-quantize aborted on Apple Silicon with:

main: invalid ftype 'q2_k_l'

Fix: quantize_gguf now recognises the preset and expands it before building the command. Preset-only: triggers only on quant_type.strip().lower() == "q2_k_l". Every other ftype, including the real llama.cpp q3_k_l, takes the existing code path byte-for-byte unchanged. shell=True + _quote + IS_WINDOWS path translation preserved. Display log and error messages keep the preset name (q2_k_l), not the internal rewrite (q2_k).

Bug 2: patcher drift on the new conversion/ package layout

Upstream llama.cpp split convert_hf_to_gguf.py into a thin entrypoint plus a conversion/ package (conversion/base.py, conversion/qwen.py). Three regex patches in _download_convert_hf_to_gguf still targeted the old monolithic file and silently no-op'd:

  • Unsloth: No supported architectures (TEXT or VISION) could be determined from the original script.
    ModelBase._model_classes is empty because per-arch modules register lazily via load_all_models().
  • Unsloth: Metadata branding patch target 'self.metadata = gguf.Metadata.load(...)' not found.
    That line moved to conversion/base.py, so produced GGUFs lost quantized_by="Unsloth", repo_url, and tags.
  • Unsloth: Qwen2MoE num_experts patch target not found.
    conversion/qwen.py already calls self.find_hparam(["num_local_experts", "num_experts"]) so the legacy patch is obsolete.

Fix: _detect_converter_layout distinguishes monolith vs package structurally (entrypoint contains from conversion import and both conversion/__init__.py + conversion/base.py exist on disk). On the new layout:

  • Arch allowlist is AST-extracted from conversion/__init__.py:TEXT_MODEL_MAP / MMPROJ_MODEL_MAP, the static dicts that already enumerate every supported arch. Cheaper and safer than calling load_all_models() which imports every per-arch module.
  • Branding patch is applied in place to conversion/base.py with a one-line idempotency marker # UNSLOTH_BRANDING_APPLIED.
  • Qwen patch is skipped with an info log when both expert-key literals are already present in conversion/qwen.py.

Old monolithic layout: every existing patch runs unchanged.

Cache-key extension: _download_convert_hf_to_gguf folds conversion/{__init__,base,qwen}.py mtimes and sizes into the @lru_cache(1) key, so re-pulled llama.cpp checkouts re-run the patcher.

Public return shape (patched_filename, text_archs, vision_archs) preserved; callers in unsloth.save and unsloth_zoo.mlx.utils stay source-compatible.

Cross-OS safety

  • q2_k_l branch fires only on that exact preset. q3_k_l (a real llama.cpp ftype) and every other ftype hit the original code path unchanged.
  • Windows shell=True + _quote + build/bin/Release path translation untouched.
  • On old monolithic llama.cpp checkouts the layout flag stays "monolith" and behaviour is byte-for-byte identical to before.

Test plan

tests/test_quantize_gguf_q2_k_l.py (5 tests):

  • q2_k_l expands to q2_k + both q8_0 tensor-type flags
  • case-insensitive (Q2_K_L / q2_K_L / padded whitespace)
  • 17 other ftypes (q2_k, q3_k_s/m/l, q4_, q5_, q6_k, q8_0, bf16, f16, f32) untouched, including the real q3_k_l (Linux + Windows non-regression gate)
  • print_output logs both the request and the expansion
  • error messages keep the preset name, not the rewrite

tests/test_convert_hf_to_gguf_patcher.py (18 tests):

  • synthetic package + monolith fixtures: detection, AST extraction, branding-patch idempotency, lines around the target preserved, Qwen alias detection, cache-key invalidation
  • live upstream fetch of current llama.cpp master from raw.githubusercontent.com: layout=package, branding patch applies, Qwen aliases already handled, TEXT_MODEL_MAP contains LlamaForCausalLM and Qwen*. Skipped cleanly on offline / rate-limited runners.

CI: .github/workflows/studio-export-fix-ci.yml runs the unit tests on ubuntu-latest, macos-14, and windows-latest. paths: filter so unrelated PRs do not re-fire it. max-parallel: 3 so Windows stays under the 5-concurrent-runner cap when this job runs alongside others. Pure-Python; each job finishes in well under 2 minutes.

Validated end-to-end on all three OSes via staging-fork PRs (unsloth-zoo-staging-1#18 and unsloth-staging-2#127) before opening this one.

Checklist

  • q2_k_l unit tests green on ubuntu-latest, macos-14, windows-latest
  • patcher tests green on ubuntu-latest, macos-14, windows-latest
  • live-upstream patcher tests green against current llama.cpp master
  • CUDA path in unsloth.save left untouched (byte-for-byte identical behaviour for non-q2_k_l ftypes)
  • MLX / Studio export path now works for q2_k_l on Apple Silicon

Two distinct Studio export bugs surfaced during macOS / Apple Silicon
training support testing. Both land in unsloth_zoo.llama_cpp so every
caller (CUDA save path, MLX / Studio export, notebooks) shares one path.

Bug 1: q2_k_l preset crashes the MLX export path
  q2_k_l is an Unsloth preset that maps to
      llama-quantize --output-tensor-type q8_0 --token-embedding-type q8_0
                     IN OUT q2_k NTHREADS
  not a native llama.cpp ftype. The CUDA path expanded it via
  _quantize_q2_k_l in unsloth.save; the MLX path in
  unsloth_zoo.mlx.utils.save_pretrained_gguf forwarded the literal
  string and llama-quantize aborted with
      main: invalid ftype 'q2_k_l'
  on Apple Silicon.

  Fix: quantize_gguf now recognises the preset and expands it before
  building the command. Preset-only: triggers only on
  quant_type.strip().lower() == "q2_k_l". Every other ftype, including
  the real llama.cpp q3_k_l, takes the existing code path byte-for-byte
  unchanged. shell=True + _quote + IS_WINDOWS path-translation
  preserved. The display log + error messages keep the preset name
  (q2_k_l), not the internal rewrite (q2_k).

Bug 2: patcher drift on the new conversion/ package layout
  Upstream llama.cpp split convert_hf_to_gguf.py into a thin entrypoint
  plus a conversion/ package (conversion/base.py, conversion/qwen.py).
  Three regex patches in _download_convert_hf_to_gguf still targeted
  the old monolithic file and silently no-op'd:
    - "No supported architectures (TEXT or VISION) could be determined"
      ModelBase._model_classes is empty because per-arch modules
      register lazily via load_all_models().
    - "Metadata branding patch target not found"
      self.metadata = gguf.Metadata.load(...) moved to
      conversion/base.py, so produced GGUFs no longer carried
      quantized_by="Unsloth", repo_url, and tags.
    - "Qwen2MoE num_experts patch target not found"
      conversion/qwen.py already calls
      self.find_hparam(["num_local_experts", "num_experts"]) so the
      legacy patch is obsolete.

  Fix: _detect_converter_layout distinguishes monolith vs package
  structurally (entrypoint contains `from conversion import` and
  conversion/__init__.py + conversion/base.py exist on disk). On the
  new layout:
    - Arch allowlist is AST-extracted from
      conversion/__init__.py:TEXT_MODEL_MAP / MMPROJ_MODEL_MAP, the
      static dicts that already enumerate every supported arch. Cheaper
      and safer than calling load_all_models() which imports every
      per-arch module.
    - Branding patch is applied in place to conversion/base.py with a
      one-line idempotency marker (# UNSLOTH_BRANDING_APPLIED).
    - Qwen patch is skipped with an info log when both expert-key
      literals are already present in conversion/qwen.py.
  Old monolithic layout: every existing patch runs unchanged.

  Cache-key extension: _download_convert_hf_to_gguf folds
  conversion/{__init__,base,qwen}.py mtimes + sizes into the
  @lru_cache(1) key so a re-pulled llama.cpp checkout re-runs the
  patcher.

  Public return shape (patched_filename, text_archs, vision_archs)
  preserved; callers in unsloth.save and unsloth_zoo.mlx.utils stay
  source-compatible.

Cross-OS safety:
  - q2_k_l branch fires only on that exact preset; q3_k_l (real
    llama.cpp ftype) and every other ftype hit the original code path
    unchanged.
  - Windows shell=True + _quote + build/bin/Release path translation
    untouched.
  - On old monolithic llama.cpp checkouts the layout flag stays
    "monolith" and behaviour is byte-for-byte identical to before.

Tests
  tests/test_quantize_gguf_q2_k_l.py (5 tests):
    - q2_k_l expands to q2_k + both q8_0 tensor-type flags
    - case-insensitive (Q2_K_L / q2_K_L / padded whitespace)
    - 17 other ftypes (q2_k, q3_k_s/m/l, q4_*, q5_*, q6_k, q8_0, bf16,
      f16, f32) untouched -- the Linux + Windows non-regression gate
    - print_output logs both the request and the expansion
    - error messages keep the preset name, not the rewrite

  tests/test_convert_hf_to_gguf_patcher.py (18 tests):
    - synthetic package + monolith fixtures: detection, AST extraction,
      branding-patch idempotency, lines around the target preserved,
      Qwen alias detection, cache-key invalidation
    - live upstream fetch of current llama.cpp master from
      raw.githubusercontent.com: layout=package, branding patch
      applies, Qwen aliases already handled, TEXT_MODEL_MAP contains
      LlamaForCausalLM + Qwen*. Skipped cleanly on offline / rate-
      limited runners.

CI
  .github/workflows/studio-export-fix-ci.yml runs the unit tests on
  ubuntu-latest, macos-14, and windows-latest. paths: filter so
  unrelated PRs do not re-fire it. max-parallel: 3 so Windows stays
  under the 5-concurrent-runner cap when this job runs alongside
  others. Pure-Python; each job finishes in well under 2 minutes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6fc286a1c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/llama_cpp.py Outdated
# load_all_models() runs, so AST-parse the static TEXT_MODEL_MAP /
# MMPROJ_MODEL_MAP in conversion/__init__.py instead. Monolith
# layout keeps the original _model_classes introspection.
_layout = _detect_converter_layout(original_content, LLAMA_CPP_DEFAULT_DIR)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Detect the package layout beside the selected converter

When UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points at a checkout with the new conversion/ package (and the Studio flow similarly installs/checks ./llama.cpp before calling this helper), this still inspects LLAMA_CPP_DEFAULT_DIR instead of the directory that contains the converter being patched. If ~/.unsloth/llama.cpp does not contain the matching conversion/__init__.py and conversion/base.py, the package entrypoint is treated as monolith (or its imports fail during the earlier temp-module load), so the new arch enumeration and branding patches do not apply and the generated converter can import a package that is not present. Resolve the package directory from the selected local converter/install rather than always using the default path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed in b3b2766. Added a single helper _get_llama_cpp_dir(local_script_info) that returns os.path.dirname(local_script_info[0]) when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set, LLAMA_CPP_DEFAULT_DIR otherwise. Threaded the result through layout detection, AST extraction of TEXT_MODEL_MAP / MMPROJ_MODEL_MAP, the branding patch on conversion/base.py, the Qwen alias check on conversion/qwen.py, and the sibling-info cache key.

Added three tests under tests/test_convert_hf_to_gguf_patcher.py:

  • test_llama_cpp_dir_defaults_when_no_local_script — default fallback returns LLAMA_CPP_DEFAULT_DIR.
  • test_llama_cpp_dir_resolves_to_source_dir_when_local_script_set — synthetic 3-tuple override returns the source dir.
  • test_patcher_anchors_on_custom_dir_when_override_set — full custom-tree round-trip: layout, sibling info, AST extraction all anchor on the override dir.

No behaviour change when the env var is unset (the common path).

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the llama.cpp conversion script patcher to accommodate the transition from a monolithic script to a package-based layout. It introduces AST-based parsing for architecture extraction and implements layout-aware branding and Qwen-specific patches. Additionally, it adds support for the Q2_K_L quantization preset. The reviewer feedback identifies an opportunity to refactor duplicated directory resolution logic into a shared utility and points out a redundant newline in the branding patch implementation.

Comment thread unsloth_zoo/llama_cpp.py
Comment on lines +1052 to +1058
# Resolve env vars + sibling mtimes on every call; both are folded into
# the @lru_cache key so re-pulled llama.cpp checkouts re-run the patcher.
return _download_convert_hf_to_gguf_cached(
name,
_resolve_local_convert_script(),
_conversion_sibling_info(LLAMA_CPP_DEFAULT_DIR),
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logic for determining the llama.cpp directory is repeated multiple times. Refactor this into a shared helper function like _get_llama_cpp_dir to improve maintainability, as per the repository's practice for system-specific paths.

Suggested change
# Resolve env vars + sibling mtimes on every call; both are folded into
# the @lru_cache key so re-pulled llama.cpp checkouts re-run the patcher.
return _download_convert_hf_to_gguf_cached(
name,
_resolve_local_convert_script(),
_conversion_sibling_info(LLAMA_CPP_DEFAULT_DIR),
)
_local_script_info = _resolve_local_convert_script()
_llama_cpp_dir = _get_llama_cpp_dir(_local_script_info)
return _download_convert_hf_to_gguf_cached(
name,
_local_script_info,
_conversion_sibling_info(_llama_cpp_dir),
)
References
  1. Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b3b2766. added; threaded through the wrapper, layout detection, AST extraction, branding patch, Qwen check, and sibling-info cache key. The other five suggestions in this review covering the same refactor are addressed by the same commit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b3b2766. _get_llama_cpp_dir(local_script_info) added; threaded through the wrapper, layout detection, AST extraction, branding patch, Qwen check, and sibling-info cache key. The other five suggestions in this review covering the same refactor are addressed by the same commit.

Comment thread unsloth_zoo/llama_cpp.py
temp_original_file_path = None # Initialize for finally block
# Set by introspection; read by Patch 2 + Patch 3 below. Default to
# 'monolith' so a failed introspection still drives the legacy patches.
_layout = "monolith"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Use the shared helper function _get_llama_cpp_dir to derive the base llama.cpp directory from the local script path.

Suggested change
_layout = "monolith"
_layout = "monolith"
_llama_cpp_dir = _get_llama_cpp_dir(_local_script_info)
References
  1. Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

Comment thread unsloth_zoo/llama_cpp.py Outdated
# load_all_models() runs, so AST-parse the static TEXT_MODEL_MAP /
# MMPROJ_MODEL_MAP in conversion/__init__.py instead. Monolith
# layout keeps the original _model_classes introspection.
_layout = _detect_converter_layout(original_content, LLAMA_CPP_DEFAULT_DIR)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Layout detection should use the directory derived from the shared helper function to ensure consistency.

Suggested change
_layout = _detect_converter_layout(original_content, LLAMA_CPP_DEFAULT_DIR)
_layout = _detect_converter_layout(original_content, _llama_cpp_dir)
References
  1. Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

Comment thread unsloth_zoo/llama_cpp.py Outdated
logger.info(f"Unsloth: convert_hf_to_gguf layout detected: {_layout}")

if _layout == "package":
conv_init_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "__init__.py")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Use the derived _llama_cpp_dir to locate the conversion package.

Suggested change
conv_init_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "__init__.py")
conv_init_py = os.path.join(_llama_cpp_dir, "conversion", "__init__.py")
References
  1. Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

Comment thread unsloth_zoo/llama_cpp.py Outdated
if re.search(rb"self\.metadata \= gguf\.Metadata\.load\(", patched_content): logger.warning("Unsloth: Metadata branding patch target found, but regex failed to apply.")
else: logger.warning("Unsloth: Metadata branding patch target 'self.metadata = gguf.Metadata.load(...)' not found.")
if _layout == "package":
conv_base_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "base.py")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Use the derived _llama_cpp_dir to locate base.py for patching.

Suggested change
conv_base_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "base.py")
conv_base_py = os.path.join(_llama_cpp_dir, "conversion", "base.py")
References
  1. Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

Comment thread unsloth_zoo/llama_cpp.py Outdated
)
_qwen_handled = False
if _layout == "package":
conv_qwen_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "qwen.py")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Use the derived _llama_cpp_dir to locate qwen.py for alias checking.

Suggested change
conv_qwen_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "qwen.py")
conv_qwen_py = os.path.join(_llama_cpp_dir, "conversion", "qwen.py")
References
  1. Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

Comment thread unsloth_zoo/llama_cpp.py Outdated
Comment on lines +1019 to +1021
+ indent + b"if hasattr(self.metadata, 'quantized_by'): self.metadata.quantized_by = 'Unsloth'\n"
+ indent + b"if hasattr(self.metadata, 'repo_url'): self.metadata.repo_url = 'https://huggingface.co/unsloth'\n"
+ indent + b"if hasattr(self.metadata, 'tags'): self.metadata.tags = ['unsloth', 'llama.cpp']\n"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation adds an extra newline before the original next line because suffix already starts with a newline and an additional \n is appended to the last branding line. Removing the trailing \n from the last branding line will result in cleaner file content.

Suggested change
+ indent + b"if hasattr(self.metadata, 'quantized_by'): self.metadata.quantized_by = 'Unsloth'\n"
+ indent + b"if hasattr(self.metadata, 'repo_url'): self.metadata.repo_url = 'https://huggingface.co/unsloth'\n"
+ indent + b"if hasattr(self.metadata, 'tags'): self.metadata.tags = ['unsloth', 'llama.cpp']\n"
+ indent + b"if hasattr(self.metadata, 'quantized_by'): self.metadata.quantized_by = 'Unsloth'\n"
+ indent + b"if hasattr(self.metadata, 'repo_url'): self.metadata.repo_url = 'https://huggingface.co/unsloth'\n"
+ indent + b"if hasattr(self.metadata, 'tags'): self.metadata.tags = ['unsloth', 'llama.cpp']"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3a9a23c. Dropped the trailing \n on the last branding line so the captured suffix (which already starts with newline + indent) doesn't produce an extra blank line. Verified with the existing test_branding_patch_preserves_lines_around_target test and the live-upstream round-trip.

Comment thread unsloth_zoo/llama_cpp.py Fixed
Comment thread unsloth_zoo/llama_cpp.py Fixed
Comment thread unsloth_zoo/llama_cpp.py Fixed
Comment thread unsloth_zoo/llama_cpp.py Fixed
Comment thread unsloth_zoo/llama_cpp.py Fixed
Comment thread unsloth_zoo/llama_cpp.py Fixed
Comment thread tests/test_convert_hf_to_gguf_patcher.py Fixed
Comment thread tests/test_convert_hf_to_gguf_patcher.py Fixed
Comment thread unsloth_zoo/llama_cpp.py Fixed
Daniel Han added 2 commits May 19, 2026 07:49
PR #667 review (Codex + Gemini bots): when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR
points at a checkout with the new conversion/ package, the layout
detection, AST extraction, branding patch, Qwen check, and sibling-info
cache key should all resolve against THAT dir, not the hard-coded
default. The previous code always inspected LLAMA_CPP_DEFAULT_DIR, which
silently mis-detects the layout when the override dir differs from the
default install.

Introduce a single helper _get_llama_cpp_dir(local_script_info) that
returns os.path.dirname(local_script_info[0]) when the env var is set,
LLAMA_CPP_DEFAULT_DIR otherwise. Thread the result through:
  - _download_convert_hf_to_gguf wrapper (sibling-info cache key)
  - _detect_converter_layout call site
  - conversion/__init__.py AST extraction
  - conversion/base.py branding patch target
  - conversion/qwen.py expert-alias check

Adds three tests:
  - default fallback returns LLAMA_CPP_DEFAULT_DIR
  - synthetic 3-tuple override returns the source dir
  - full custom-tree round-trip: layout, sibling info, AST extraction
    all anchor on the override dir

No behaviour change when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is unset (the
common path). When set, the patcher now correctly targets the user's
checkout instead of falling through to monolith and no-op'ing.
Three small follow-ups from PR #667 review:

  - Branding insertion: drop the trailing newline from the last branding
    line. The match group `suffix` already starts with a newline + indent
    (it captured the boundary into the line that follows Metadata.load),
    so the trailing \n on the last branding line produced a blank line
    between the branding block and the next statement. Aesthetic only;
    Python parses both forms identically.
  - Add a one-line explanatory comment inside `_detect_converter_layout`'s
    bare `except Exception:`. Behaviour unchanged.
  - Remove unused `logging` and `textwrap` imports from
    tests/test_convert_hf_to_gguf_patcher.py.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a9a23c7e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/llama_cpp.py
# load_all_models() runs, so AST-parse the static TEXT_MODEL_MAP /
# MMPROJ_MODEL_MAP in conversion/__init__.py instead. Monolith
# layout keeps the original _model_classes introspection.
_layout = _detect_converter_layout(original_content, _llama_cpp_dir)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Move package detection before importing local converters

When UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points to a new package-layout checkout outside LLAMA_CPP_DEFAULT_DIR, this layout detection runs only after _load_module_from_path() has already imported a temp copy from LLAMA_CPP_DEFAULT_DIR (lines 1135-1150). If that default directory does not also contain the matching conversion/ package, the entrypoint's from conversion import ... raises ModuleNotFoundError, so the AST-based arch parsing and branding patch are never reached. Fresh evidence in the current revision is that _llama_cpp_dir is now resolved from the selected converter, but the import still happens before this newly added package branch can use it.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0a3256e. Hoisted _detect_converter_layout above the temp-write + _load_module_from_path step. On the package layout we now skip the module-load entirely and go straight to AST-parsing TEXT_MODEL_MAP / MMPROJ_MODEL_MAP from the resolved converter directory. The original monolith path is preserved verbatim, just nested in the else branch.

original_module_name is initialized to None at the top of the function so the cleanup if original_module_name is not None and original_module_name in sys.modules stays safe on the package branch.

Added test_package_layout_does_not_require_module_import (tests/test_convert_hf_to_gguf_patcher.py): it monkeypatches _load_module_from_path with a sentinel that raises AssertionError if called, builds a synthetic package-layout tree under UNSLOTH_LLAMA_CPP_SCRIPTS_DIR, drives the patcher end-to-end, and asserts the sentinel never fired, base.py was branded in place under the override dir, and LlamaForCausalLM made it into text_archs. 27/27 local tests still pass.

When UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points at a checkout outside
LLAMA_CPP_DEFAULT_DIR, the previous order would temp-write the
entrypoint inside the default dir and try to import it; on the new
package layout, `from conversion import ...` then resolves against
the default dir's missing conversion/ package and raises
ModuleNotFoundError before AST arch extraction or branding can run.

Hoist _detect_converter_layout above the temp-write/module-load
step. On "package" layout, skip the module load entirely and go
straight to AST-parsing TEXT_MODEL_MAP / MMPROJ_MODEL_MAP from the
selected dir. Monolith path keeps the original behaviour.

Added test_package_layout_does_not_require_module_import: a
sentinel replaces _load_module_from_path and asserts it is NOT
called on the package-layout path.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a3256e80e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/llama_cpp.py
Comment on lines +1273 to +1275
if _layout == "package":
conv_base_py = os.path.join(_llama_cpp_dir, "conversion", "base.py")
_branding_status = _apply_branding_patch_to_base(conv_base_py)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Save package-layout converters beside their package

When UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points to a package-layout checkout outside LLAMA_CPP_DEFAULT_DIR, this branch patches conversion/base.py in the override tree, but the generated entrypoint is still written under LLAMA_CPP_DEFAULT_DIR later. Running that generated script resolves from conversion import ... relative to its saved location, so installs without a matching LLAMA_CPP_DEFAULT_DIR/conversion fail with ModuleNotFoundError (or import an unrelated package). Fresh evidence in the current revision is that the package branch still targets _llama_cpp_dir here while the patched entrypoint is saved elsewhere.

Useful? React with 👍 / 👎.

Comment thread unsloth_zoo/llama_cpp.py
original_module_name = None # Only set on the monolith branch
# Set by introspection; read by Patch 2 + Patch 3 below. Default to
# 'monolith' so a failed introspection still drives the legacy patches.
_layout = "monolith"
Comment thread unsloth_zoo/llama_cpp.py
if local_script_info is not None:
return os.path.dirname(local_script_info[0])
return LLAMA_CPP_DEFAULT_DIR
pass
Comment thread unsloth_zoo/llama_cpp.py
_stat(base_py),
_stat(qwen_py) if os.path.isfile(qwen_py) else None,
)
pass
Comment thread unsloth_zoo/llama_cpp.py
# to monolith so the legacy regex patches still run.
pass
return "monolith"
pass
Comment thread unsloth_zoo/llama_cpp.py
if isinstance(node.target, ast.Name) and node.target.id == dict_name:
_harvest(node.value)
return keys
pass
Comment thread unsloth_zoo/llama_cpp.py
except OSError:
return "pattern-missing"
return "applied"
pass
Comment thread unsloth_zoo/llama_cpp.py
except OSError:
return False
return (b"num_local_experts" in content) and (b"num_experts" in content)
pass
@danielhanchen danielhanchen merged commit dda8dc7 into main May 19, 2026
18 checks passed
@danielhanchen danielhanchen deleted the fix-studio-q2-k-l-and-converter-package branch May 19, 2026 10:13
danielhanchen added a commit that referenced this pull request May 19, 2026
…#677)

* Fix Q2_K_L recipe: bumps were too aggressive, output was garbage

The recipe shipped in #667 expanded q2_k_l to:
  llama-quantize --output-tensor-type q8_0 --token-embedding-type q8_0 IN OUT q2_k N
which produced garbage output. The correct Unsloth Q2_K_L recipe is:
  llama-quantize \
    --tensor-type ffn_down=Q3_K \
    --output-tensor-type Q6_K \
    --token-embedding-type Q4_K \
    IN OUT q2_k N

ffn_down=Q3_K relies on llama-quantize's substring matching to cover
both ffn_down.weight (dense FFN) and ffn_down_exps.weight (MoE experts)
across every layer id with one flag.

Tests updated to match the new recipe; added a coverage test
documenting that the bare "ffn_down" pattern intentionally hits both
dense + MoE down-projection tensors so the override is not split into
per-layer enumerations later.

* Q2_K_L: chain anchored regex for ffn_down_exps + ffn_down

llama-quantize matches --tensor-type with std::regex_search (not
substring) and iterates first-match-wins, so we switch from one bare
"ffn_down" pattern to two anchored regex patterns:

  --tensor-type "\.ffn_down_exps=Q3_K"
  --tensor-type "\.ffn_down=Q3_K"

The leading \. anchors on the GGUF path-separator dot, ruling out
accidental hits on hypothetical tensor names that merely contain
"ffn_down" as a substring (e.g. attn_ffn_downsample). The more-specific
MoE pattern is listed first because the loop in src/llama-quant.cpp
breaks on first match -- otherwise the dense regex would absorb every
MoE expert by prefix and the explicit override would be dead code.

Tests updated: assert both patterns are present, assert MoE pattern
precedes dense pattern, drop the substring-matching test in favour of
an order-invariant test that captures the new contract.

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
@danielhanchen

Copy link
Copy Markdown
Member Author

Re-reviewed end-to-end after #677. Splitting the verdict because the two pieces of this PR landed with very different correctness:

1. Converter-package-layout patcher + tests + CI workflow — clean and unchanged on main.

  • tests/test_convert_hf_to_gguf_patcher.py is byte-identical between pr-667 and main (git diff main..pr-667 -- tests/test_convert_hf_to_gguf_patcher.py returns 0 lines).
  • .github/workflows/studio-export-fix-ci.yml is byte-identical between pr-667 and main and still present on main — git -C unsloth-zoo show main:.github/workflows/studio-export-fix-ci.yml matches the PR version.
  • The new-layout detection (TEXT_MODEL_MAP / MMPROJ_MODEL_MAP AST extraction, conversion/base.py idempotent patching, sibling-stat cache invalidation, monolith back-compat) survived #677 unchanged.
  • Tests on main: 28 passed in 0.83s (pytest tests/test_convert_hf_to_gguf_patcher.py tests/test_quantize_gguf_q2_k_l.py -q). The +1 vs pr-667 is test_q2_k_l_ffn_down_pattern_order_is_specific_first added by Fix Q2_K_L recipe: q2_k base + ffn_down=Q3_K, output=Q6_K, embed=Q4_K #677.

2. q2_k_l recipe — shipped wrong, fixed by #677.

On pr-667 (unsloth_zoo/llama_cpp.py:1805-1807):

_extra_flags = "--output-tensor-type q8_0 --token-embedding-type q8_0 "
quant_type = "q2_k"

Expansion gave q2_k base with output=q8_0, embed=q8_0 and no ffn_down override — bloats file size without addressing the dominant inference-quality lever (ffn_down at Q2_K is the main degradation source on a q2_k base).

On main after #677 (unsloth_zoo/llama_cpp.py:1811-1818):

_extra_flags = (
    '--tensor-type "\\.ffn_down_exps=Q3_K" '
    '--tensor-type "\\.ffn_down=Q3_K" '
    '--output-tensor-type Q6_K '
    '--token-embedding-type Q4_K '
)
quant_type = "q2_k"

Output q8_0 → Q6_K, embed q8_0 → Q4_K, plus two --tensor-type regex overrides for ffn_down / ffn_down_exps → Q3_K (both dense and MoE), chained specific-first because llama-quantize iterates first-match-wins (std::regex_search in src/llama-quant.cpp). The leading \. anchors on the GGUF dot separator so the override cannot leak into unrelated tensor names.

Net: the converter-layout half was a clean, regression-free fix and is still useful on main. The q2_k_l half shipped broken and required a same-week followup (#677). The takeaway for next time: split a quant-recipe change from a converter-package-layout change so a recipe regression can be reverted independently, and gate the recipe behind a real quant-quality probe rather than a recipe-shape unit test.

Cross-OS validation for similar Studio/MLX backend tests is pinned in danielhanchen/unsloth-staging-2#137 for the parent repo's cohort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant