Fix Studio q2_k_l GGUF export and new llama.cpp converter package layout by danielhanchen · Pull Request #667 · unslothai/unsloth-zoo

danielhanchen · 2026-05-18T11:16:55Z

Summary

Two distinct Studio export bugs surfaced during macOS / Apple Silicon training support testing. Both land in unsloth_zoo.llama_cpp so every caller (CUDA save path, MLX / Studio export, notebooks) shares one path.

Bug 1: q2_k_l preset crashes the MLX export path

q2_k_l is an Unsloth preset, not a native llama.cpp ftype. It maps to:

llama-quantize --output-tensor-type q8_0 --token-embedding-type q8_0 IN OUT q2_k NTHREADS

The CUDA path expanded it via _quantize_q2_k_l in unsloth.save. The MLX path in unsloth_zoo.mlx.utils.save_pretrained_gguf forwarded the literal string and llama-quantize aborted on Apple Silicon with:

main: invalid ftype 'q2_k_l'

Fix: quantize_gguf now recognises the preset and expands it before building the command. Preset-only: triggers only on quant_type.strip().lower() == "q2_k_l". Every other ftype, including the real llama.cpp q3_k_l, takes the existing code path byte-for-byte unchanged. shell=True + _quote + IS_WINDOWS path translation preserved. Display log and error messages keep the preset name (q2_k_l), not the internal rewrite (q2_k).

Bug 2: patcher drift on the new conversion/ package layout

Upstream llama.cpp split convert_hf_to_gguf.py into a thin entrypoint plus a conversion/ package (conversion/base.py, conversion/qwen.py). Three regex patches in _download_convert_hf_to_gguf still targeted the old monolithic file and silently no-op'd:

Unsloth: No supported architectures (TEXT or VISION) could be determined from the original script.
ModelBase._model_classes is empty because per-arch modules register lazily via load_all_models().
Unsloth: Metadata branding patch target 'self.metadata = gguf.Metadata.load(...)' not found.
That line moved to conversion/base.py, so produced GGUFs lost quantized_by="Unsloth", repo_url, and tags.
Unsloth: Qwen2MoE num_experts patch target not found.
conversion/qwen.py already calls self.find_hparam(["num_local_experts", "num_experts"]) so the legacy patch is obsolete.

Fix: _detect_converter_layout distinguishes monolith vs package structurally (entrypoint contains from conversion import and both conversion/__init__.py + conversion/base.py exist on disk). On the new layout:

Arch allowlist is AST-extracted from conversion/__init__.py:TEXT_MODEL_MAP / MMPROJ_MODEL_MAP, the static dicts that already enumerate every supported arch. Cheaper and safer than calling load_all_models() which imports every per-arch module.
Branding patch is applied in place to conversion/base.py with a one-line idempotency marker # UNSLOTH_BRANDING_APPLIED.
Qwen patch is skipped with an info log when both expert-key literals are already present in conversion/qwen.py.

Old monolithic layout: every existing patch runs unchanged.

Cache-key extension: _download_convert_hf_to_gguf folds conversion/{__init__,base,qwen}.py mtimes and sizes into the @lru_cache(1) key, so re-pulled llama.cpp checkouts re-run the patcher.

Public return shape (patched_filename, text_archs, vision_archs) preserved; callers in unsloth.save and unsloth_zoo.mlx.utils stay source-compatible.

Cross-OS safety

q2_k_l branch fires only on that exact preset. q3_k_l (a real llama.cpp ftype) and every other ftype hit the original code path unchanged.
Windows shell=True + _quote + build/bin/Release path translation untouched.
On old monolithic llama.cpp checkouts the layout flag stays "monolith" and behaviour is byte-for-byte identical to before.

Test plan

tests/test_quantize_gguf_q2_k_l.py (5 tests):

q2_k_l expands to q2_k + both q8_0 tensor-type flags
case-insensitive (Q2_K_L / q2_K_L / padded whitespace)
17 other ftypes (q2_k, q3_k_s/m/l, q4_, q5_, q6_k, q8_0, bf16, f16, f32) untouched, including the real q3_k_l (Linux + Windows non-regression gate)
print_output logs both the request and the expansion
error messages keep the preset name, not the rewrite

tests/test_convert_hf_to_gguf_patcher.py (18 tests):

synthetic package + monolith fixtures: detection, AST extraction, branding-patch idempotency, lines around the target preserved, Qwen alias detection, cache-key invalidation
live upstream fetch of current llama.cpp master from raw.githubusercontent.com: layout=package, branding patch applies, Qwen aliases already handled, TEXT_MODEL_MAP contains LlamaForCausalLM and Qwen*. Skipped cleanly on offline / rate-limited runners.

CI: .github/workflows/studio-export-fix-ci.yml runs the unit tests on ubuntu-latest, macos-14, and windows-latest. paths: filter so unrelated PRs do not re-fire it. max-parallel: 3 so Windows stays under the 5-concurrent-runner cap when this job runs alongside others. Pure-Python; each job finishes in well under 2 minutes.

Validated end-to-end on all three OSes via staging-fork PRs (unsloth-zoo-staging-1#18 and unsloth-staging-2#127) before opening this one.

Checklist

q2_k_l unit tests green on ubuntu-latest, macos-14, windows-latest
patcher tests green on ubuntu-latest, macos-14, windows-latest
live-upstream patcher tests green against current llama.cpp master
CUDA path in unsloth.save left untouched (byte-for-byte identical behaviour for non-q2_k_l ftypes)
MLX / Studio export path now works for q2_k_l on Apple Silicon

Two distinct Studio export bugs surfaced during macOS / Apple Silicon training support testing. Both land in unsloth_zoo.llama_cpp so every caller (CUDA save path, MLX / Studio export, notebooks) shares one path. Bug 1: q2_k_l preset crashes the MLX export path q2_k_l is an Unsloth preset that maps to llama-quantize --output-tensor-type q8_0 --token-embedding-type q8_0 IN OUT q2_k NTHREADS not a native llama.cpp ftype. The CUDA path expanded it via _quantize_q2_k_l in unsloth.save; the MLX path in unsloth_zoo.mlx.utils.save_pretrained_gguf forwarded the literal string and llama-quantize aborted with main: invalid ftype 'q2_k_l' on Apple Silicon. Fix: quantize_gguf now recognises the preset and expands it before building the command. Preset-only: triggers only on quant_type.strip().lower() == "q2_k_l". Every other ftype, including the real llama.cpp q3_k_l, takes the existing code path byte-for-byte unchanged. shell=True + _quote + IS_WINDOWS path-translation preserved. The display log + error messages keep the preset name (q2_k_l), not the internal rewrite (q2_k). Bug 2: patcher drift on the new conversion/ package layout Upstream llama.cpp split convert_hf_to_gguf.py into a thin entrypoint plus a conversion/ package (conversion/base.py, conversion/qwen.py). Three regex patches in _download_convert_hf_to_gguf still targeted the old monolithic file and silently no-op'd: - "No supported architectures (TEXT or VISION) could be determined" ModelBase._model_classes is empty because per-arch modules register lazily via load_all_models(). - "Metadata branding patch target not found" self.metadata = gguf.Metadata.load(...) moved to conversion/base.py, so produced GGUFs no longer carried quantized_by="Unsloth", repo_url, and tags. - "Qwen2MoE num_experts patch target not found" conversion/qwen.py already calls self.find_hparam(["num_local_experts", "num_experts"]) so the legacy patch is obsolete. Fix: _detect_converter_layout distinguishes monolith vs package structurally (entrypoint contains `from conversion import` and conversion/__init__.py + conversion/base.py exist on disk). On the new layout: - Arch allowlist is AST-extracted from conversion/__init__.py:TEXT_MODEL_MAP / MMPROJ_MODEL_MAP, the static dicts that already enumerate every supported arch. Cheaper and safer than calling load_all_models() which imports every per-arch module. - Branding patch is applied in place to conversion/base.py with a one-line idempotency marker (# UNSLOTH_BRANDING_APPLIED). - Qwen patch is skipped with an info log when both expert-key literals are already present in conversion/qwen.py. Old monolithic layout: every existing patch runs unchanged. Cache-key extension: _download_convert_hf_to_gguf folds conversion/{__init__,base,qwen}.py mtimes + sizes into the @lru_cache(1) key so a re-pulled llama.cpp checkout re-runs the patcher. Public return shape (patched_filename, text_archs, vision_archs) preserved; callers in unsloth.save and unsloth_zoo.mlx.utils stay source-compatible. Cross-OS safety: - q2_k_l branch fires only on that exact preset; q3_k_l (real llama.cpp ftype) and every other ftype hit the original code path unchanged. - Windows shell=True + _quote + build/bin/Release path translation untouched. - On old monolithic llama.cpp checkouts the layout flag stays "monolith" and behaviour is byte-for-byte identical to before. Tests tests/test_quantize_gguf_q2_k_l.py (5 tests): - q2_k_l expands to q2_k + both q8_0 tensor-type flags - case-insensitive (Q2_K_L / q2_K_L / padded whitespace) - 17 other ftypes (q2_k, q3_k_s/m/l, q4_*, q5_*, q6_k, q8_0, bf16, f16, f32) untouched -- the Linux + Windows non-regression gate - print_output logs both the request and the expansion - error messages keep the preset name, not the rewrite tests/test_convert_hf_to_gguf_patcher.py (18 tests): - synthetic package + monolith fixtures: detection, AST extraction, branding-patch idempotency, lines around the target preserved, Qwen alias detection, cache-key invalidation - live upstream fetch of current llama.cpp master from raw.githubusercontent.com: layout=package, branding patch applies, Qwen aliases already handled, TEXT_MODEL_MAP contains LlamaForCausalLM + Qwen*. Skipped cleanly on offline / rate- limited runners. CI .github/workflows/studio-export-fix-ci.yml runs the unit tests on ubuntu-latest, macos-14, and windows-latest. paths: filter so unrelated PRs do not re-fire it. max-parallel: 3 so Windows stays under the 5-concurrent-runner cap when this job runs alongside others. Pure-Python; each job finishes in well under 2 minutes.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6fc286a1c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T11:19:31Z

+        # load_all_models() runs, so AST-parse the static TEXT_MODEL_MAP /
+        # MMPROJ_MODEL_MAP in conversion/__init__.py instead. Monolith
+        # layout keeps the original _model_classes introspection.
+        _layout = _detect_converter_layout(original_content, LLAMA_CPP_DEFAULT_DIR)


Detect the package layout beside the selected converter

When UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points at a checkout with the new conversion/ package (and the Studio flow similarly installs/checks ./llama.cpp before calling this helper), this still inspects LLAMA_CPP_DEFAULT_DIR instead of the directory that contains the converter being patched. If ~/.unsloth/llama.cpp does not contain the matching conversion/__init__.py and conversion/base.py, the package entrypoint is treated as monolith (or its imports fail during the earlier temp-module load), so the new arch enumeration and branding patches do not apply and the generated converter can import a package that is not present. Resolve the package directory from the selected local converter/install rather than always using the default path.

Useful? React with 👍 / 👎.

Good catch. Fixed in b3b2766. Added a single helper _get_llama_cpp_dir(local_script_info) that returns os.path.dirname(local_script_info[0]) when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set, LLAMA_CPP_DEFAULT_DIR otherwise. Threaded the result through layout detection, AST extraction of TEXT_MODEL_MAP / MMPROJ_MODEL_MAP, the branding patch on conversion/base.py, the Qwen alias check on conversion/qwen.py, and the sibling-info cache key.

Added three tests under tests/test_convert_hf_to_gguf_patcher.py:

test_llama_cpp_dir_defaults_when_no_local_script — default fallback returns LLAMA_CPP_DEFAULT_DIR.

test_llama_cpp_dir_resolves_to_source_dir_when_local_script_set — synthetic 3-tuple override returns the source dir.

test_patcher_anchors_on_custom_dir_when_override_set — full custom-tree round-trip: layout, sibling info, AST extraction all anchor on the override dir.

No behaviour change when the env var is unset (the common path).

gemini-code-assist

Code Review

This pull request updates the llama.cpp conversion script patcher to accommodate the transition from a monolithic script to a package-based layout. It introduces AST-based parsing for architecture extraction and implements layout-aware branding and Qwen-specific patches. Additionally, it adds support for the Q2_K_L quantization preset. The reviewer feedback identifies an opportunity to refactor duplicated directory resolution logic into a shared utility and points out a redundant newline in the branding patch implementation.

gemini-code-assist · 2026-05-18T11:24:34Z

+    # Resolve env vars + sibling mtimes on every call; both are folded into
+    # the @lru_cache key so re-pulled llama.cpp checkouts re-run the patcher.
+    return _download_convert_hf_to_gguf_cached(
+        name,
+        _resolve_local_convert_script(),
+        _conversion_sibling_info(LLAMA_CPP_DEFAULT_DIR),
+    )


The logic for determining the llama.cpp directory is repeated multiple times. Refactor this into a shared helper function like _get_llama_cpp_dir to improve maintainability, as per the repository's practice for system-specific paths.

Suggested change

# Resolve env vars + sibling mtimes on every call; both are folded into

# the @lru_cache key so re-pulled llama.cpp checkouts re-run the patcher.

return _download_convert_hf_to_gguf_cached(

name,

_resolve_local_convert_script(),

_conversion_sibling_info(LLAMA_CPP_DEFAULT_DIR),

)

_local_script_info = _resolve_local_convert_script()

_llama_cpp_dir = _get_llama_cpp_dir(_local_script_info)

return _download_convert_hf_to_gguf_cached(

name,

_local_script_info,

_conversion_sibling_info(_llama_cpp_dir),

)

References

Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

Done in b3b2766. added; threaded through the wrapper, layout detection, AST extraction, branding patch, Qwen check, and sibling-info cache key. The other five suggestions in this review covering the same refactor are addressed by the same commit.

Done in b3b2766. _get_llama_cpp_dir(local_script_info) added; threaded through the wrapper, layout detection, AST extraction, branding patch, Qwen check, and sibling-info cache key. The other five suggestions in this review covering the same refactor are addressed by the same commit.

gemini-code-assist · 2026-05-18T11:24:34Z

    temp_original_file_path = None # Initialize for finally block
+    # Set by introspection; read by Patch 2 + Patch 3 below. Default to
+    # 'monolith' so a failed introspection still drives the legacy patches.
+    _layout = "monolith"


Use the shared helper function _get_llama_cpp_dir to derive the base llama.cpp directory from the local script path.

Suggested change

_layout = "monolith"

_layout = "monolith"

_llama_cpp_dir = _get_llama_cpp_dir(_local_script_info)

References

Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

gemini-code-assist · 2026-05-18T11:24:34Z

+        # load_all_models() runs, so AST-parse the static TEXT_MODEL_MAP /
+        # MMPROJ_MODEL_MAP in conversion/__init__.py instead. Monolith
+        # layout keeps the original _model_classes introspection.
+        _layout = _detect_converter_layout(original_content, LLAMA_CPP_DEFAULT_DIR)


Layout detection should use the directory derived from the shared helper function to ensure consistency.

Suggested change

_layout = _detect_converter_layout(original_content, LLAMA_CPP_DEFAULT_DIR)

_layout = _detect_converter_layout(original_content, _llama_cpp_dir)

References

Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

gemini-code-assist · 2026-05-18T11:24:34Z

+        logger.info(f"Unsloth: convert_hf_to_gguf layout detected: {_layout}")
+
+        if _layout == "package":
+            conv_init_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "__init__.py")


Use the derived _llama_cpp_dir to locate the conversion package.

Suggested change

conv_init_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "__init__.py")

conv_init_py = os.path.join(_llama_cpp_dir, "conversion", "__init__.py")

References

Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

gemini-code-assist · 2026-05-18T11:24:34Z

-                 if re.search(rb"self\.metadata \= gguf\.Metadata\.load\(", patched_content): logger.warning("Unsloth: Metadata branding patch target found, but regex failed to apply.")
-                 else: logger.warning("Unsloth: Metadata branding patch target 'self.metadata = gguf.Metadata.load(...)' not found.")
+            if _layout == "package":
+                conv_base_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "base.py")


Use the derived _llama_cpp_dir to locate base.py for patching.

Suggested change

conv_base_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "base.py")

conv_base_py = os.path.join(_llama_cpp_dir, "conversion", "base.py")

References

Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

gemini-code-assist · 2026-05-18T11:24:34Z

-            )
+            _qwen_handled = False
+            if _layout == "package":
+                conv_qwen_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "qwen.py")


Use the derived _llama_cpp_dir to locate qwen.py for alias checking.

Suggested change

conv_qwen_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "qwen.py")

conv_qwen_py = os.path.join(_llama_cpp_dir, "conversion", "qwen.py")

References

Refactor duplicated logic for finding system-specific paths into a shared helper function to improve maintainability.

gemini-code-assist · 2026-05-18T11:24:35Z

+            + indent + b"if hasattr(self.metadata, 'quantized_by'): self.metadata.quantized_by = 'Unsloth'\n"
+            + indent + b"if hasattr(self.metadata, 'repo_url'): self.metadata.repo_url = 'https://huggingface.co/unsloth'\n"
+            + indent + b"if hasattr(self.metadata, 'tags'): self.metadata.tags = ['unsloth', 'llama.cpp']\n"


The current implementation adds an extra newline before the original next line because suffix already starts with a newline and an additional \n is appended to the last branding line. Removing the trailing \n from the last branding line will result in cleaner file content.

Suggested change

+ indent + b"if hasattr(self.metadata, 'quantized_by'): self.metadata.quantized_by = 'Unsloth'\n"

+ indent + b"if hasattr(self.metadata, 'repo_url'): self.metadata.repo_url = 'https://huggingface.co/unsloth'\n"

+ indent + b"if hasattr(self.metadata, 'tags'): self.metadata.tags = ['unsloth', 'llama.cpp']\n"

+ indent + b"if hasattr(self.metadata, 'quantized_by'): self.metadata.quantized_by = 'Unsloth'\n"

+ indent + b"if hasattr(self.metadata, 'repo_url'): self.metadata.repo_url = 'https://huggingface.co/unsloth'\n"

+ indent + b"if hasattr(self.metadata, 'tags'): self.metadata.tags = ['unsloth', 'llama.cpp']"

Addressed in 3a9a23c. Dropped the trailing \n on the last branding line so the captured suffix (which already starts with newline + indent) doesn't produce an extra blank line. Verified with the existing test_branding_patch_preserves_lines_around_target test and the live-upstream round-trip.

PR #667 review (Codex + Gemini bots): when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points at a checkout with the new conversion/ package, the layout detection, AST extraction, branding patch, Qwen check, and sibling-info cache key should all resolve against THAT dir, not the hard-coded default. The previous code always inspected LLAMA_CPP_DEFAULT_DIR, which silently mis-detects the layout when the override dir differs from the default install. Introduce a single helper _get_llama_cpp_dir(local_script_info) that returns os.path.dirname(local_script_info[0]) when the env var is set, LLAMA_CPP_DEFAULT_DIR otherwise. Thread the result through: - _download_convert_hf_to_gguf wrapper (sibling-info cache key) - _detect_converter_layout call site - conversion/__init__.py AST extraction - conversion/base.py branding patch target - conversion/qwen.py expert-alias check Adds three tests: - default fallback returns LLAMA_CPP_DEFAULT_DIR - synthetic 3-tuple override returns the source dir - full custom-tree round-trip: layout, sibling info, AST extraction all anchor on the override dir No behaviour change when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is unset (the common path). When set, the patcher now correctly targets the user's checkout instead of falling through to monolith and no-op'ing.

Three small follow-ups from PR #667 review: - Branding insertion: drop the trailing newline from the last branding line. The match group `suffix` already starts with a newline + indent (it captured the boundary into the line that follows Metadata.load), so the trailing \n on the last branding line produced a blank line between the branding block and the next statement. Aesthetic only; Python parses both forms identically. - Add a one-line explanatory comment inside `_detect_converter_layout`'s bare `except Exception:`. Behaviour unchanged. - Remove unused `logging` and `textwrap` imports from tests/test_convert_hf_to_gguf_patcher.py.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a9a23c7e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T07:54:14Z

+        # load_all_models() runs, so AST-parse the static TEXT_MODEL_MAP /
+        # MMPROJ_MODEL_MAP in conversion/__init__.py instead. Monolith
+        # layout keeps the original _model_classes introspection.
+        _layout = _detect_converter_layout(original_content, _llama_cpp_dir)


Move package detection before importing local converters

When UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points to a new package-layout checkout outside LLAMA_CPP_DEFAULT_DIR, this layout detection runs only after _load_module_from_path() has already imported a temp copy from LLAMA_CPP_DEFAULT_DIR (lines 1135-1150). If that default directory does not also contain the matching conversion/ package, the entrypoint's from conversion import ... raises ModuleNotFoundError, so the AST-based arch parsing and branding patch are never reached. Fresh evidence in the current revision is that _llama_cpp_dir is now resolved from the selected converter, but the import still happens before this newly added package branch can use it.

Useful? React with 👍 / 👎.

Fixed in 0a3256e. Hoisted _detect_converter_layout above the temp-write + _load_module_from_path step. On the package layout we now skip the module-load entirely and go straight to AST-parsing TEXT_MODEL_MAP / MMPROJ_MODEL_MAP from the resolved converter directory. The original monolith path is preserved verbatim, just nested in the else branch.

original_module_name is initialized to None at the top of the function so the cleanup if original_module_name is not None and original_module_name in sys.modules stays safe on the package branch.

Added test_package_layout_does_not_require_module_import (tests/test_convert_hf_to_gguf_patcher.py): it monkeypatches _load_module_from_path with a sentinel that raises AssertionError if called, builds a synthetic package-layout tree under UNSLOTH_LLAMA_CPP_SCRIPTS_DIR, drives the patcher end-to-end, and asserts the sentinel never fired, base.py was branded in place under the override dir, and LlamaForCausalLM made it into text_archs. 27/27 local tests still pass.

When UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points at a checkout outside LLAMA_CPP_DEFAULT_DIR, the previous order would temp-write the entrypoint inside the default dir and try to import it; on the new package layout, `from conversion import ...` then resolves against the default dir's missing conversion/ package and raises ModuleNotFoundError before AST arch extraction or branding can run. Hoist _detect_converter_layout above the temp-write/module-load step. On "package" layout, skip the module load entirely and go straight to AST-parsing TEXT_MODEL_MAP / MMPROJ_MODEL_MAP from the selected dir. Monolith path keeps the original behaviour. Added test_package_layout_does_not_require_module_import: a sentinel replaces _load_module_from_path and asserts it is NOT called on the package-layout path.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a3256e80e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T08:14:25Z

+            if _layout == "package":
+                conv_base_py = os.path.join(_llama_cpp_dir, "conversion", "base.py")
+                _branding_status = _apply_branding_patch_to_base(conv_base_py)


Save package-layout converters beside their package

When UNSLOTH_LLAMA_CPP_SCRIPTS_DIR points to a package-layout checkout outside LLAMA_CPP_DEFAULT_DIR, this branch patches conversion/base.py in the override tree, but the generated entrypoint is still written under LLAMA_CPP_DEFAULT_DIR later. Running that generated script resolves from conversion import ... relative to its saved location, so installs without a matching LLAMA_CPP_DEFAULT_DIR/conversion fail with ModuleNotFoundError (or import an unrelated package). Fresh evidence in the current revision is that the package branch still targets _llama_cpp_dir here while the patched entrypoint is saved elsewhere.

Useful? React with 👍 / 👎.

+    original_module_name = None    # Only set on the monolith branch
+    # Set by introspection; read by Patch 2 + Patch 3 below. Default to
+    # 'monolith' so a failed introspection still drives the legacy patches.
+    _layout = "monolith"


+    if local_script_info is not None:
+        return os.path.dirname(local_script_info[0])
+    return LLAMA_CPP_DEFAULT_DIR
+pass


+        _stat(base_py),
+        _stat(qwen_py) if os.path.isfile(qwen_py) else None,
+    )
+pass


+        # to monolith so the legacy regex patches still run.
+        pass
+    return "monolith"
+pass


+            if isinstance(node.target, ast.Name) and node.target.id == dict_name:
+                _harvest(node.value)
+    return keys
+pass


+    except OSError:
+        return "pattern-missing"
+    return "applied"
+pass


+    except OSError:
+        return False
+    return (b"num_local_experts" in content) and (b"num_experts" in content)
+pass


…#677) * Fix Q2_K_L recipe: bumps were too aggressive, output was garbage The recipe shipped in #667 expanded q2_k_l to: llama-quantize --output-tensor-type q8_0 --token-embedding-type q8_0 IN OUT q2_k N which produced garbage output. The correct Unsloth Q2_K_L recipe is: llama-quantize \ --tensor-type ffn_down=Q3_K \ --output-tensor-type Q6_K \ --token-embedding-type Q4_K \ IN OUT q2_k N ffn_down=Q3_K relies on llama-quantize's substring matching to cover both ffn_down.weight (dense FFN) and ffn_down_exps.weight (MoE experts) across every layer id with one flag. Tests updated to match the new recipe; added a coverage test documenting that the bare "ffn_down" pattern intentionally hits both dense + MoE down-projection tensors so the override is not split into per-layer enumerations later. * Q2_K_L: chain anchored regex for ffn_down_exps + ffn_down llama-quantize matches --tensor-type with std::regex_search (not substring) and iterates first-match-wins, so we switch from one bare "ffn_down" pattern to two anchored regex patterns: --tensor-type "\.ffn_down_exps=Q3_K" --tensor-type "\.ffn_down=Q3_K" The leading \. anchors on the GGUF path-separator dot, ruling out accidental hits on hypothetical tensor names that merely contain "ffn_down" as a substring (e.g. attn_ffn_downsample). The more-specific MoE pattern is listed first because the loop in src/llama-quant.cpp breaks on first match -- otherwise the dense regex would absorb every MoE expert by prefix and the explicit override would be dead code. Tests updated: assert both patterns are present, assert MoE pattern precedes dense pattern, drop the substring-matching test in favour of an order-invariant test that captures the new contract. --------- Co-authored-by: Daniel Han <info@unsloth.ai>

danielhanchen · 2026-05-21T13:05:32Z

Re-reviewed end-to-end after #677. Splitting the verdict because the two pieces of this PR landed with very different correctness:

1. Converter-package-layout patcher + tests + CI workflow — clean and unchanged on main.

tests/test_convert_hf_to_gguf_patcher.py is byte-identical between pr-667 and main (git diff main..pr-667 -- tests/test_convert_hf_to_gguf_patcher.py returns 0 lines).
.github/workflows/studio-export-fix-ci.yml is byte-identical between pr-667 and main and still present on main — git -C unsloth-zoo show main:.github/workflows/studio-export-fix-ci.yml matches the PR version.
The new-layout detection (TEXT_MODEL_MAP / MMPROJ_MODEL_MAP AST extraction, conversion/base.py idempotent patching, sibling-stat cache invalidation, monolith back-compat) survived #677 unchanged.
Tests on main: 28 passed in 0.83s (pytest tests/test_convert_hf_to_gguf_patcher.py tests/test_quantize_gguf_q2_k_l.py -q). The +1 vs pr-667 is test_q2_k_l_ffn_down_pattern_order_is_specific_first added by Fix Q2_K_L recipe: q2_k base + ffn_down=Q3_K, output=Q6_K, embed=Q4_K #677.

2. q2_k_l recipe — shipped wrong, fixed by #677.

On pr-667 (unsloth_zoo/llama_cpp.py:1805-1807):

_extra_flags = "--output-tensor-type q8_0 --token-embedding-type q8_0 "
quant_type = "q2_k"

Expansion gave q2_k base with output=q8_0, embed=q8_0 and no ffn_down override — bloats file size without addressing the dominant inference-quality lever (ffn_down at Q2_K is the main degradation source on a q2_k base).

On main after #677 (unsloth_zoo/llama_cpp.py:1811-1818):

_extra_flags = (
    '--tensor-type "\\.ffn_down_exps=Q3_K" '
    '--tensor-type "\\.ffn_down=Q3_K" '
    '--output-tensor-type Q6_K '
    '--token-embedding-type Q4_K '
)
quant_type = "q2_k"

Output q8_0 → Q6_K, embed q8_0 → Q4_K, plus two --tensor-type regex overrides for ffn_down / ffn_down_exps → Q3_K (both dense and MoE), chained specific-first because llama-quantize iterates first-match-wins (std::regex_search in src/llama-quant.cpp). The leading \. anchors on the GGUF dot separator so the override cannot leak into unrelated tensor names.

Net: the converter-layout half was a clean, regression-free fix and is still useful on main. The q2_k_l half shipped broken and required a same-week followup (#677). The takeaway for next time: split a quant-recipe change from a converter-package-layout change so a recipe regression can be reverted independently, and gate the recipe behind a real quant-quality probe rather than a recipe-shape unit test.

Cross-OS validation for similar Studio/MLX backend tests is pinned in danielhanchen/unsloth-staging-2#137 for the parent repo's cohort.

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 18, 2026

View reviewed changes

github-code-quality Bot found potential problems May 18, 2026

View reviewed changes

Daniel Han added 2 commits May 19, 2026 07:49

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes

This was referenced May 19, 2026

[staging] Fix Studio q2_k_l GGUF export + new llama.cpp converter package layout danielhanchen/unsloth-zoo-staging#18

Closed

[staging] e2e Studio q2_k_l GGUF export validation (Linux / macOS / Windows) danielhanchen/unsloth-staging-2#127

Closed

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes

github-code-quality Bot found potential problems May 19, 2026

View reviewed changes

danielhanchen merged commit dda8dc7 into main May 19, 2026
18 checks passed

danielhanchen deleted the fix-studio-q2-k-l-and-converter-package branch May 19, 2026 10:13

danielhanchen mentioned this pull request May 19, 2026

Fix Q2_K_L recipe: q2_k base + ffn_down=Q3_K, output=Q6_K, embed=Q4_K #677

Merged

6 tasks

	_layout = "monolith"
	_layout = "monolith"
	_llama_cpp_dir = _get_llama_cpp_dir(_local_script_info)

	_layout = _detect_converter_layout(original_content, LLAMA_CPP_DEFAULT_DIR)
	_layout = _detect_converter_layout(original_content, _llama_cpp_dir)

	conv_init_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "__init__.py")
	conv_init_py = os.path.join(_llama_cpp_dir, "conversion", "__init__.py")

	conv_base_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "base.py")
	conv_base_py = os.path.join(_llama_cpp_dir, "conversion", "base.py")

	conv_qwen_py = os.path.join(LLAMA_CPP_DEFAULT_DIR, "conversion", "qwen.py")
	conv_qwen_py = os.path.join(_llama_cpp_dir, "conversion", "qwen.py")

Conversation

danielhanchen commented May 18, 2026

Summary

Bug 1: q2_k_l preset crashes the MLX export path

Bug 2: patcher drift on the new conversion/ package layout

Cross-OS safety

Test plan

Checklist

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen May 19, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen May 19, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!