BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150)#4426
Conversation
…on_prompt (unslothai#4150) ChatML-style templates (Hermes, Magnum, etc.) saved after LoRA training may end with {% endfor %} and have no {% if add_generation_prompt %} block. The existing _fix_chat_template only handled the case where content ({{ ... }}) followed endfor, not the case where the template simply ended. Add an elif branch that detects ChatML templates (containing <|im_start|>) with an empty suffix after endfor, and appends the standard ChatML generation prompt block. Fixes unslothai#4150
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical runtime error in the tokenizer utility that prevented certain ChatML-style templates from being correctly processed after LoRA training. By enhancing the template fixing logic, it ensures that all valid ChatML templates, even those missing the generation prompt block, are properly formatted, thereby improving the robustness and compatibility of the tokenizer. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request addresses a bug in _fix_chat_template() that caused RuntimeError for ChatML-style templates missing the add_generation_prompt block. The fix adds an elif branch to handle templates with an empty suffix after endfor, appending the standard ChatML generation prompt block. The review focuses on the correctness and maintainability of the added code, ensuring it effectively resolves the issue and integrates well with the existing logic.
| elif after_endfor.strip() == "" and "<|im_start|>" in chat_template: | ||
| # GH#4150: ChatML-style templates (Hermes, etc.) that end with | ||
| # {% endfor %} and have no add_generation_prompt block. | ||
| # Append the standard ChatML generation prompt. | ||
| generation_block = ( | ||
| "{%" + dash + " if add_generation_prompt %}" | ||
| "{{ '<|im_start|>assistant\n' }}" | ||
| "{%" + dash + " endif %}" | ||
| ) | ||
| chat_template = chat_template[: where + len(chosen_end)] + generation_block | ||
|
|
||
| return chat_template |
There was a problem hiding this comment.
This elif block appends a generation prompt to ChatML templates that have an empty suffix after endfor. It's important to ensure that this addition doesn't inadvertently affect other types of templates or introduce unintended side effects. Consider adding a more specific check to ensure this logic only applies to the intended ChatML templates to avoid potential issues in the future.
It is also important to ensure that the appended generation_block is correctly formatted and compatible with all ChatML-style templates.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48119ee39c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| generation_block = ( | ||
| "{%" + dash + " if add_generation_prompt %}" | ||
| "{{ '<|im_start|>assistant\n' }}" |
There was a problem hiding this comment.
Don't hard-code
<|im_start|>assistant\n for all ChatML fixes
This new branch assumes every <|im_start|> template should generate with '<|im_start|>assistant\n', but we already ship supported variants that use a different assistant prefix. For example, unsloth/chat_templates.py:614-626 defines Phi-4 with '<|im_start|>assistant<|im_sep|>'; if a saved Phi-4 tokenizer loses its add_generation_prompt block and hits _fix_chat_template(), this patch rewrites it to the wrong protocol and produces malformed prompts at inference time. The fix needs to preserve the model-specific assistant prefix instead of treating all ChatML-style templates as Hermes-style newline prompts.
Useful? React with 👍 / 👎.
…t prefix Address review feedback: Phi-4 uses <|im_sep|> instead of newline after role. Extract the separator from the template pattern (message['role'] + 'SEPARATOR') to generate the correct assistant prefix for each ChatML variant.
48119ee to
56764e6
Compare
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e84d889eef
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template) | ||
| separator = sep_match.group(1) if sep_match else "\\n" |
There was a problem hiding this comment.
Handle hard-coded ChatML role branches when inferring separator
Fresh evidence: unsloth/chat_templates.py:616-625 defines phi4_template with separate system/user/assistant branches, so the new regex here never matches. In that supported case, a saved Phi-4 tokenizer that is missing its trailing add_generation_prompt block still enters this branch, falls back to "\\n", and gets patched to '<|im_start|>assistant\n' instead of the required '<|im_start|>assistant<|im_sep|>', so inference prompts remain malformed for Phi-4-style tokenizers.
Useful? React with 👍 / 👎.
Broaden the empty-suffix ChatML repair branch so it correctly handles hardcoded-role templates (Phi-4 style) and double-quoted / dot-access variants, rather than silently falling back to '\n' for everything that is not `message['role'] + '...'`. - Prefer an explicit assistant literal (`'<|im_start|>assistant<sep>'`) so Phi-4 templates pick up `<|im_sep|>` instead of `\n`. - Accept both single- and double-quoted role concatenation, plus `message.role` attribute access. - Guard the branch on `<|im_end|> in chat_template` and `add_generation_prompt not in chat_template` to avoid false positives on templates that merely mention `<|im_start|>` in a comment, and to make the function idempotent if called on an already-fixed template. - Drop the redundant in-function `import re` (module-level import at line 18 already exists). - Emit the generated Jinja block with a double-quoted string literal so a separator containing a single quote cannot break the injected block.
…olution Follow-up to the earlier PR unslothai#4426 fix for ChatML empty-suffix templates. Second round of review surfaced three additional issues in the new elif branch; this commit addresses all three. - Bug B (High): re.search returned the first `message['role'] + '<sep>'` match only, so templates that branch per-role (e.g. Phi-4-mini style where `system` uses `\n` but `user`/`assistant` use `<|im_sep|>`) got the wrong separator and the injected block became `<|im_start|>assistant\n` instead of `<|im_start|>assistant<|im_sep|>`. Fix: scan every role concatenation with re.finditer, use the single separator if all agree, otherwise prefer `<|im_sep|>` when present in the template, otherwise fall back to `\n`. - Bug A (Medium): the assistant-literal regex could match inside a Jinja `{# comment #}` block, so a template containing documentation like `{# example: '<|im_start|>assistant<WRONG>' #}` would silently extract `WRONG` as the separator. - Bug C (Medium): the guard `"add_generation_prompt" not in chat_template` is a raw substring check, so a template with a comment like `{# This template ignores add_generation_prompt #}` would match the substring and the elif would be skipped, leaving the template unrepaired. Fix for Bugs A and C: strip Jinja `{# ... #}` comments once before any regex or substring check in this branch. Verified against: - 10-case template simulation (Hermes, Phi-4, double-quoted, dot-access, dashed, trailing-expr, Llama-3 control, already-fixed, pre-PR compare) - 5 new bug reproducers (Bug A, Bug B, Bug C, idempotency, Hermes basic) All 15 cases pass.
Loop-3 review round (7/7 reviewer.py --slow workers + hosted Gemini) independently surfaced a trailing-Jinja-comment case that bypassed the new empty-suffix repair branch from PR unslothai#4426: templates ending like `{% endfor %}{# trailing comment #}` have `after_endfor = "{# ... #}"` which is non-empty, so the emptiness check fails and the template falls through to the RuntimeError. Fix: scrub Jinja `{# ... #}` comments from `after_endfor` before the emptiness check, so comment-only suffixes enter the new repair path. The body of the branch keeps its existing behavior of appending just the generation block (without re-appending the scrubbed `after_endfor`) so that `apply_chat_template(add_generation_prompt=True)` does not emit stray trailing whitespace after the generation prefix. This matches the semantics of the adjacent first-branch repair path. Verified against 20 test cases: 10 original template shapes, 5 Bug A/B/C reproducers from the loop-2 round, and 5 new trailing-comment reproducers. All pass.
Follow-up to PR unslothai#4426: reviewer.py run 6 and Sonnet C (adversarial regex hunt) independently found that the `assistant_match` regex can capture an empty separator when a template splits the assistant prefix across concatenated string literals, e.g. `{{ '<|im_start|>assistant' }}{{ '\n' }}...`. The injected generation block then becomes `{{ "<|im_start|>assistant" }}` with no trailing newline and the generation prompt is malformed. Fix: only accept a non-empty captured separator; otherwise fall through to the role-concat / `<|im_sep|>` / `\n` cascade so the fallback path produces a usable ChatML generation prefix in every case. One line. Also rewrite the explanatory comment above the generation-block append to accurately describe why `after_endfor` is dropped in this branch while being wrapped in the first branch. The first branch's `after_endfor` is a semantic `{{ ... }}` expression, whereas this branch's `after_endfor` is verified to contain only whitespace and comments that would render as stray trailing output. Verified against the full 23-case test matrix: - 10 original template shapes - 5 Bug A/B/C reproducers - 5 trailing Jinja comment reproducers - 3 new empty-separator / bare-assistant-literal reproducers All pass.
for more information, see https://pre-commit.ci
The new elif branch carries a lot of prose that restates what the code already says. Collapse the long block comments into three concise explanations (GH#4150 scope, separator inference strategy, Jinja literal escape and trailing-whitespace drop). Behavior is unchanged. Net change: -24 lines (PR down from +72 to +48).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc250eb7dc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| separator = unique_role_seps[0] | ||
| elif "<|im_sep|>" in scrubbed: | ||
| separator = "<|im_sep|>" | ||
| else: | ||
| separator = "\\n" |
There was a problem hiding this comment.
Preserve assistant prefill when multiple role separators exist
When _fix_chat_template() cannot infer a unique message.role + <sep> suffix, it falls back to "\\n", which drops model-specific assistant prefills. This regresses supported templates like qwen3_thinking_template (see unsloth/chat_templates.py around lines 1548 and 1588): if a saved tokenizer loses its trailing {% if add_generation_prompt %} block, this branch reconstructs '<|im_start|>assistant\n' instead of the required '<|im_start|>assistant\n<think>\n', changing inference prompts and degrading behavior for thinking models.
Useful? React with 👍 / 👎.
Follow-up hardening on top of PR unslothai#4426 (which fixed the unslothai#4150 RuntimeError for ChatML LoRA reloads). Behavior changes: - Warn-by-default instead of RuntimeError. When fix_chat_template cannot repair a broken template, emit a warning and return the original. Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail. Fixes the UX where a missing `{% if add_generation_prompt %}` block on a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would block model loading entirely. - Local path vs HF hub distinguished in the warning message. For local paths the message points at the likely downstream tool; for HF IDs it points at the upstream model maintainers. Previously both said "file a bug report to the maintainers of <path>" even when <path> was the user's own saves/ directory. - Dict / list chat_template now handled. Hermes-3 ships with {default, tool_use} and the previous code crashed with AttributeError: 'dict' object has no attribute 'find' when entering _fix_chat_template with a dict. Each variant is now fixed independently; structure is preserved. Internals: - _find_end_position now matches all four Jinja whitespace-control variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost endfor/endif so multi-for templates aren't locked onto the first loop. Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was silently bypassed. - _has_add_generation_prompt_block uses Jinja AST via jinja2.nodes.If/Name walks instead of substring matching, so templates that hide the block behind comments or dash-style variants are classified correctly. - _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on the AST: only fires when the last structural top-level node is a For (standard ChatML shape), ignoring trailing pure-whitespace output nodes. Templates wrapped in an outer If (Qwen3-Guard) are now explicitly skipped at the _fix_chat_template level as well, not just at load_correct_tokenizer's name-based exemption. - _validate_patched_template renders the patched template with and without add_generation_prompt and confirms the patched output responds to the flag by appending (not replacing) content. If validation fails, the patch is discarded and we fall through to the warn path. Verified with an expanded regression suite in tests/: - test_fix_chat_template_pr4426.py: 42/42 template-matrix cells - test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads - test_chat_template_followups.py: 10/10 new follow-up tests - test_mistral_pr4426.py: 5 Mistral variants byte-identical - test_qwen_pr4426.py: 14 Qwen variants byte-identical (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3, Qwen3-Coder, QwQ, Qwen3-Guard-Gen)
…rt (#5049) * Chat-template repair: warn-by-default, AST classification, dict support Follow-up hardening on top of PR #4426 (which fixed the #4150 RuntimeError for ChatML LoRA reloads). Behavior changes: - Warn-by-default instead of RuntimeError. When fix_chat_template cannot repair a broken template, emit a warning and return the original. Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail. Fixes the UX where a missing `{% if add_generation_prompt %}` block on a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would block model loading entirely. - Local path vs HF hub distinguished in the warning message. For local paths the message points at the likely downstream tool; for HF IDs it points at the upstream model maintainers. Previously both said "file a bug report to the maintainers of <path>" even when <path> was the user's own saves/ directory. - Dict / list chat_template now handled. Hermes-3 ships with {default, tool_use} and the previous code crashed with AttributeError: 'dict' object has no attribute 'find' when entering _fix_chat_template with a dict. Each variant is now fixed independently; structure is preserved. Internals: - _find_end_position now matches all four Jinja whitespace-control variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost endfor/endif so multi-for templates aren't locked onto the first loop. Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was silently bypassed. - _has_add_generation_prompt_block uses Jinja AST via jinja2.nodes.If/Name walks instead of substring matching, so templates that hide the block behind comments or dash-style variants are classified correctly. - _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on the AST: only fires when the last structural top-level node is a For (standard ChatML shape), ignoring trailing pure-whitespace output nodes. Templates wrapped in an outer If (Qwen3-Guard) are now explicitly skipped at the _fix_chat_template level as well, not just at load_correct_tokenizer's name-based exemption. - _validate_patched_template renders the patched template with and without add_generation_prompt and confirms the patched output responds to the flag by appending (not replacing) content. If validation fails, the patch is discarded and we fall through to the warn path. Verified with an expanded regression suite in tests/: - test_fix_chat_template_pr4426.py: 42/42 template-matrix cells - test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads - test_chat_template_followups.py: 10/10 new follow-up tests - test_mistral_pr4426.py: 5 Mistral variants byte-identical - test_qwen_pr4426.py: 14 Qwen variants byte-identical (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3, Qwen3-Coder, QwQ, Qwen3-Guard-Gen) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard _validate_patched_template against read-only chat_template If tokenizer.chat_template is a property or otherwise read-only, the validation helper would crash with AttributeError when trying to temporarily set the patched template. Catch the assignment failure and return False (skip validation), and best-effort restore in the finally block. * Replace regex separator inference with render-diff; broaden repair to non-ChatML templates The previous `_infer_assistant_separator` was a four-tier regex heuristic that only worked on ChatML-shaped templates and forced a hard `<|im_start|>` / `<|im_end|>` presence gate on Case 2 repair. This meant a Llama-3, Gemma, or Phi-3 template stripped of its generation-prompt block by a downstream tool (LlamaFactory, Axolotl, etc.) would still warn-and-return even though the structural shape is identical to the ChatML case the PR already handles. This replaces the regex with `_derive_assistant_prefix_by_render`: render the template with two dialogs that differ only in assistant content, then `os.path.commonprefix` on the tails captures the exact assistant-turn prefix the template emits. The template itself is ground truth, so non-ChatML shapes work as long as the assistant block is a literal the template emits once per message. Three guards keep the derivation safe: A. both assistant renders extend the base render (no reordering); B. the divergence point is exactly the content-insertion site (sentinel follows the common prefix); C. a user-role cross-check: if a render with a user sentinel also emits the same prefix, role has no effect on output and we reject. A render failure on [user, user] (e.g. Gemma's `raise_exception` alternation check) is evidence that role matters; we accept. Sentinels differ at character 0 so `commonprefix` cannot absorb them, and trailing whitespace/comments after the last `{% endfor %}` are stripped before probing (they would appear in base but not after the appended assistant turn and break Guard A). `_fix_chat_template` and `_repair_string_template` now thread an `is_sharegpt` kwarg; `_fix_chat_template` retries once with `is_sharegpt=True` if the first probe returns None (dual-probe fallback for dict/list callers). The ChatML `<|im_start|>` / `<|im_end|>` hard gate in Case 2 is dropped. `_infer_assistant_separator` is deleted. Verified via: - tests/test_fix_chat_template_pr4426.py: 51/51 cells (new Llama-3, Gemma, Phi-3 broken-template rows all repair FIX-OK) - tests/test_load_correct_tokenizer_pr4426.py: 5/5 - tests/test_chat_template_followups.py: 18/18 (T11-T18 cover non-ChatML repair + probe failure modes) - tests/test_mistral_pr4426.py: 5/5 byte-identical - tests/test_qwen_pr4426.py: 14/14 byte-identical (Qwen3-Guard AST gate still rejects) - tests/hermes3_lora_pr4426.py reload: patched template ends with `<|im_start|>assistant\n`, inference returns sensible output. - temp/sim/battery.py: 79/79 followup; vs baseline: 0 regressions, 9 improvements. - Spot-check probe on real stripped tokenizers (Hermes-3, Phi-4, Llama-3.2-1B, Gemma-3-1B): all derive the expected prefix. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address reviewer findings: variant routing, positive-gate detection, comment-safe end scan Resolves three reviewer findings on PR #5049 (`fix/chat-template-followups`): Finding #1 [10/10]: dict/list variants now route through `_fix_chat_template_for_tokenizer` via a new `_VariantTokenizerProxy` adapter. Previously the dict/list branches called `_fix_chat_template` directly, silently bypassing the warn/strict (`UNSLOTH_STRICT_CHAT_TEMPLATE`) contract, the `no == yes` diagnostic, broken-existing-block detection, and `_validate_patched_template` guard. The proxy swaps `base.chat_template` to the variant string before each `apply_chat_template` call so tokenizer globals (`bos_token`, custom filters, `raise_exception`) remain available; if the base is read-only it falls back to isolated Jinja rendering. Finding #2 [1/10]: `_has_add_generation_prompt_block` now requires the `If` body to contain at least one `Output` node (a new `_if_body_emits_content` helper walks descendants). This distinguishes a real generation-prompt block from a header guard like `{% if not add_generation_prompt is defined %}{% set ... %}{% endif %}` (body contains only `Assign`) which references the name but emits nothing. Also dropped a now-redundant `"add_generation_prompt" not in scrubbed` guard in `_fix_chat_template` Case 2 so header-guarded templates still get repaired. Finding #4 [1/10]: `_find_end_position` now replaces Jinja comments with equal-length whitespace before scanning for `{% endfor %}` / `{% endif %}` tokens. This prevents a trailing comment containing those tokens from being picked as the real end tag. Positions in the padded string map 1:1 to positions in the original template. Tests: - tests/test_chat_template_followups.py: 21/21 (T19 strict-mode dict variant, T20 header-guard repair, T21 comment-endfor trap added; T4/T5 stubs updated with a working apply_chat_template that routes through Jinja). - tests/test_fix_chat_template_pr4426.py: 51/51 cells unchanged. - tests/test_load_correct_tokenizer_pr4426.py: 5/5. - tests/test_mistral_pr4426.py: 5/5 byte-identical. - tests/test_qwen_pr4426.py: 14/14 byte-identical. - temp/sim/battery.py: 79/79 followup; 0 regressions vs baseline. - Phase 3 Hermes-3 broken-LoRA reload: inference still returns `'The answer to the equation 2+2 is 4.'`. - Spot-checks on Hermes-3 / Phi-4 / Llama-3.2-1B / Gemma-3-1B real stripped templates: probe still derives the expected prefix. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Tighten comments in chat-template helpers Pure comment minimization across `_find_end_position`, `_has_add_generation_prompt_block`, `_if_body_emits_content`, `_derive_assistant_prefix_by_render`, `_fix_chat_template` Case 2, and `_VariantTokenizerProxy`. No behavior change; same intent, fewer lines. All 21 follow-up tests and the 51-cell Phase 1 matrix still pass. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Sandbox probe, fix is_sharegpt validator mismatch, reject negated gates Three real bugs from the 10-agent Opus review: 1. Probe now uses `jinja2.sandbox.SandboxedEnvironment` instead of bare `jinja2.Environment`. The probe renders at model-load time (before the user calls `apply_chat_template`), so it was a new eager code-execution surface that the base HF tokenizer loading does not have. SandboxedEnvironment blocks attribute-chain exploits at negligible cost. 2. `_repair_string_template` now tries validation with both `is_sharegpt=False` and `is_sharegpt=True`. Previously, when `_fix_chat_template` internally fell back to the other schema via its dual-probe, the outer validation still used the caller's original `is_sharegpt` -- rendering with the wrong message keys and spuriously dropping a valid repair. 3. `_has_add_generation_prompt_block` now skips `If` nodes whose test is a `Not` expression. A negated gate like `{% if not add_generation_prompt %}{{ x }}{% endif %}` fires when agp=False, so its emitting body is not a generation block -- but the old code counted any Name reference regardless of polarity. Cleanup: removed unused `self._label`, added `\r` escape in generation-block literal, switched variant labels to `!r` formatting, removed redundant `import os as _os`. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jinja2.sandbox import and sandbox proxy fallback Two critical findings from the 20-reviewer pass: 1. [20/20] The proxy read-only fallback used bare `jinja2.Environment`, not sandboxed. All 20 reviewers independently reproduced marker-file creation via `cycler.__init__.__globals__['os'].system(...)` during `fix_chat_template()`. Fixed: fallback now uses `from jinja2.sandbox import SandboxedEnvironment`. 2. [14/20] The render-diff probe did `import jinja2` then referenced `jinja2.sandbox.SandboxedEnvironment`. `jinja2.sandbox` is a submodule that is NOT auto-imported by `import jinja2` on Jinja 3.1.6. This caused `AttributeError` (swallowed by `except Exception`), making the entire Case 2 repair path silently return None in a clean process. The 6 reviewers who saw it work had `jinja2.sandbox` pre-imported by an earlier module in their process. Fixed: both the probe and the proxy fallback now use `from jinja2.sandbox import SandboxedEnvironment`. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Fixes #4150
Problem
ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training may end
with
{% endfor %}and have no{% if add_generation_prompt %}block. This causesfix_chat_template()to raise:Root Cause
_fix_chat_template()only patches templates where content ({{ ... }}) follows{% endfor %}. When the template simply ends after{% endfor %}(empty suffix),the function returns the template unchanged, and the caller raises RuntimeError.
Fix
Add an
elifbranch in_fix_chat_template()that detects ChatML templates(containing
<|im_start|>) with an empty suffix afterendfor, and appendsthe standard ChatML generation prompt block:
This is a general fix (not model-name-based bypass) that handles any ChatML-style
template missing the generation prompt block.