Skip to content

BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150)#4426

Merged
danielhanchen merged 9 commits into
unslothai:mainfrom
kimimgo:fix/chat-template-chatml-4150
Apr 16, 2026
Merged

BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150)#4426
danielhanchen merged 9 commits into
unslothai:mainfrom
kimimgo:fix/chat-template-chatml-4150

Conversation

@kimimgo
Copy link
Copy Markdown
Contributor

@kimimgo kimimgo commented Mar 18, 2026

Fixes #4150

Problem

ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training may end
with {% endfor %} and have no {% if add_generation_prompt %} block. This causes
fix_chat_template() to raise:

RuntimeError: Unsloth: The tokenizer `...` does not have a
{% if add_generation_prompt %} for generation purposes.

Root Cause

_fix_chat_template() only patches templates where content ({{ ... }}) follows
{% endfor %}. When the template simply ends after {% endfor %} (empty suffix),
the function returns the template unchanged, and the caller raises RuntimeError.

Fix

Add an elif branch in _fix_chat_template() that detects ChatML templates
(containing <|im_start|>) with an empty suffix after endfor, and appends
the standard ChatML generation prompt block:

{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}

This is a general fix (not model-name-based bypass) that handles any ChatML-style
template missing the generation prompt block.

…on_prompt (unslothai#4150)

ChatML-style templates (Hermes, Magnum, etc.) saved after LoRA
training may end with {% endfor %} and have no
{% if add_generation_prompt %} block. The existing _fix_chat_template
only handled the case where content ({{ ... }}) followed endfor,
not the case where the template simply ended.

Add an elif branch that detects ChatML templates (containing
<|im_start|>) with an empty suffix after endfor, and appends the
standard ChatML generation prompt block.

Fixes unslothai#4150
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical runtime error in the tokenizer utility that prevented certain ChatML-style templates from being correctly processed after LoRA training. By enhancing the template fixing logic, it ensures that all valid ChatML templates, even those missing the generation prompt block, are properly formatted, thereby improving the robustness and compatibility of the tokenizer.

Highlights

  • Bug Fix: Resolved a RuntimeError that occurred when _fix_chat_template() processed ChatML-style templates (e.g., Hermes-3, Magnum-v2) that ended with {% endfor %} but lacked an {% if add_generation_prompt %} block.
  • Root Cause Addressed: The previous implementation of _fix_chat_template() failed to patch templates where content (like {{ ... }}) did not follow {% endfor %} and the suffix was empty, leading to the template being returned unchanged and subsequent errors.
  • Template Patching Logic: Introduced an elif branch in _fix_chat_template() to specifically detect ChatML templates (identified by <|im_start|>) with an empty suffix after endfor. For these cases, the standard ChatML generation prompt block {% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %} is now appended.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug in _fix_chat_template() that caused RuntimeError for ChatML-style templates missing the add_generation_prompt block. The fix adds an elif branch to handle templates with an empty suffix after endfor, appending the standard ChatML generation prompt block. The review focuses on the correctness and maintainability of the added code, ensuring it effectively resolves the issue and integrates well with the existing logic.

Comment thread unsloth/tokenizer_utils.py Outdated
Comment on lines 675 to 686
elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:
# GH#4150: ChatML-style templates (Hermes, etc.) that end with
# {% endfor %} and have no add_generation_prompt block.
# Append the standard ChatML generation prompt.
generation_block = (
"{%" + dash + " if add_generation_prompt %}"
"{{ '<|im_start|>assistant\n' }}"
"{%" + dash + " endif %}"
)
chat_template = chat_template[: where + len(chosen_end)] + generation_block

return chat_template
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This elif block appends a generation prompt to ChatML templates that have an empty suffix after endfor. It's important to ensure that this addition doesn't inadvertently affect other types of templates or introduce unintended side effects. Consider adding a more specific check to ensure this logic only applies to the intended ChatML templates to avoid potential issues in the future.

It is also important to ensure that the appended generation_block is correctly formatted and compatible with all ChatML-style templates.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48119ee39c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread unsloth/tokenizer_utils.py Outdated
Comment on lines +679 to +681
generation_block = (
"{%" + dash + " if add_generation_prompt %}"
"{{ '<|im_start|>assistant\n' }}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Don't hard-code <|im_start|>assistant\n for all ChatML fixes

This new branch assumes every <|im_start|> template should generate with '<|im_start|>assistant\n', but we already ship supported variants that use a different assistant prefix. For example, unsloth/chat_templates.py:614-626 defines Phi-4 with '<|im_start|>assistant<|im_sep|>'; if a saved Phi-4 tokenizer loses its add_generation_prompt block and hits _fix_chat_template(), this patch rewrites it to the wrong protocol and produces malformed prompts at inference time. The fix needs to preserve the model-specific assistant prefix instead of treating all ChatML-style templates as Hermes-style newline prompts.

Useful? React with 👍 / 👎.

…t prefix

Address review feedback: Phi-4 uses <|im_sep|> instead of newline
after role. Extract the separator from the template pattern
(message['role'] + 'SEPARATOR') to generate the correct
assistant prefix for each ChatML variant.
@kimimgo kimimgo force-pushed the fix/chat-template-chatml-4150 branch from 48119ee to 56764e6 Compare March 19, 2026 10:50
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e84d889eef

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread unsloth/tokenizer_utils.py Outdated
Comment on lines +682 to +683
sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template)
separator = sep_match.group(1) if sep_match else "\\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Handle hard-coded ChatML role branches when inferring separator

Fresh evidence: unsloth/chat_templates.py:616-625 defines phi4_template with separate system/user/assistant branches, so the new regex here never matches. In that supported case, a saved Phi-4 tokenizer that is missing its trailing add_generation_prompt block still enters this branch, falls back to "\\n", and gets patched to '<|im_start|>assistant\n' instead of the required '<|im_start|>assistant<|im_sep|>', so inference prompts remain malformed for Phi-4-style tokenizers.

Useful? React with 👍 / 👎.

danielhanchen and others added 5 commits April 11, 2026 11:50
Broaden the empty-suffix ChatML repair branch so it correctly handles
hardcoded-role templates (Phi-4 style) and double-quoted / dot-access
variants, rather than silently falling back to '\n' for everything that
is not `message['role'] + '...'`.

- Prefer an explicit assistant literal (`'<|im_start|>assistant<sep>'`)
  so Phi-4 templates pick up `<|im_sep|>` instead of `\n`.
- Accept both single- and double-quoted role concatenation, plus
  `message.role` attribute access.
- Guard the branch on `<|im_end|> in chat_template` and
  `add_generation_prompt not in chat_template` to avoid false positives
  on templates that merely mention `<|im_start|>` in a comment, and to
  make the function idempotent if called on an already-fixed template.
- Drop the redundant in-function `import re` (module-level import at
  line 18 already exists).
- Emit the generated Jinja block with a double-quoted string literal so
  a separator containing a single quote cannot break the injected block.
…olution

Follow-up to the earlier PR unslothai#4426 fix for ChatML empty-suffix templates.
Second round of review surfaced three additional issues in the new
elif branch; this commit addresses all three.

- Bug B (High): re.search returned the first `message['role'] + '<sep>'`
  match only, so templates that branch per-role (e.g. Phi-4-mini style
  where `system` uses `\n` but `user`/`assistant` use `<|im_sep|>`) got
  the wrong separator and the injected block became
  `<|im_start|>assistant\n` instead of `<|im_start|>assistant<|im_sep|>`.
  Fix: scan every role concatenation with re.finditer, use the single
  separator if all agree, otherwise prefer `<|im_sep|>` when present in
  the template, otherwise fall back to `\n`.

- Bug A (Medium): the assistant-literal regex could match inside a
  Jinja `{# comment #}` block, so a template containing documentation
  like `{# example: '<|im_start|>assistant<WRONG>' #}` would silently
  extract `WRONG` as the separator.

- Bug C (Medium): the guard `"add_generation_prompt" not in chat_template`
  is a raw substring check, so a template with a comment like
  `{# This template ignores add_generation_prompt #}` would match the
  substring and the elif would be skipped, leaving the template
  unrepaired.

Fix for Bugs A and C: strip Jinja `{# ... #}` comments once before any
regex or substring check in this branch.

Verified against:
- 10-case template simulation (Hermes, Phi-4, double-quoted, dot-access,
  dashed, trailing-expr, Llama-3 control, already-fixed, pre-PR compare)
- 5 new bug reproducers (Bug A, Bug B, Bug C, idempotency, Hermes basic)
All 15 cases pass.
Loop-3 review round (7/7 reviewer.py --slow workers + hosted Gemini)
independently surfaced a trailing-Jinja-comment case that bypassed the
new empty-suffix repair branch from PR unslothai#4426: templates ending like
`{% endfor %}{# trailing comment #}` have `after_endfor = "{# ... #}"`
which is non-empty, so the emptiness check fails and the template
falls through to the RuntimeError.

Fix: scrub Jinja `{# ... #}` comments from `after_endfor` before the
emptiness check, so comment-only suffixes enter the new repair path.

The body of the branch keeps its existing behavior of appending just
the generation block (without re-appending the scrubbed `after_endfor`)
so that `apply_chat_template(add_generation_prompt=True)` does not
emit stray trailing whitespace after the generation prefix. This
matches the semantics of the adjacent first-branch repair path.

Verified against 20 test cases: 10 original template shapes, 5 Bug
A/B/C reproducers from the loop-2 round, and 5 new trailing-comment
reproducers. All pass.
Follow-up to PR unslothai#4426: reviewer.py run 6 and Sonnet C (adversarial
regex hunt) independently found that the `assistant_match` regex can
capture an empty separator when a template splits the assistant prefix
across concatenated string literals, e.g.
`{{ '<|im_start|>assistant' }}{{ '\n' }}...`. The injected generation
block then becomes `{{ "<|im_start|>assistant" }}` with no trailing
newline and the generation prompt is malformed.

Fix: only accept a non-empty captured separator; otherwise fall through
to the role-concat / `<|im_sep|>` / `\n` cascade so the fallback path
produces a usable ChatML generation prefix in every case. One line.

Also rewrite the explanatory comment above the generation-block append
to accurately describe why `after_endfor` is dropped in this branch
while being wrapped in the first branch. The first branch's
`after_endfor` is a semantic `{{ ... }}` expression, whereas this
branch's `after_endfor` is verified to contain only whitespace and
comments that would render as stray trailing output.

Verified against the full 23-case test matrix:
- 10 original template shapes
- 5 Bug A/B/C reproducers
- 5 trailing Jinja comment reproducers
- 3 new empty-separator / bare-assistant-literal reproducers
All pass.
@danielhanchen danielhanchen added auto-approved Auto-review approved the PR and removed auto-reviewing Auto-review in progress auto-approved Auto-review approved the PR labels Apr 11, 2026
The new elif branch carries a lot of prose that restates what the code
already says. Collapse the long block comments into three concise
explanations (GH#4150 scope, separator inference strategy, Jinja literal
escape and trailing-whitespace drop). Behavior is unchanged.

Net change: -24 lines (PR down from +72 to +48).
@danielhanchen danielhanchen merged commit 14ab6fb into unslothai:main Apr 16, 2026
1 check passed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc250eb7dc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +705 to +709
separator = unique_role_seps[0]
elif "<|im_sep|>" in scrubbed:
separator = "<|im_sep|>"
else:
separator = "\\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve assistant prefill when multiple role separators exist

When _fix_chat_template() cannot infer a unique message.role + <sep> suffix, it falls back to "\\n", which drops model-specific assistant prefills. This regresses supported templates like qwen3_thinking_template (see unsloth/chat_templates.py around lines 1548 and 1588): if a saved tokenizer loses its trailing {% if add_generation_prompt %} block, this branch reconstructs '<|im_start|>assistant\n' instead of the required '<|im_start|>assistant\n<think>\n', changing inference prompts and degrading behavior for thinking models.

Useful? React with 👍 / 👎.

danielhanchen added a commit to danielhanchen/unsloth-staging-2 that referenced this pull request Apr 16, 2026
Follow-up hardening on top of PR unslothai#4426 (which fixed the unslothai#4150
RuntimeError for ChatML LoRA reloads).

Behavior changes:

- Warn-by-default instead of RuntimeError. When fix_chat_template cannot
  repair a broken template, emit a warning and return the original.
  Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail.
  Fixes the UX where a missing `{% if add_generation_prompt %}` block on
  a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would
  block model loading entirely.

- Local path vs HF hub distinguished in the warning message. For local
  paths the message points at the likely downstream tool; for HF IDs it
  points at the upstream model maintainers. Previously both said "file a
  bug report to the maintainers of <path>" even when <path> was the
  user's own saves/ directory.

- Dict / list chat_template now handled. Hermes-3 ships with
  {default, tool_use} and the previous code crashed with
  AttributeError: 'dict' object has no attribute 'find' when entering
  _fix_chat_template with a dict. Each variant is now fixed
  independently; structure is preserved.

Internals:

- _find_end_position now matches all four Jinja whitespace-control
  variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost
  endfor/endif so multi-for templates aren't locked onto the first loop.
  Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was
  silently bypassed.

- _has_add_generation_prompt_block uses Jinja AST via
  jinja2.nodes.If/Name walks instead of substring matching, so
  templates that hide the block behind comments or dash-style variants
  are classified correctly.

- _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on
  the AST: only fires when the last structural top-level node is a For
  (standard ChatML shape), ignoring trailing pure-whitespace output
  nodes. Templates wrapped in an outer If (Qwen3-Guard) are now
  explicitly skipped at the _fix_chat_template level as well, not just
  at load_correct_tokenizer's name-based exemption.

- _validate_patched_template renders the patched template with and
  without add_generation_prompt and confirms the patched output
  responds to the flag by appending (not replacing) content. If
  validation fails, the patch is discarded and we fall through to the
  warn path.

Verified with an expanded regression suite in tests/:
- test_fix_chat_template_pr4426.py: 42/42 template-matrix cells
- test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads
- test_chat_template_followups.py: 10/10 new follow-up tests
- test_mistral_pr4426.py: 5 Mistral variants byte-identical
- test_qwen_pr4426.py: 14 Qwen variants byte-identical
  (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3,
  Qwen3-Coder, QwQ, Qwen3-Guard-Gen)
danielhanchen added a commit that referenced this pull request Apr 16, 2026
…rt (#5049)

* Chat-template repair: warn-by-default, AST classification, dict support

Follow-up hardening on top of PR #4426 (which fixed the #4150
RuntimeError for ChatML LoRA reloads).

Behavior changes:

- Warn-by-default instead of RuntimeError. When fix_chat_template cannot
  repair a broken template, emit a warning and return the original.
  Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail.
  Fixes the UX where a missing `{% if add_generation_prompt %}` block on
  a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would
  block model loading entirely.

- Local path vs HF hub distinguished in the warning message. For local
  paths the message points at the likely downstream tool; for HF IDs it
  points at the upstream model maintainers. Previously both said "file a
  bug report to the maintainers of <path>" even when <path> was the
  user's own saves/ directory.

- Dict / list chat_template now handled. Hermes-3 ships with
  {default, tool_use} and the previous code crashed with
  AttributeError: 'dict' object has no attribute 'find' when entering
  _fix_chat_template with a dict. Each variant is now fixed
  independently; structure is preserved.

Internals:

- _find_end_position now matches all four Jinja whitespace-control
  variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost
  endfor/endif so multi-for templates aren't locked onto the first loop.
  Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was
  silently bypassed.

- _has_add_generation_prompt_block uses Jinja AST via
  jinja2.nodes.If/Name walks instead of substring matching, so
  templates that hide the block behind comments or dash-style variants
  are classified correctly.

- _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on
  the AST: only fires when the last structural top-level node is a For
  (standard ChatML shape), ignoring trailing pure-whitespace output
  nodes. Templates wrapped in an outer If (Qwen3-Guard) are now
  explicitly skipped at the _fix_chat_template level as well, not just
  at load_correct_tokenizer's name-based exemption.

- _validate_patched_template renders the patched template with and
  without add_generation_prompt and confirms the patched output
  responds to the flag by appending (not replacing) content. If
  validation fails, the patch is discarded and we fall through to the
  warn path.

Verified with an expanded regression suite in tests/:
- test_fix_chat_template_pr4426.py: 42/42 template-matrix cells
- test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads
- test_chat_template_followups.py: 10/10 new follow-up tests
- test_mistral_pr4426.py: 5 Mistral variants byte-identical
- test_qwen_pr4426.py: 14 Qwen variants byte-identical
  (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3,
  Qwen3-Coder, QwQ, Qwen3-Guard-Gen)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard _validate_patched_template against read-only chat_template

If tokenizer.chat_template is a property or otherwise read-only, the
validation helper would crash with AttributeError when trying to
temporarily set the patched template. Catch the assignment failure and
return False (skip validation), and best-effort restore in the finally
block.

* Replace regex separator inference with render-diff; broaden repair to non-ChatML templates

The previous `_infer_assistant_separator` was a four-tier regex heuristic that
only worked on ChatML-shaped templates and forced a hard `<|im_start|>` /
`<|im_end|>` presence gate on Case 2 repair. This meant a Llama-3, Gemma, or
Phi-3 template stripped of its generation-prompt block by a downstream tool
(LlamaFactory, Axolotl, etc.) would still warn-and-return even though the
structural shape is identical to the ChatML case the PR already handles.

This replaces the regex with `_derive_assistant_prefix_by_render`: render the
template with two dialogs that differ only in assistant content, then
`os.path.commonprefix` on the tails captures the exact assistant-turn prefix
the template emits. The template itself is ground truth, so non-ChatML shapes
work as long as the assistant block is a literal the template emits once per
message.

Three guards keep the derivation safe:
  A. both assistant renders extend the base render (no reordering);
  B. the divergence point is exactly the content-insertion site (sentinel
     follows the common prefix);
  C. a user-role cross-check: if a render with a user sentinel also emits
     the same prefix, role has no effect on output and we reject. A render
     failure on [user, user] (e.g. Gemma's `raise_exception` alternation
     check) is evidence that role matters; we accept.

Sentinels differ at character 0 so `commonprefix` cannot absorb them, and
trailing whitespace/comments after the last `{% endfor %}` are stripped
before probing (they would appear in base but not after the appended
assistant turn and break Guard A).

`_fix_chat_template` and `_repair_string_template` now thread an
`is_sharegpt` kwarg; `_fix_chat_template` retries once with
`is_sharegpt=True` if the first probe returns None (dual-probe fallback
for dict/list callers).

The ChatML `<|im_start|>` / `<|im_end|>` hard gate in Case 2 is dropped.
`_infer_assistant_separator` is deleted.

Verified via:
  - tests/test_fix_chat_template_pr4426.py: 51/51 cells (new Llama-3,
    Gemma, Phi-3 broken-template rows all repair FIX-OK)
  - tests/test_load_correct_tokenizer_pr4426.py: 5/5
  - tests/test_chat_template_followups.py: 18/18 (T11-T18 cover
    non-ChatML repair + probe failure modes)
  - tests/test_mistral_pr4426.py: 5/5 byte-identical
  - tests/test_qwen_pr4426.py: 14/14 byte-identical (Qwen3-Guard AST
    gate still rejects)
  - tests/hermes3_lora_pr4426.py reload: patched template ends with
    `<|im_start|>assistant\n`, inference returns sensible output.
  - temp/sim/battery.py: 79/79 followup; vs baseline: 0 regressions,
    9 improvements.
  - Spot-check probe on real stripped tokenizers (Hermes-3, Phi-4,
    Llama-3.2-1B, Gemma-3-1B): all derive the expected prefix.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address reviewer findings: variant routing, positive-gate detection, comment-safe end scan

Resolves three reviewer findings on PR #5049 (`fix/chat-template-followups`):

Finding #1 [10/10]: dict/list variants now route through
`_fix_chat_template_for_tokenizer` via a new `_VariantTokenizerProxy`
adapter. Previously the dict/list branches called `_fix_chat_template`
directly, silently bypassing the warn/strict (`UNSLOTH_STRICT_CHAT_TEMPLATE`)
contract, the `no == yes` diagnostic, broken-existing-block detection,
and `_validate_patched_template` guard. The proxy swaps
`base.chat_template` to the variant string before each
`apply_chat_template` call so tokenizer globals (`bos_token`, custom
filters, `raise_exception`) remain available; if the base is read-only
it falls back to isolated Jinja rendering.

Finding #2 [1/10]: `_has_add_generation_prompt_block` now requires the
`If` body to contain at least one `Output` node (a new
`_if_body_emits_content` helper walks descendants). This distinguishes a
real generation-prompt block from a header guard like
`{% if not add_generation_prompt is defined %}{% set ... %}{% endif %}`
(body contains only `Assign`) which references the name but emits
nothing. Also dropped a now-redundant `"add_generation_prompt" not in
scrubbed` guard in `_fix_chat_template` Case 2 so header-guarded
templates still get repaired.

Finding #4 [1/10]: `_find_end_position` now replaces Jinja comments with
equal-length whitespace before scanning for `{% endfor %}` / `{% endif %}`
tokens. This prevents a trailing comment containing those tokens from
being picked as the real end tag. Positions in the padded string map 1:1
to positions in the original template.

Tests:
  - tests/test_chat_template_followups.py: 21/21 (T19 strict-mode
    dict variant, T20 header-guard repair, T21 comment-endfor trap
    added; T4/T5 stubs updated with a working apply_chat_template
    that routes through Jinja).
  - tests/test_fix_chat_template_pr4426.py: 51/51 cells unchanged.
  - tests/test_load_correct_tokenizer_pr4426.py: 5/5.
  - tests/test_mistral_pr4426.py: 5/5 byte-identical.
  - tests/test_qwen_pr4426.py: 14/14 byte-identical.
  - temp/sim/battery.py: 79/79 followup; 0 regressions vs baseline.
  - Phase 3 Hermes-3 broken-LoRA reload: inference still returns
    `'The answer to the equation 2+2 is 4.'`.
  - Spot-checks on Hermes-3 / Phi-4 / Llama-3.2-1B / Gemma-3-1B real
    stripped templates: probe still derives the expected prefix.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Tighten comments in chat-template helpers

Pure comment minimization across `_find_end_position`,
`_has_add_generation_prompt_block`, `_if_body_emits_content`,
`_derive_assistant_prefix_by_render`, `_fix_chat_template` Case 2,
and `_VariantTokenizerProxy`. No behavior change; same intent,
fewer lines. All 21 follow-up tests and the 51-cell Phase 1 matrix
still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sandbox probe, fix is_sharegpt validator mismatch, reject negated gates

Three real bugs from the 10-agent Opus review:

1. Probe now uses `jinja2.sandbox.SandboxedEnvironment` instead of bare
   `jinja2.Environment`. The probe renders at model-load time (before
   the user calls `apply_chat_template`), so it was a new eager
   code-execution surface that the base HF tokenizer loading does not
   have. SandboxedEnvironment blocks attribute-chain exploits at
   negligible cost.

2. `_repair_string_template` now tries validation with both
   `is_sharegpt=False` and `is_sharegpt=True`. Previously, when
   `_fix_chat_template` internally fell back to the other schema via
   its dual-probe, the outer validation still used the caller's
   original `is_sharegpt` -- rendering with the wrong message keys and
   spuriously dropping a valid repair.

3. `_has_add_generation_prompt_block` now skips `If` nodes whose test
   is a `Not` expression. A negated gate like
   `{% if not add_generation_prompt %}{{ x }}{% endif %}` fires when
   agp=False, so its emitting body is not a generation block -- but the
   old code counted any Name reference regardless of polarity.

Cleanup: removed unused `self._label`, added `\r` escape in
generation-block literal, switched variant labels to `!r` formatting,
removed redundant `import os as _os`.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jinja2.sandbox import and sandbox proxy fallback

Two critical findings from the 20-reviewer pass:

1. [20/20] The proxy read-only fallback used bare `jinja2.Environment`,
   not sandboxed. All 20 reviewers independently reproduced marker-file
   creation via `cycler.__init__.__globals__['os'].system(...)` during
   `fix_chat_template()`. Fixed: fallback now uses
   `from jinja2.sandbox import SandboxedEnvironment`.

2. [14/20] The render-diff probe did `import jinja2` then referenced
   `jinja2.sandbox.SandboxedEnvironment`. `jinja2.sandbox` is a
   submodule that is NOT auto-imported by `import jinja2` on Jinja 3.1.6.
   This caused `AttributeError` (swallowed by `except Exception`),
   making the entire Case 2 repair path silently return None in a clean
   process. The 6 reviewers who saw it work had `jinja2.sandbox`
   pre-imported by an earlier module in their process. Fixed: both the
   probe and the proxy fallback now use
   `from jinja2.sandbox import SandboxedEnvironment`.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Hermes Lora chat test bug: does not have a {% if add_generation_prompt %} for generation purposes

2 participants