BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150) by danielhanchen · Pull Request #38 · unslothai/unsloth-staging-1

danielhanchen · 2026-04-11T11:32:50Z

Staging mirror of unslothai#4426

Original PR: unslothai#4426
Author: kimimgo

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.

Original description

Problem

ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training may end
with {% endfor %} and have no {% if add_generation_prompt %} block. This causes
fix_chat_template() to raise:

RuntimeError: Unsloth: The tokenizer `...` does not have a
{% if add_generation_prompt %} for generation purposes.

Root Cause

_fix_chat_template() only patches templates where content ({{ ... }}) follows
{% endfor %}. When the template simply ends after {% endfor %} (empty suffix),
the function returns the template unchanged, and the caller raises RuntimeError.

Fix

Add an elif branch in _fix_chat_template() that detects ChatML templates
(containing <|im_start|>) with an empty suffix after endfor, and appends
the standard ChatML generation prompt block:

{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}

This is a general fix (not model-name-based bypass) that handles any ChatML-style
template missing the generation prompt block.

…on_prompt (unslothai#4150) ChatML-style templates (Hermes, Magnum, etc.) saved after LoRA training may end with {% endfor %} and have no {% if add_generation_prompt %} block. The existing _fix_chat_template only handled the case where content ({{ ... }}) followed endfor, not the case where the template simply ended. Add an elif branch that detects ChatML templates (containing <|im_start|>) with an empty suffix after endfor, and appends the standard ChatML generation prompt block. Fixes unslothai#4150

…t prefix Address review feedback: Phi-4 uses <|im_sep|> instead of newline after role. Extract the separator from the template pattern (message['role'] + 'SEPARATOR') to generate the correct assistant prefix for each ChatML variant.

for more information, see https://pre-commit.ci

danielhanchen · 2026-04-11T11:33:42Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the _fix_chat_template function in unsloth/tokenizer_utils.py to support ChatML-style templates that lack an add_generation_prompt block. The implementation attempts to dynamically extract model-specific separators using regex. Review feedback identifies a redundant import re statement and points out that the current regex is too restrictive, failing to handle templates with double quotes, dot notation, or hardcoded role strings like those found in Phi-4.

gemini-code-assist · 2026-04-11T11:39:07Z

+        import re
+
+        sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template)
+        separator = sep_match.group(1) if sep_match else "\\n"


There are two issues in this block:

Redundant Import: import re is unnecessary here as re is already imported at the top of the file (line 18).

Regex Robustness: The current regex message\['role'\]\s*\+\s*'([^']*)' is too restrictive. It fails to handle templates using double quotes (message["role"]), dot notation (message.role), or hardcoded role strings (common in ChatML templates like Phi-4, which the PR specifically aims to support).

For example, in phi4_template, the role is hardcoded as {{'<|im_start|>assistant<|im_sep|>' ... }}. The current regex will return None, causing the separator to default to \n, which is incorrect for Phi-4 (should be <|im_sep|>).

# Extract the model-specific separator after role (e.g. '\n' for # Hermes, '<|im_sep|>' for Phi-4) to avoid hard-coding. sep_match = re.search(r"<\|im_start\|>(?:assistant|user|system)([^'\"{}|]*)", chat_template) if not sep_match: sep_match = re.search(r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*['\"]([^'\"]*)['\"]", chat_template) separator = sep_match.group(1) if sep_match else "\\n"

danielhanchen

Thank you for the PR! The goal of this PR is to fix a RuntimeError in _fix_chat_template() when ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training end at {% endfor %} with no {% if add_generation_prompt %} block (closes GH#4150). As a summary, this PR adds a new elif branch in _fix_chat_template() that detects empty-suffix templates containing <|im_start|>, extracts the role separator via regex, and appends a synthetic {% if add_generation_prompt %}{{ '<|im_start|>assistant<SEP>' }}{% endif %} block.

We ran 11 parallel reviewers (7 reviewer.py --slow workers + 3 Sonnet subagents + hosted Gemini). Below is the consolidated table, sorted by consensus. The core repair path is sound and fixes the Hermes case, but the separator inference is too narrow and the regex scope is too restrictive.

Reviewers	Severity	Finding
10/11	High	Separator inference only matches `message['role'] + '...'` (single-quoted, `+` concatenation). Hardcoded-role ChatML families (e.g. Phi-4 style `<\|im_sep\|>`) fall back to `\n`, so the fix silently produces `<\|im_start\|>assistant\n` instead of the correct `<\|im_start\|>assistant<\|im_sep\|>`. The PR description explicitly names Phi-4 as a target, so this is a regression from a loud `RuntimeError` to a silently malformed prompt.
5/11	Medium	Regex only matches single-quoted templates (`message['role']`); templates using `message["role"]` or `message.role` silently fall back to `\n`, losing correct separators for double-quoted / dot-access variants.
3/11	Low	`import re` inside `_fix_chat_template` is redundant -- `re` is already imported at module top (line 18). Harmless but inconsistent with the rest of the file.
2/11	Low	The `elif` guard `"<\|im_start\|>" in chat_template` is overly broad and could false-positive on templates that only mention the token in a Jinja `{# comment #}` or a string comparison. Tightening to additionally require `"<\|im_end\|>" in chat_template` or an output-context match would be safer.
1/11	Low	Generated Jinja block uses single-quoted string literal `'<\|im_start\|>assistant' + separator + '`; if an adversarial template ever contained a single quote in the separator, the injected Jinja would fail to parse at render time.

Disagreements

Sonnet subagent 1 called the PR "safe to merge as-is" because no real-world model on HF currently ships a broken ChatML template without add_generation_prompt. Sonnet subagent 2 confirmed that fetching the actual NousResearch/Hermes-3-Llama-3.1-8B, unsloth/phi-4, and Qwen/Qwen2.5-7B-Instruct tokenizers shows they all already have the generation prompt block, so the new elif branch never fires for the stock tokenizers. However, the other 9 reviewers correctly note that the whole point of the PR is to handle saved-after-LoRA-training templates where the block can be missing, and the Phi-4 regression path is concrete: issue unslothai#4150 itself arose from a saved Hermes LoRA, not a stock tokenizer.
One reviewer proposed hard-failing instead of patching when the separator cannot be confidently inferred, to preserve the previous explicit RuntimeError. The majority instead prefer extending the regex so Phi-4 etc. are also repaired correctly.

Concrete fix (derived from reviewer consensus)

Replace the elif body with a stronger extractor that checks the assistant literal first, then role-concatenation with both quote styles and .role attribute access, and only then falls back to \n:

    elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:
        # GH#4150: ChatML-style templates (Hermes, Phi-4, Magnum, etc.) that
        # end at {% endfor %} with no add_generation_prompt block. Infer the
        # model-specific separator after "assistant" from the template itself.
        # Prefer an explicit assistant literal ("<|im_start|>assistant<sep>"),
        # then fall back to `message['role'] + '<sep>'` in either quote style,
        # then to the common ChatML newline.
        assistant_match = (
            re.search(r"'<\|im_start\|>assistant([^']*)'", chat_template)
            or re.search(r'"<\|im_start\|>assistant([^"]*)"', chat_template)
        )
        role_match = re.search(
            r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1",
            chat_template,
        )
        if assistant_match is not None:
            separator = assistant_match.group(1)
        elif role_match is not None:
            separator = role_match.group(2)
        else:
            separator = "\\n"

        generation_block = (
            "{%" + dash + " if add_generation_prompt %}"
            "{{ '<|im_start|>assistant" + separator + "' }}"
            "{%" + dash + " endif %}"
        )
        chat_template = chat_template[: where + len(chosen_end)] + generation_block

Also drop the in-function import re (module-level re import already exists at line 18).

See inline comments for per-finding detail and suggestion blocks.

danielhanchen · 2026-04-11T11:47:04Z

+        import re
+
+        sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template)
+        separator = sep_match.group(1) if sep_match else "\\n"


[10/11 reviewers] High - The separator inference is too narrow and introduces a silent regression for Phi-4 / <|im_sep|> style ChatML templates. The current regex message\['role'\]\s*\+\s*'([^']*)' only matches the + concat + single-quote shape used by Hermes/Magnum. For templates that hardcode the role (Phi-4 uses literal '<|im_start|>assistant<|im_sep|>'), sep_match returns None, the separator defaults to \n, and the fix produces <|im_start|>assistant\n instead of <|im_start|>assistant<|im_sep|>. This converts a loud RuntimeError into a silently malformed prompt, which is a regression from the pre-PR state. The PR description explicitly names Phi-4 as a target, so this case needs to be handled. Suggested fix: prefer an explicit assistant literal, then role-concat with both quote styles and .role attribute, then fall back to \n.

Suggested change

separator = sep_match.group(1) if sep_match else "\\n"

# Prefer the explicit assistant literal in the template, then

# role-concatenation in either quote style, then fall back to

# the common ChatML newline separator.

assistant_match = (

re.search(r"'<\|im_start\|>assistant([^']*)'", chat_template)

or re.search(r'"<\|im_start\|>assistant([^"]*)"', chat_template)

)

role_match = re.search(

r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1",

chat_template,

)

if assistant_match is not None:

separator = assistant_match.group(1)

elif role_match is not None:

separator = role_match.group(2)

else:

separator = "\\n"

danielhanchen · 2026-04-11T11:47:05Z

+        # Extract the model-specific separator after role (e.g. '\n' for
+        # Hermes, '<|im_sep|>' for Phi-4) to avoid hard-coding.
+        import re
+


[3/11 reviewers] Low - import re inside the function body is redundant because re is already imported at module top (line 18 of this file). It does not cause a functional bug thanks to Python's module cache, but it is inconsistent with the rest of the file and adds noise. Please remove it.

Suggested change

danielhanchen · 2026-04-11T11:47:05Z


        chat_template = chat_template[: where + len(chosen_end)] + after_endfor
+
+    elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:


[2/11 reviewers] Low - The guard "<|im_start|>" in chat_template is overly broad: a template that merely references <|im_start|> inside a Jinja {# comment #} or in an equality check ({% if token == '<|im_start|>' %}) without actually emitting ChatML would also trigger this branch. A safer condition is to require both tokens and preferably that they appear in output {{ ... }} contexts.

Suggested change

elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:

elif (

after_endfor.strip() == ""

and "<|im_start|>" in chat_template

and "<|im_end|>" in chat_template

):

danielhanchen · 2026-04-11T11:47:05Z

+        assistant_prefix = "<|im_start|>assistant" + separator
+        generation_block = (
+            "{%" + dash + " if add_generation_prompt %}"
+            "{{ '" + assistant_prefix + "' }}"


[1/11 reviewers] Low - The generated Jinja block wraps the assistant prefix in a single-quoted Jinja string literal. If the extracted separator ever contained a single quote, the injected block would produce invalid Jinja ({{ '<|im_start|>assistant'bad' }}), failing at inference time. None of the known ChatML separators (\n, <|im_sep|>, <|end_header_id|>) contain single quotes, but a minimal hardening is cheap: escape any literal single quotes in separator before interpolation, or use double quotes for the Jinja literal. This is a low-priority defensive hardening.

Broaden the empty-suffix ChatML repair branch so it correctly handles hardcoded-role templates (Phi-4 style) and double-quoted / dot-access variants, rather than silently falling back to '\n' for everything that is not `message['role'] + '...'`. - Prefer an explicit assistant literal (`'<|im_start|>assistant<sep>'`) so Phi-4 templates pick up `<|im_sep|>` instead of `\n`. - Accept both single- and double-quoted role concatenation, plus `message.role` attribute access. - Guard the branch on `<|im_end|> in chat_template` and `add_generation_prompt not in chat_template` to avoid false positives on templates that merely mention `<|im_start|>` in a comment, and to make the function idempotent if called on an already-fixed template. - Drop the redundant in-function `import re` (module-level import at line 18 already exists). - Emit the generated Jinja block with a double-quoted string literal so a separator containing a single quote cannot break the injected block.

danielhanchen · 2026-04-11T11:51:02Z

/gemini review

gemini-code-assist

Code Review

This pull request updates unsloth/tokenizer_utils.py to automatically append an add_generation_prompt block to ChatML-style templates when it is missing. The implementation uses regex to infer the assistant separator from the existing template structure. Review feedback suggests using a literal newline character for the fallback separator to ensure correct Jinja rendering and recommends refining the role-matching regex to specifically target the assistant role for better precision.

gemini-code-assist · 2026-04-11T11:56:34Z

+        elif role_match is not None:
+            separator = role_match.group(2)
+        else:
+            separator = "\\n"


The fallback separator is set to a literal string \\n (backslash followed by 'n'). While Jinja2 string literals do interpret \n as a newline character, it might be more robust and consistent with the regex matches (which likely return actual newline characters) to use a literal newline character \n as the fallback, or ensure that the constructed Jinja block handles the separator correctly in all edge cases. If the intention is to provide a newline, using "\n" is generally preferred for clarity in Python code constructing Jinja templates.

Suggested change

separator = "\\n"

separator = "\n"

gemini-code-assist · 2026-04-11T11:56:34Z

+            chat_template,
+        )
+        role_match = re.search(
+            r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",


The regex for role_match is quite broad as it matches any role concatenation (e.g., user, system, or assistant). While ChatML templates typically use the same separator for all roles, there is a small risk of inferring the wrong separator if the template uses different formatting for different roles. Consider refining the regex to specifically look for the assistant role if possible, although the current heuristic is likely sufficient for most ChatML-style templates.

danielhanchen

Loop 2 review: 11 reviewers (7 reviewer.py --slow + 3 Sonnet subagents + hosted Gemini). Note: the local gemini_parallel_review.py (1d) still reports an empty review body due to a workspace path mismatch during its internal tool invocation, so it does not contribute findings this round.

Verdict breakdown: 7/7 reviewer.py APPROVE, Sonnet A MERGE, Sonnet B BLOCK (3 new bugs), Sonnet C MERGE, Gemini hosted 2 Medium comments (one duplicating Sonnet B's role-regex concern). The separator-inference robustness improvements are effective (Phi-4, .role, double-quoted all now work), but one substantive new issue emerged from Sonnet B that was independently reinforced by the hosted Gemini review.

Reviewers	Severity	Finding
2/11	High	Bug B: `role_match` picks first occurrence, not the assistant-role separator. When a template branches on role (e.g. `system` uses `\n` but `user`/`assistant` use `<\|im_sep\|>`), `re.search` returns the `system` separator and the generated block produces `<\|im_start\|>assistant\n` instead of `<\|im_start\|>assistant<\|im_sep\|>`. Reproduced concretely with a Phi-4-mini-style `if/else` template that is now silently mispatched.
1/11	Medium	Bug A: `assistant_match` regex can match inside a `{# Jinja comment #}` block. A template that contains a documentation-style comment like `{# example: '<\|im_start\|>assistant<WRONG>' #}` will have `WRONG` extracted as the separator. Adversarial/unlikely but trivially fixable by stripping comments before regex.
1/11	Medium	Bug C: `"add_generation_prompt" not in chat_template` guard fires on comments. A template with `{# ignores add_generation_prompt #}` passes the substring check and the `elif` is skipped, leaving the template unrepaired. Also adversarial, also trivially fixable by scrubbing comments first.
1/11	Low	Fallback `separator = "\\n"` is a two-character Python string `\n`. Hosted Gemini asked whether this should be `"\n"` (real newline) for clarity. This is actually correct as-is: the separator is interpolated into a double-quoted Jinja literal, so `"\\n"` becomes `\n` in the template source, which Jinja2 then renders as an actual newline. Using a literal newline would change the generated template source but produce the same runtime output. Keeping as-is for source-readability.
0/11	--	7/7 `reviewer.py --slow` workers, Sonnet A, and Sonnet C independently verified no regressions on pre-existing paths, on `{%- endfor %}` dashed variants, on `message.role` attribute access, on double-quoted templates, and on 5 real `transformers` tokenizers (Hermes-3, Phi-4, Qwen2.5, Magnum-v2, Llama-3.1) which all already ship `add_generation_prompt` blocks and therefore never reach the new branch.

Disagreements

Sonnet A and Sonnet C rated the Jinja-comment edge cases as "purely theoretical, no known real tokenizer triggers them, safe to merge." Sonnet B rated the role-branch regression as "High / BLOCK." The hosted Gemini independently flagged the same role_match regex as too broad, which gives Bug B multiple-source support even though its reviewer count is 2/11. The fix for all three bugs is small and low-risk, so we are applying it in the same loop rather than deferring.

Fix to be committed

Strip Jinja {# ... #} comments before all regex / substring checks (fixes Bugs A and C).
Collect every role + sep match with re.finditer; if they all agree, use that separator; if they disagree, prefer <|im_sep|> when it appears in the template, otherwise fall back to the ChatML \n default (fixes Bug B).
Keep the existing structure: explicit <|im_start|>assistant<sep> literal is still preferred when present.

See inline comments for per-finding detail and suggestion blocks.

danielhanchen · 2026-04-11T12:05:24Z

+        assistant_match = re.search(
+            r"""(['"])<\|im_start\|>assistant([^'"]*)\1""",
+            chat_template,
+        )


[2/11 reviewers] High - Bug B: re.search returns only the first message['role'] + '<sep>' match in the template. For templates that branch on role (e.g. system uses \n but user/assistant use <|im_sep|>), this picks the system separator, so the injected block produces <|im_start|>assistant\n instead of the correct <|im_start|>assistant<|im_sep|>. Reproduced with a Phi-4-mini-style role-branched template: pre-fix rendered <|im_start|>assistant\n, post-fix rendered <|im_start|>assistant<|im_sep|> (correct). The hosted Gemini reviewer also flagged the same concern independently as 'role regex is too broad'.

Suggested change

)

role_seps = [

m.group(2)

for m in re.finditer(

r"""message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1""",

scrubbed,

)

]

unique_role_seps = list(dict.fromkeys(role_seps))

if assistant_match is not None:

separator = assistant_match.group(2)

elif len(unique_role_seps) == 1:

separator = unique_role_seps[0]

elif "<|im_sep|>" in scrubbed:

separator = "<|im_sep|>"

else:

separator = "\\n"

danielhanchen · 2026-04-11T12:05:24Z

+        # Infer the model-specific separator after "assistant" from the
+        # template itself. Prefer an explicit assistant literal
+        # (`'<|im_start|>assistant<sep>'`), then role-concatenation in either
+        # quote style and with either `message['role']` or `message.role`


[1/11 reviewers] Medium - Bug A: the assistant_match regex runs over the raw template string, so a Jinja {# comment #} that happens to contain an example like {# example: '<|im_start|>assistant<WRONG>' #} will have WRONG extracted as the separator and injected into the generated block. Strip Jinja comments before regex / substring checks to close this (and Bug C below) cleanly.

danielhanchen · 2026-04-11T12:05:24Z


        chat_template = chat_template[: where + len(chosen_end)] + after_endfor
+
+    elif (


[1/11 reviewers] Medium - Bug C: the guard "add_generation_prompt" not in chat_template is a raw substring check. A template with a comment like {# This template ignores add_generation_prompt #} will match the substring and the elif will be skipped, leaving the template unrepaired even though the real Jinja block is absent. Fix by scrubbing comments first and checking against the scrubbed string.

Suggested change

elif (

elif after_endfor.strip() == "":

# Strip Jinja `{# ... #}` comments before any regex /

# substring check so the guard and separator inference

# cannot be fooled by tokens that appear only in comments.

scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)

if (

"<|im_start|>" in scrubbed

and "<|im_end|>" in scrubbed

and "add_generation_prompt" not in scrubbed

):

danielhanchen · 2026-04-11T12:05:24Z

+        role_match = re.search(
+            r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
+            chat_template,
+        )


[1/11 reviewers] Low - Hosted Gemini suggested changing the fallback separator = "\\n" to separator = "\n". These are not equivalent here: the separator is interpolated into a Jinja string literal, so "\\n" becomes \n in the emitted template source, which Jinja renders as a real newline. Using a literal \n would embed a raw newline in the template source, which is uglier but functionally identical. No change needed; noting here for transparency.

…olution Follow-up to the earlier PR unslothai#4426 fix for ChatML empty-suffix templates. Second round of review surfaced three additional issues in the new elif branch; this commit addresses all three. - Bug B (High): re.search returned the first `message['role'] + '<sep>'` match only, so templates that branch per-role (e.g. Phi-4-mini style where `system` uses `\n` but `user`/`assistant` use `<|im_sep|>`) got the wrong separator and the injected block became `<|im_start|>assistant\n` instead of `<|im_start|>assistant<|im_sep|>`. Fix: scan every role concatenation with re.finditer, use the single separator if all agree, otherwise prefer `<|im_sep|>` when present in the template, otherwise fall back to `\n`. - Bug A (Medium): the assistant-literal regex could match inside a Jinja `{# comment #}` block, so a template containing documentation like `{# example: '<|im_start|>assistant<WRONG>' #}` would silently extract `WRONG` as the separator. - Bug C (Medium): the guard `"add_generation_prompt" not in chat_template` is a raw substring check, so a template with a comment like `{# This template ignores add_generation_prompt #}` would match the substring and the elif would be skipped, leaving the template unrepaired. Fix for Bugs A and C: strip Jinja `{# ... #}` comments once before any regex or substring check in this branch. Verified against: - 10-case template simulation (Hermes, Phi-4, double-quoted, dot-access, dashed, trailing-expr, Llama-3 control, already-fixed, pre-PR compare) - 5 new bug reproducers (Bug A, Bug B, Bug C, idempotency, Hermes basic) All 15 cases pass.

danielhanchen · 2026-04-11T12:06:10Z

/gemini review

gemini-code-assist

Code Review

This pull request updates unsloth/tokenizer_utils.py to support ChatML-style templates that lack an add_generation_prompt block. It introduces logic to infer the assistant separator from the template while ignoring Jinja comments. A review comment suggests appending the original trailing whitespace when inserting the new generation block to maintain consistent formatting.

gemini-code-assist · 2026-04-11T12:07:51Z

+            chat_template = (
+                chat_template[: where + len(chosen_end)] + generation_block
+            )


When appending the generation_block, it is recommended to include the original after_endfor content (which is verified to be only whitespace in this branch). This preserves any trailing whitespace or newlines that were present at the end of the original template, maintaining consistent file formatting and avoiding unnecessary changes to the template's suffix.

Suggested change

chat_template = (

chat_template[: where + len(chosen_end)] + generation_block

)

chat_template = (

chat_template[: where + len(chosen_end)] + generation_block + after_endfor

)

danielhanchen

Loop 3 review: 11 reviewers (7 reviewer.py --slow workers + 3 Sonnet subagents + hosted Gemini). Local gemini_parallel_review.py (1d) still returns an empty review body (tool path issue unrelated to this PR) and does not contribute.

Verdict breakdown:

reviewer.py: 6/7 REQUEST_CHANGES, 1/7 APPROVE
Sonnet A (final verification pass): MERGE
Sonnet B (end-to-end transformers integration): MERGE (all 5 real-tokenizer tests pass)
Sonnet C (adversarial regex hunt): MERGE (only theoretical edge cases, not reachable)
Hosted Gemini: 1 Medium comment requesting after_endfor preservation

Reviewers	Severity	Finding
8/11	High	Trailing Jinja comment after `{% endfor %}` still bypasses the new branch. All 7 `reviewer.py --slow` workers independently constructed and reproduced a template ending like `{% endfor %}{# trailing comment #}`: `after_endfor.strip() == ""` returns False because the comment is non-whitespace, so the elif is skipped and `fix_chat_template` still raises. Hosted Gemini reached the same issue from a different angle (asking for `after_endfor` preservation in the output). Fix: scrub Jinja comments from `after_endfor` before the emptiness check.
1/11	Low	Sonnet A: `assistant_match` can still match an assistant literal inside a `{% set _helper = '...' %}` block. Triggers only when a template stores an unused helper variable with a different separator than the actual assistant branch. No known tokenizer uses this pattern; the `role_seps` fallback path (which Sonnet B confirmed fires correctly in practice) already handles mixed-separator role-branched templates. Not acted on.
1/11	Low	*Sonnet C: `\{#.?#\}` regex would match `{# ... #}` inside a Jinja string literal.** E.g. `{% set x = '{# fake #}' %}` would have its content scrubbed. No real tokenizer stores a literal `{#` inside a string; Sonnet C marked this as unreachable theoretical. Not acted on.
1/11	Info	Hosted Gemini: preserve `after_endfor` by appending it after the generation block. Applied partially: we use the scrubbed `after_endfor` for the emptiness check (which is what mattered for the 7/7 finding), but we drop the (whitespace-and-comments-only) trailing content from the output so that `apply_chat_template(add_generation_prompt=True)` does not end with stray trailing whitespace. This matches the semantics of the adjacent first-branch repair path.
1/7	Contradicted	`reviewer.py` run 1 flagged a syntax regression in `generation_block` string construction, claiming it produced `{{% ... %}}` / doubled delimiters and triggered `TemplateSyntaxError`. Contradicted by the other 6/7 reviewers and by both Sonnet B's real-tokenizer integration pass and Sonnet C's adversarial parse test, which all confirmed the generated Jinja compiles and renders cleanly. Not acted on.

Disagreements

reviewer.py run 1 claimed the generated Jinja is malformed; the other 6 runs plus Sonnet B (real transformers.AutoTokenizer.apply_chat_template) and Sonnet C (direct jinja2.Environment().from_string(...) compile check) all showed clean parse/render. We trust the independent parse evidence over a single-reviewer report.
Sonnet A, B, and C all voted MERGE even with the trailing-comment edge case present. The reviewer.py pool was unanimous that the trailing-comment case should be fixed. The fix is small (one regex), so we are applying it in this loop rather than deferring.

Fix to be committed

Rewrite the elif guard from after_endfor.strip() == "" to
re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() == "", so
a template ending with {% endfor %}{# trailing comment #} enters the new
repair path rather than falling through to RuntimeError.
Keep the output chat_template = chat_template[: where + len(chosen_end)] + generation_block
(drop trailing whitespace/comments) so that apply_chat_template(add_generation_prompt=True) does not
emit stray trailing whitespace after the generation prefix. This matches the adjacent first-branch behavior.

Verified against the full 20-case test matrix (10 original + 5 Bugs A/B/C + 5 new trailing-comment cases), all passing.

See inline comments for per-finding detail.

danielhanchen · 2026-04-11T12:18:02Z


        chat_template = chat_template[: where + len(chosen_end)] + after_endfor
+
+    elif after_endfor.strip() == "":


[8/11 reviewers] High - The emptiness check after_endfor.strip() == "" runs on the raw after_endfor string, so a template ending with {% endfor %}{# trailing comment #} has after_endfor = "{# trailing comment #}", which is non-empty, and the new repair branch never fires. All 7 reviewer.py --slow workers constructed this exact reproducer independently, and hosted Gemini reached the same issue from a different angle. Fix: scrub Jinja comments from after_endfor before the emptiness check so comment-only suffixes enter the new branch.

Suggested change

elif after_endfor.strip() == "":

elif (

re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() == ""

):

danielhanchen · 2026-04-11T12:18:02Z

+                "{%" + dash + " endif %}"
+            )
+            chat_template = (
+                chat_template[: where + len(chosen_end)] + generation_block


[1/11 reviewers] Info - Hosted Gemini asked to preserve after_endfor by appending it to the generation block in the output. We are accepting the scrub-for-detection half of this suggestion but rejecting the preserve-in-output half: at this point after_endfor contains only whitespace and/or Jinja comments (verified by the scrubbed emptiness check above), and appending it would cause apply_chat_template(add_generation_prompt=True) to emit trailing whitespace after the generation prefix (e.g. ...<|im_start|>assistant\n \n\n ), which the model would see as part of the prompt. The first repair branch also does not preserve trailing whitespace, so this keeps the two branches symmetric. Keeping the output as-is.

danielhanchen · 2026-04-11T12:18:02Z

+            #      '<|im_sep|>' for user/assistant), prefer '<|im_sep|>' when
+            #      it appears in the template, otherwise fall back to the
+            #      ChatML newline default.
+            assistant_match = re.search(


[1/11 reviewers] Low - Sonnet A noted that assistant_match could match a '<|im_start|>assistant<sep>' literal stored in an unused {% set _helper = '...' %} block with a separator different from the actual assistant role branch. This is only reachable on synthetic templates that define a helper variable for documentation purposes; no known real tokenizer does this. The role_seps fallback path (which Sonnet B's real-tokenizer integration run confirmed fires correctly for mixed-separator role-branched templates) already handles the realistic cases. Not acted on in this round.

danielhanchen · 2026-04-11T12:18:02Z

+        # by ChatML tokens or `add_generation_prompt` mentions that appear
+        # only inside a comment.
+        scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)
+        if (


[1/11 reviewers] Low - Sonnet C noted that the Jinja comment scrub regex \{#.*?#\} would also match {# ... #} sequences that appear inside a Jinja string literal (e.g. {% set x = '{# fake #}' %}), silently stripping the string's content from the scrubbed copy. No real tokenizer template stores a literal {# inside a string, and Sonnet C explicitly classified this as unreachable theoretical. Not acted on.

Loop-3 review round (7/7 reviewer.py --slow workers + hosted Gemini) independently surfaced a trailing-Jinja-comment case that bypassed the new empty-suffix repair branch from PR unslothai#4426: templates ending like `{% endfor %}{# trailing comment #}` have `after_endfor = "{# ... #}"` which is non-empty, so the emptiness check fails and the template falls through to the RuntimeError. Fix: scrub Jinja `{# ... #}` comments from `after_endfor` before the emptiness check, so comment-only suffixes enter the new repair path. The body of the branch keeps its existing behavior of appending just the generation block (without re-appending the scrubbed `after_endfor`) so that `apply_chat_template(add_generation_prompt=True)` does not emit stray trailing whitespace after the generation prefix. This matches the semantics of the adjacent first-branch repair path. Verified against 20 test cases: 10 original template shapes, 5 Bug A/B/C reproducers from the loop-2 round, and 5 new trailing-comment reproducers. All pass.

danielhanchen · 2026-04-11T12:18:40Z

/gemini review

gemini-code-assist

Code Review

This pull request enhances the _fix_chat_template function to support ChatML-style templates that end with a loop and lack an add_generation_prompt block. The implementation includes logic to infer assistant separators from the existing template structure. A review comment identifies an inconsistency where trailing whitespace and Jinja comments are discarded, suggesting they be preserved to match the behavior of other repair branches.

gemini-code-assist · 2026-04-11T12:21:17Z

+            # `after_endfor` contains only whitespace and/or Jinja comments
+            # at this point (verified above), so it carries no semantic
+            # content for rendering. Drop it and append the generation
+            # block directly after `endfor`, matching the behavior of the
+            # first repair branch.
+            chat_template = (
+                chat_template[: where + len(chosen_end)] + generation_block
+            )


The current implementation discards after_endfor, which may contain trailing whitespace or Jinja comments (e.g., {# end of template #}). While these are not semantically functional, discarding them is inconsistent with the first repair branch (lines 669-673), which preserves after_endfor by wrapping it. To maintain the original template's structure and preserve user comments, consider appending after_endfor after the generation_block. Note that the comment on line 735 is currently misleading as it claims to match the behavior of the first branch while doing the opposite.

# `after_endfor` contains only whitespace and/or Jinja comments # at this point (verified above). Append the generation block # and preserve the trailing content for consistency. chat_template = ( chat_template[: where + len(chosen_end)] + generation_block + after_endfor )

danielhanchen

Loop 4 review: 11 reviewers (7 reviewer.py --slow + 3 Sonnet subagents + hosted Gemini).

Verdict breakdown:

reviewer.py: 5/7 APPROVE, 2/7 REQUEST_CHANGES
Sonnet A (final verification): MERGE, all 20 tests pass
Sonnet B (real transformers integration, 5 tokenizers): MERGE, 19/19 real-tokenizer tests pass including a stock unsloth/phi-4 backward-compat check
Sonnet C (adversarial regex hunt): MERGE with an optional 1-line defensive guard
Hosted Gemini: 1 Medium comment (duplicate of a loop-3 suggestion we already considered)

Reviewers	Severity	Finding
2/11	Low	**Empty separator from bare `'<
1/11	Info	Hosted Gemini: preserve `after_endfor` by appending it to the generation block. This is a re-raise of the loop-3 Gemini suggestion. We previously considered and rejected the preserve-in-output half because `after_endfor` at this point is whitespace-and-comments-only and appending it would cause `apply_chat_template(add_generation_prompt=True)` to emit stray trailing whitespace after the generation prefix. Gemini's secondary point that the existing code comment was misleading is valid (the first branch does preserve its `after_endfor`, but the content is semantic rather than whitespace). We are updating the comment to explain the asymmetry accurately without changing behavior.
1/7	Contradicted	`reviewer.py` run 2 once again claimed `generation_block` emits invalid Jinja `{{% ... %}}`. Contradicted by 6/7 runs + Sonnet A + Sonnet B (real `transformers.apply_chat_template` end-to-end) which all show the block parses and renders cleanly. Not acted on; the source string is literally `"{%" + dash + " if ..."`, which concatenates to `{% if ...`, not `{{% if ...`.
1/7	Contradicted	`reviewer.py` run 2 also claimed the comment scrub uses the wrong form (`{{# ... #}}` vs `{# ... #}`). Contradicted -- the source is `r"\{#.*?#\}"` which is the correct single-brace Jinja comment form. Verified by running the existing `verify_trailing_comment.py` test which explicitly constructs `{# ... #}` templates and confirms they are repaired.

Disagreements

Reviewer 2 has been consistently outvoted across loops on malformed-Jinja claims that are contradicted by every test that actually invokes Jinja2. We trust the Jinja2 parser over the single reviewer's static reading.
Hosted Gemini wants after_endfor preserved; the Sonnet subagents and the other reviewers show that preserving it produces semantically wrong output for whitespace-only suffixes. We are updating the explanatory comment rather than changing behavior.

Fix to be committed

Guard: if assistant_match captured an empty separator, fall through to the role-concat / <|im_sep|> / \n cascade rather than emitting a bare <|im_start|>assistant. One-line change: if assistant_match is not None and assistant_match.group(2):.
Reword the explanatory comment above chat_template = ... + generation_block to accurately describe why after_endfor is dropped in this branch while being wrapped in the first branch.

Verified against the full 23-case test matrix:

10 original template shapes
5 Bug A/B/C reproducers
5 trailing-comment reproducers
3 new empty-separator / bare-assistant-literal reproducers

All pass. Loop 4 complete. Ready to merge back to upstream.

See inline comments for per-finding detail.

danielhanchen · 2026-04-11T12:30:50Z

+                for m in re.finditer(
+                    r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
+                    scrubbed,
+                )


[2/11 reviewers] Low - Empty separator from bare '<|im_start|>assistant' literal. When a template splits the assistant prefix across concatenated string literals ({{ '<|im_start|>assistant' }}{{ '\n' }} or {{ '<|im_start|>assistant' + '\n' + ... }}), the regex captures an empty group, and the injected generation block becomes {{ "<|im_start|>assistant" }} with no trailing newline -- add_generation_prompt=True then renders a bare <|im_start|>assistant token that the model will not interpret correctly. No known real-world tokenizer uses this split-literal pattern, but the defensive guard is one line.

Suggested change

)

if assistant_match is not None and assistant_match.group(2):

danielhanchen · 2026-04-11T12:30:50Z

+            # `after_endfor` contains only whitespace and/or Jinja comments
+            # at this point (verified above), so it carries no semantic
+            # content for rendering. Drop it and append the generation
+            # block directly after `endfor`, matching the behavior of the


[1/11 reviewers] Info - Hosted Gemini re-raised the suggestion to append after_endfor to the output here to match the first repair branch. We previously considered this in loop 3 and rejected the output-preserve half because after_endfor is whitespace-and-comments-only at this point, and preserving it would cause apply_chat_template(add_generation_prompt=True) to emit stray trailing whitespace after the generation prefix. Gemini's secondary point that the existing comment was misleading is valid -- the first branch's after_endfor is semantic content (a {{ ... }} expression) rather than whitespace, so the two branches are intentionally asymmetric. Updating the comment to explain this without changing behavior.

danielhanchen · 2026-04-11T12:30:50Z

+            assistant_prefix = "<|im_start|>assistant" + separator
+            generation_block = (
+                "{%" + dash + " if add_generation_prompt %}"
+                '{{ "' + assistant_prefix.replace('"', '\\"') + '" }}'


[1/7, contradicted] Not acted on - reviewer.py run 2 claimed this line emits invalid Jinja {{% ... %}}. This is contradicted by 6/7 reviewer.py runs + Sonnet A + Sonnet B's real transformers.AutoTokenizer.apply_chat_template round-trip, all of which successfully parse and render the generated block. Static read of the source string "{%" + dash + " if add_generation_prompt %}" concatenates to {% if add_generation_prompt %}, not {{% ... %}}. The 6/7 consensus and live Jinja2 parser evidence is authoritative here.

Follow-up to PR unslothai#4426: reviewer.py run 6 and Sonnet C (adversarial regex hunt) independently found that the `assistant_match` regex can capture an empty separator when a template splits the assistant prefix across concatenated string literals, e.g. `{{ '<|im_start|>assistant' }}{{ '\n' }}...`. The injected generation block then becomes `{{ "<|im_start|>assistant" }}` with no trailing newline and the generation prompt is malformed. Fix: only accept a non-empty captured separator; otherwise fall through to the role-concat / `<|im_sep|>` / `\n` cascade so the fallback path produces a usable ChatML generation prefix in every case. One line. Also rewrite the explanatory comment above the generation-block append to accurately describe why `after_endfor` is dropped in this branch while being wrapped in the first branch. The first branch's `after_endfor` is a semantic `{{ ... }}` expression, whereas this branch's `after_endfor` is verified to contain only whitespace and comments that would render as stray trailing output. Verified against the full 23-case test matrix: - 10 original template shapes - 5 Bug A/B/C reproducers - 5 trailing Jinja comment reproducers - 3 new empty-separator / bare-assistant-literal reproducers All pass.

danielhanchen · 2026-04-11T12:32:07Z

Fixes pushed to original PR unslothai#4426. Closing staging copy.

kimimgo and others added 3 commits March 18, 2026 23:31

[pre-commit.ci] auto fixes from pre-commit.com hooks

e84d889

for more information, see https://pre-commit.ci

github-code-quality Bot found potential problems Apr 11, 2026

View reviewed changes

Comment thread unsloth/tokenizer_utils.py Fixed

gemini-code-assist Bot reviewed Apr 11, 2026

View reviewed changes

danielhanchen commented Apr 11, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 11, 2026

View reviewed changes

danielhanchen commented Apr 11, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 11, 2026

View reviewed changes

danielhanchen commented Apr 11, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 11, 2026

View reviewed changes

danielhanchen commented Apr 11, 2026

View reviewed changes

danielhanchen closed this Apr 11, 2026


		chat_template = chat_template[: where + len(chosen_end)] + after_endfor

		elif after_endfor.strip() == "" and "<\|im_start\|>" in chat_template:

-        )
+            role_seps = [
+                m.group(2)
+                for m in re.finditer(
+                    r"""message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1""",
+                    scrubbed,
+                )
+            ]
+            unique_role_seps = list(dict.fromkeys(role_seps))
+            if assistant_match is not None:
+                separator = assistant_match.group(2)
+            elif len(unique_role_seps) == 1:
+                separator = unique_role_seps[0]
+            elif "<|im_sep|>" in scrubbed:
+                separator = "<|im_sep|>"
+            else:
+                separator = "\\n"


		chat_template = chat_template[: where + len(chosen_end)] + after_endfor

		elif (

-    elif (
+    elif after_endfor.strip() == "":
+        # Strip Jinja `{# ... #}` comments before any regex /
+        # substring check so the guard and separator inference
+        # cannot be fooled by tokens that appear only in comments.
+        scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)
+        if (
+            "<|im_start|>" in scrubbed
+            and "<|im_end|>" in scrubbed
+            and "add_generation_prompt" not in scrubbed
+        ):


		chat_template = chat_template[: where + len(chosen_end)] + after_endfor

		elif after_endfor.strip() == "":

	)
	if assistant_match is not None and assistant_match.group(2):

Conversation

danielhanchen commented Apr 11, 2026

Staging mirror of unslothai#4426

Original description

Problem

Root Cause

Fix

Uh oh!

danielhanchen commented Apr 11, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen left a comment

Choose a reason for hiding this comment

Disagreements

Concrete fix (derived from reviewer consensus)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 11, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen left a comment

Choose a reason for hiding this comment

Disagreements

Fix to be committed

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 11, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen left a comment

Choose a reason for hiding this comment

Disagreements

Fix to be committed

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 11, 2026

Uh oh!