Skip to content

BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150)#38

Closed
danielhanchen wants to merge 7 commits into
mainfrom
pr-4426-head
Closed

BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150)#38
danielhanchen wants to merge 7 commits into
mainfrom
pr-4426-head

Conversation

@danielhanchen
Copy link
Copy Markdown
Member

Staging mirror of unslothai#4426

Original PR: unslothai#4426
Author: kimimgo

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.


Original description

Fixes unslothai#4150

Problem

ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training may end
with {% endfor %} and have no {% if add_generation_prompt %} block. This causes
fix_chat_template() to raise:

RuntimeError: Unsloth: The tokenizer `...` does not have a
{% if add_generation_prompt %} for generation purposes.

Root Cause

_fix_chat_template() only patches templates where content ({{ ... }}) follows
{% endfor %}. When the template simply ends after {% endfor %} (empty suffix),
the function returns the template unchanged, and the caller raises RuntimeError.

Fix

Add an elif branch in _fix_chat_template() that detects ChatML templates
(containing <|im_start|>) with an empty suffix after endfor, and appends
the standard ChatML generation prompt block:

{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}

This is a general fix (not model-name-based bypass) that handles any ChatML-style
template missing the generation prompt block.

kimimgo and others added 3 commits March 18, 2026 23:31
…on_prompt (unslothai#4150)

ChatML-style templates (Hermes, Magnum, etc.) saved after LoRA
training may end with {% endfor %} and have no
{% if add_generation_prompt %} block. The existing _fix_chat_template
only handled the case where content ({{ ... }}) followed endfor,
not the case where the template simply ended.

Add an elif branch that detects ChatML templates (containing
<|im_start|>) with an empty suffix after endfor, and appends the
standard ChatML generation prompt block.

Fixes unslothai#4150
…t prefix

Address review feedback: Phi-4 uses <|im_sep|> instead of newline
after role. Extract the separator from the template pattern
(message['role'] + 'SEPARATOR') to generate the correct
assistant prefix for each ChatML variant.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Comment thread unsloth/tokenizer_utils.py Fixed
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the _fix_chat_template function in unsloth/tokenizer_utils.py to support ChatML-style templates that lack an add_generation_prompt block. The implementation attempts to dynamically extract model-specific separators using regex. Review feedback identifies a redundant import re statement and points out that the current regex is too restrictive, failing to handle templates with double quotes, dot notation, or hardcoded role strings like those found in Phi-4.

Comment thread unsloth/tokenizer_utils.py Outdated
Comment on lines +680 to +683
import re

sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template)
separator = sep_match.group(1) if sep_match else "\\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are two issues in this block:

  1. Redundant Import: import re is unnecessary here as re is already imported at the top of the file (line 18).
  2. Regex Robustness: The current regex message\['role'\]\s*\+\s*'([^']*)' is too restrictive. It fails to handle templates using double quotes (message["role"]), dot notation (message.role), or hardcoded role strings (common in ChatML templates like Phi-4, which the PR specifically aims to support).

For example, in phi4_template, the role is hardcoded as {{'<|im_start|>assistant<|im_sep|>' ... }}. The current regex will return None, causing the separator to default to \n, which is incorrect for Phi-4 (should be <|im_sep|>).

        # Extract the model-specific separator after role (e.g. '\n' for
        # Hermes, '<|im_sep|>' for Phi-4) to avoid hard-coding.
        sep_match = re.search(r"<\|im_start\|>(?:assistant|user|system)([^'\"{}|]*)", chat_template)
        if not sep_match:
            sep_match = re.search(r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*['\"]([^'\"]*)['\"]", chat_template)
        separator = sep_match.group(1) if sep_match else "\\n"

Copy link
Copy Markdown
Member Author

@danielhanchen danielhanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! The goal of this PR is to fix a RuntimeError in _fix_chat_template() when ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training end at {% endfor %} with no {% if add_generation_prompt %} block (closes GH#4150). As a summary, this PR adds a new elif branch in _fix_chat_template() that detects empty-suffix templates containing <|im_start|>, extracts the role separator via regex, and appends a synthetic {% if add_generation_prompt %}{{ '<|im_start|>assistant<SEP>' }}{% endif %} block.

We ran 11 parallel reviewers (7 reviewer.py --slow workers + 3 Sonnet subagents + hosted Gemini). Below is the consolidated table, sorted by consensus. The core repair path is sound and fixes the Hermes case, but the separator inference is too narrow and the regex scope is too restrictive.

Reviewers Severity Finding
10/11 High Separator inference only matches message['role'] + '...' (single-quoted, + concatenation). Hardcoded-role ChatML families (e.g. Phi-4 style <|im_sep|>) fall back to \n, so the fix silently produces <|im_start|>assistant\n instead of the correct <|im_start|>assistant<|im_sep|>. The PR description explicitly names Phi-4 as a target, so this is a regression from a loud RuntimeError to a silently malformed prompt.
5/11 Medium Regex only matches single-quoted templates (message['role']); templates using message["role"] or message.role silently fall back to \n, losing correct separators for double-quoted / dot-access variants.
3/11 Low import re inside _fix_chat_template is redundant -- re is already imported at module top (line 18). Harmless but inconsistent with the rest of the file.
2/11 Low The elif guard "<|im_start|>" in chat_template is overly broad and could false-positive on templates that only mention the token in a Jinja {# comment #} or a string comparison. Tightening to additionally require "<|im_end|>" in chat_template or an output-context match would be safer.
1/11 Low Generated Jinja block uses single-quoted string literal '<|im_start|>assistant' + separator + '; if an adversarial template ever contained a single quote in the separator, the injected Jinja would fail to parse at render time.

Disagreements

  • Sonnet subagent 1 called the PR "safe to merge as-is" because no real-world model on HF currently ships a broken ChatML template without add_generation_prompt. Sonnet subagent 2 confirmed that fetching the actual NousResearch/Hermes-3-Llama-3.1-8B, unsloth/phi-4, and Qwen/Qwen2.5-7B-Instruct tokenizers shows they all already have the generation prompt block, so the new elif branch never fires for the stock tokenizers. However, the other 9 reviewers correctly note that the whole point of the PR is to handle saved-after-LoRA-training templates where the block can be missing, and the Phi-4 regression path is concrete: issue unslothai#4150 itself arose from a saved Hermes LoRA, not a stock tokenizer.
  • One reviewer proposed hard-failing instead of patching when the separator cannot be confidently inferred, to preserve the previous explicit RuntimeError. The majority instead prefer extending the regex so Phi-4 etc. are also repaired correctly.

Concrete fix (derived from reviewer consensus)

Replace the elif body with a stronger extractor that checks the assistant literal first, then role-concatenation with both quote styles and .role attribute access, and only then falls back to \n:

    elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:
        # GH#4150: ChatML-style templates (Hermes, Phi-4, Magnum, etc.) that
        # end at {% endfor %} with no add_generation_prompt block. Infer the
        # model-specific separator after "assistant" from the template itself.
        # Prefer an explicit assistant literal ("<|im_start|>assistant<sep>"),
        # then fall back to `message['role'] + '<sep>'` in either quote style,
        # then to the common ChatML newline.
        assistant_match = (
            re.search(r"'<\|im_start\|>assistant([^']*)'", chat_template)
            or re.search(r'"<\|im_start\|>assistant([^"]*)"', chat_template)
        )
        role_match = re.search(
            r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1",
            chat_template,
        )
        if assistant_match is not None:
            separator = assistant_match.group(1)
        elif role_match is not None:
            separator = role_match.group(2)
        else:
            separator = "\\n"

        generation_block = (
            "{%" + dash + " if add_generation_prompt %}"
            "{{ '<|im_start|>assistant" + separator + "' }}"
            "{%" + dash + " endif %}"
        )
        chat_template = chat_template[: where + len(chosen_end)] + generation_block

Also drop the in-function import re (module-level re import already exists at line 18).

See inline comments for per-finding detail and suggestion blocks.

Comment thread unsloth/tokenizer_utils.py Outdated
import re

sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template)
separator = sep_match.group(1) if sep_match else "\\n"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[10/11 reviewers] High - The separator inference is too narrow and introduces a silent regression for Phi-4 / <|im_sep|> style ChatML templates. The current regex message\['role'\]\s*\+\s*'([^']*)' only matches the + concat + single-quote shape used by Hermes/Magnum. For templates that hardcode the role (Phi-4 uses literal '<|im_start|>assistant<|im_sep|>'), sep_match returns None, the separator defaults to \n, and the fix produces <|im_start|>assistant\n instead of <|im_start|>assistant<|im_sep|>. This converts a loud RuntimeError into a silently malformed prompt, which is a regression from the pre-PR state. The PR description explicitly names Phi-4 as a target, so this case needs to be handled. Suggested fix: prefer an explicit assistant literal, then role-concat with both quote styles and .role attribute, then fall back to \n.

Suggested change
separator = sep_match.group(1) if sep_match else "\\n"
# Prefer the explicit assistant literal in the template, then
# role-concatenation in either quote style, then fall back to
# the common ChatML newline separator.
assistant_match = (
re.search(r"'<\|im_start\|>assistant([^']*)'", chat_template)
or re.search(r'"<\|im_start\|>assistant([^"]*)"', chat_template)
)
role_match = re.search(
r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1",
chat_template,
)
if assistant_match is not None:
separator = assistant_match.group(1)
elif role_match is not None:
separator = role_match.group(2)
else:
separator = "\\n"

Comment thread unsloth/tokenizer_utils.py Outdated
# Extract the model-specific separator after role (e.g. '\n' for
# Hermes, '<|im_sep|>' for Phi-4) to avoid hard-coding.
import re

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[3/11 reviewers] Low - import re inside the function body is redundant because re is already imported at module top (line 18 of this file). It does not cause a functional bug thanks to Python's module cache, but it is inconsistent with the rest of the file and adds noise. Please remove it.

Suggested change

Comment thread unsloth/tokenizer_utils.py Outdated

chat_template = chat_template[: where + len(chosen_end)] + after_endfor

elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[2/11 reviewers] Low - The guard "<|im_start|>" in chat_template is overly broad: a template that merely references <|im_start|> inside a Jinja {# comment #} or in an equality check ({% if token == '<|im_start|>' %}) without actually emitting ChatML would also trigger this branch. A safer condition is to require both tokens and preferably that they appear in output {{ ... }} contexts.

Suggested change
elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:
elif (
after_endfor.strip() == ""
and "<|im_start|>" in chat_template
and "<|im_end|>" in chat_template
):

Comment thread unsloth/tokenizer_utils.py Outdated
assistant_prefix = "<|im_start|>assistant" + separator
generation_block = (
"{%" + dash + " if add_generation_prompt %}"
"{{ '" + assistant_prefix + "' }}"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/11 reviewers] Low - The generated Jinja block wraps the assistant prefix in a single-quoted Jinja string literal. If the extracted separator ever contained a single quote, the injected block would produce invalid Jinja ({{ '<|im_start|>assistant'bad' }}), failing at inference time. None of the known ChatML separators (\n, <|im_sep|>, <|end_header_id|>) contain single quotes, but a minimal hardening is cheap: escape any literal single quotes in separator before interpolation, or use double quotes for the Jinja literal. This is a low-priority defensive hardening.

Broaden the empty-suffix ChatML repair branch so it correctly handles
hardcoded-role templates (Phi-4 style) and double-quoted / dot-access
variants, rather than silently falling back to '\n' for everything that
is not `message['role'] + '...'`.

- Prefer an explicit assistant literal (`'<|im_start|>assistant<sep>'`)
  so Phi-4 templates pick up `<|im_sep|>` instead of `\n`.
- Accept both single- and double-quoted role concatenation, plus
  `message.role` attribute access.
- Guard the branch on `<|im_end|> in chat_template` and
  `add_generation_prompt not in chat_template` to avoid false positives
  on templates that merely mention `<|im_start|>` in a comment, and to
  make the function idempotent if called on an already-fixed template.
- Drop the redundant in-function `import re` (module-level import at
  line 18 already exists).
- Emit the generated Jinja block with a double-quoted string literal so
  a separator containing a single quote cannot break the injected block.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates unsloth/tokenizer_utils.py to automatically append an add_generation_prompt block to ChatML-style templates when it is missing. The implementation uses regex to infer the assistant separator from the existing template structure. Review feedback suggests using a literal newline character for the fallback separator to ensure correct Jinja rendering and recommends refining the role-matching regex to specifically target the assistant role for better precision.

Comment thread unsloth/tokenizer_utils.py Outdated
elif role_match is not None:
separator = role_match.group(2)
else:
separator = "\\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The fallback separator is set to a literal string \\n (backslash followed by 'n'). While Jinja2 string literals do interpret \n as a newline character, it might be more robust and consistent with the regex matches (which likely return actual newline characters) to use a literal newline character \n as the fallback, or ensure that the constructed Jinja block handles the separator correctly in all edge cases. If the intention is to provide a newline, using "\n" is generally preferred for clarity in Python code constructing Jinja templates.

Suggested change
separator = "\\n"
separator = "\n"

Comment thread unsloth/tokenizer_utils.py Outdated
chat_template,
)
role_match = re.search(
r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex for role_match is quite broad as it matches any role concatenation (e.g., user, system, or assistant). While ChatML templates typically use the same separator for all roles, there is a small risk of inferring the wrong separator if the template uses different formatting for different roles. Consider refining the regex to specifically look for the assistant role if possible, although the current heuristic is likely sufficient for most ChatML-style templates.

Copy link
Copy Markdown
Member Author

@danielhanchen danielhanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loop 2 review: 11 reviewers (7 reviewer.py --slow + 3 Sonnet subagents + hosted Gemini). Note: the local gemini_parallel_review.py (1d) still reports an empty review body due to a workspace path mismatch during its internal tool invocation, so it does not contribute findings this round.

Verdict breakdown: 7/7 reviewer.py APPROVE, Sonnet A MERGE, Sonnet B BLOCK (3 new bugs), Sonnet C MERGE, Gemini hosted 2 Medium comments (one duplicating Sonnet B's role-regex concern). The separator-inference robustness improvements are effective (Phi-4, .role, double-quoted all now work), but one substantive new issue emerged from Sonnet B that was independently reinforced by the hosted Gemini review.

Reviewers Severity Finding
2/11 High Bug B: role_match picks first occurrence, not the assistant-role separator. When a template branches on role (e.g. system uses \n but user/assistant use <|im_sep|>), re.search returns the system separator and the generated block produces <|im_start|>assistant\n instead of <|im_start|>assistant<|im_sep|>. Reproduced concretely with a Phi-4-mini-style if/else template that is now silently mispatched.
1/11 Medium Bug A: assistant_match regex can match inside a {# Jinja comment #} block. A template that contains a documentation-style comment like {# example: '<|im_start|>assistant<WRONG>' #} will have WRONG extracted as the separator. Adversarial/unlikely but trivially fixable by stripping comments before regex.
1/11 Medium Bug C: "add_generation_prompt" not in chat_template guard fires on comments. A template with {# ignores add_generation_prompt #} passes the substring check and the elif is skipped, leaving the template unrepaired. Also adversarial, also trivially fixable by scrubbing comments first.
1/11 Low Fallback separator = "\\n" is a two-character Python string \n. Hosted Gemini asked whether this should be "\n" (real newline) for clarity. This is actually correct as-is: the separator is interpolated into a double-quoted Jinja literal, so "\\n" becomes \n in the template source, which Jinja2 then renders as an actual newline. Using a literal newline would change the generated template source but produce the same runtime output. Keeping as-is for source-readability.
0/11 -- 7/7 reviewer.py --slow workers, Sonnet A, and Sonnet C independently verified no regressions on pre-existing paths, on {%- endfor %} dashed variants, on message.role attribute access, on double-quoted templates, and on 5 real transformers tokenizers (Hermes-3, Phi-4, Qwen2.5, Magnum-v2, Llama-3.1) which all already ship add_generation_prompt blocks and therefore never reach the new branch.

Disagreements

  • Sonnet A and Sonnet C rated the Jinja-comment edge cases as "purely theoretical, no known real tokenizer triggers them, safe to merge." Sonnet B rated the role-branch regression as "High / BLOCK." The hosted Gemini independently flagged the same role_match regex as too broad, which gives Bug B multiple-source support even though its reviewer count is 2/11. The fix for all three bugs is small and low-risk, so we are applying it in the same loop rather than deferring.

Fix to be committed

  • Strip Jinja {# ... #} comments before all regex / substring checks (fixes Bugs A and C).
  • Collect every role + sep match with re.finditer; if they all agree, use that separator; if they disagree, prefer <|im_sep|> when it appears in the template, otherwise fall back to the ChatML \n default (fixes Bug B).
  • Keep the existing structure: explicit <|im_start|>assistant<sep> literal is still preferred when present.

See inline comments for per-finding detail and suggestion blocks.

Comment thread unsloth/tokenizer_utils.py Outdated
assistant_match = re.search(
r"""(['"])<\|im_start\|>assistant([^'"]*)\1""",
chat_template,
)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[2/11 reviewers] High - Bug B: re.search returns only the first message['role'] + '<sep>' match in the template. For templates that branch on role (e.g. system uses \n but user/assistant use <|im_sep|>), this picks the system separator, so the injected block produces <|im_start|>assistant\n instead of the correct <|im_start|>assistant<|im_sep|>. Reproduced with a Phi-4-mini-style role-branched template: pre-fix rendered <|im_start|>assistant\n, post-fix rendered <|im_start|>assistant<|im_sep|> (correct). The hosted Gemini reviewer also flagged the same concern independently as 'role regex is too broad'.

Suggested change
)
role_seps = [
m.group(2)
for m in re.finditer(
r"""message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1""",
scrubbed,
)
]
unique_role_seps = list(dict.fromkeys(role_seps))
if assistant_match is not None:
separator = assistant_match.group(2)
elif len(unique_role_seps) == 1:
separator = unique_role_seps[0]
elif "<|im_sep|>" in scrubbed:
separator = "<|im_sep|>"
else:
separator = "\\n"

Comment thread unsloth/tokenizer_utils.py Outdated
# Infer the model-specific separator after "assistant" from the
# template itself. Prefer an explicit assistant literal
# (`'<|im_start|>assistant<sep>'`), then role-concatenation in either
# quote style and with either `message['role']` or `message.role`
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/11 reviewers] Medium - Bug A: the assistant_match regex runs over the raw template string, so a Jinja {# comment #} that happens to contain an example like {# example: '<|im_start|>assistant<WRONG>' #} will have WRONG extracted as the separator and injected into the generated block. Strip Jinja comments before regex / substring checks to close this (and Bug C below) cleanly.


chat_template = chat_template[: where + len(chosen_end)] + after_endfor

elif (
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/11 reviewers] Medium - Bug C: the guard "add_generation_prompt" not in chat_template is a raw substring check. A template with a comment like {# This template ignores add_generation_prompt #} will match the substring and the elif will be skipped, leaving the template unrepaired even though the real Jinja block is absent. Fix by scrubbing comments first and checking against the scrubbed string.

Suggested change
elif (
elif after_endfor.strip() == "":
# Strip Jinja `{# ... #}` comments before any regex /
# substring check so the guard and separator inference
# cannot be fooled by tokens that appear only in comments.
scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)
if (
"<|im_start|>" in scrubbed
and "<|im_end|>" in scrubbed
and "add_generation_prompt" not in scrubbed
):

Comment thread unsloth/tokenizer_utils.py Outdated
role_match = re.search(
r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
chat_template,
)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/11 reviewers] Low - Hosted Gemini suggested changing the fallback separator = "\\n" to separator = "\n". These are not equivalent here: the separator is interpolated into a Jinja string literal, so "\\n" becomes \n in the emitted template source, which Jinja renders as a real newline. Using a literal \n would embed a raw newline in the template source, which is uglier but functionally identical. No change needed; noting here for transparency.

…olution

Follow-up to the earlier PR unslothai#4426 fix for ChatML empty-suffix templates.
Second round of review surfaced three additional issues in the new
elif branch; this commit addresses all three.

- Bug B (High): re.search returned the first `message['role'] + '<sep>'`
  match only, so templates that branch per-role (e.g. Phi-4-mini style
  where `system` uses `\n` but `user`/`assistant` use `<|im_sep|>`) got
  the wrong separator and the injected block became
  `<|im_start|>assistant\n` instead of `<|im_start|>assistant<|im_sep|>`.
  Fix: scan every role concatenation with re.finditer, use the single
  separator if all agree, otherwise prefer `<|im_sep|>` when present in
  the template, otherwise fall back to `\n`.

- Bug A (Medium): the assistant-literal regex could match inside a
  Jinja `{# comment #}` block, so a template containing documentation
  like `{# example: '<|im_start|>assistant<WRONG>' #}` would silently
  extract `WRONG` as the separator.

- Bug C (Medium): the guard `"add_generation_prompt" not in chat_template`
  is a raw substring check, so a template with a comment like
  `{# This template ignores add_generation_prompt #}` would match the
  substring and the elif would be skipped, leaving the template
  unrepaired.

Fix for Bugs A and C: strip Jinja `{# ... #}` comments once before any
regex or substring check in this branch.

Verified against:
- 10-case template simulation (Hermes, Phi-4, double-quoted, dot-access,
  dashed, trailing-expr, Llama-3 control, already-fixed, pre-PR compare)
- 5 new bug reproducers (Bug A, Bug B, Bug C, idempotency, Hermes basic)
All 15 cases pass.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates unsloth/tokenizer_utils.py to support ChatML-style templates that lack an add_generation_prompt block. It introduces logic to infer the assistant separator from the template while ignoring Jinja comments. A review comment suggests appending the original trailing whitespace when inserting the new generation block to maintain consistent formatting.

Comment on lines +729 to +731
chat_template = (
chat_template[: where + len(chosen_end)] + generation_block
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When appending the generation_block, it is recommended to include the original after_endfor content (which is verified to be only whitespace in this branch). This preserves any trailing whitespace or newlines that were present at the end of the original template, maintaining consistent file formatting and avoiding unnecessary changes to the template's suffix.

Suggested change
chat_template = (
chat_template[: where + len(chosen_end)] + generation_block
)
chat_template = (
chat_template[: where + len(chosen_end)] + generation_block + after_endfor
)

Copy link
Copy Markdown
Member Author

@danielhanchen danielhanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loop 3 review: 11 reviewers (7 reviewer.py --slow workers + 3 Sonnet subagents + hosted Gemini). Local gemini_parallel_review.py (1d) still returns an empty review body (tool path issue unrelated to this PR) and does not contribute.

Verdict breakdown:

  • reviewer.py: 6/7 REQUEST_CHANGES, 1/7 APPROVE
  • Sonnet A (final verification pass): MERGE
  • Sonnet B (end-to-end transformers integration): MERGE (all 5 real-tokenizer tests pass)
  • Sonnet C (adversarial regex hunt): MERGE (only theoretical edge cases, not reachable)
  • Hosted Gemini: 1 Medium comment requesting after_endfor preservation
Reviewers Severity Finding
8/11 High Trailing Jinja comment after {% endfor %} still bypasses the new branch. All 7 reviewer.py --slow workers independently constructed and reproduced a template ending like {% endfor %}{# trailing comment #}: after_endfor.strip() == "" returns False because the comment is non-whitespace, so the elif is skipped and fix_chat_template still raises. Hosted Gemini reached the same issue from a different angle (asking for after_endfor preservation in the output). Fix: scrub Jinja comments from after_endfor before the emptiness check.
1/11 Low Sonnet A: assistant_match can still match an assistant literal inside a {% set _helper = '...' %} block. Triggers only when a template stores an unused helper variable with a different separator than the actual assistant branch. No known tokenizer uses this pattern; the role_seps fallback path (which Sonnet B confirmed fires correctly in practice) already handles mixed-separator role-branched templates. Not acted on.
1/11 Low Sonnet C: \{#.*?#\} regex would match {# ... #} inside a Jinja string literal. E.g. {% set x = '{# fake #}' %} would have its content scrubbed. No real tokenizer stores a literal {# inside a string; Sonnet C marked this as unreachable theoretical. Not acted on.
1/11 Info Hosted Gemini: preserve after_endfor by appending it after the generation block. Applied partially: we use the scrubbed after_endfor for the emptiness check (which is what mattered for the 7/7 finding), but we drop the (whitespace-and-comments-only) trailing content from the output so that apply_chat_template(add_generation_prompt=True) does not end with stray trailing whitespace. This matches the semantics of the adjacent first-branch repair path.
1/7 Contradicted reviewer.py run 1 flagged a syntax regression in generation_block string construction, claiming it produced {{% ... %}} / doubled delimiters and triggered TemplateSyntaxError. Contradicted by the other 6/7 reviewers and by both Sonnet B's real-tokenizer integration pass and Sonnet C's adversarial parse test, which all confirmed the generated Jinja compiles and renders cleanly. Not acted on.

Disagreements

  • reviewer.py run 1 claimed the generated Jinja is malformed; the other 6 runs plus Sonnet B (real transformers.AutoTokenizer.apply_chat_template) and Sonnet C (direct jinja2.Environment().from_string(...) compile check) all showed clean parse/render. We trust the independent parse evidence over a single-reviewer report.
  • Sonnet A, B, and C all voted MERGE even with the trailing-comment edge case present. The reviewer.py pool was unanimous that the trailing-comment case should be fixed. The fix is small (one regex), so we are applying it in this loop rather than deferring.

Fix to be committed

  • Rewrite the elif guard from after_endfor.strip() == "" to
    re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() == "", so
    a template ending with {% endfor %}{# trailing comment #} enters the new
    repair path rather than falling through to RuntimeError.
  • Keep the output chat_template = chat_template[: where + len(chosen_end)] + generation_block
    (drop trailing whitespace/comments) so that apply_chat_template(add_generation_prompt=True) does not
    emit stray trailing whitespace after the generation prefix. This matches the adjacent first-branch behavior.

Verified against the full 20-case test matrix (10 original + 5 Bugs A/B/C + 5 new trailing-comment cases), all passing.

See inline comments for per-finding detail.

Comment thread unsloth/tokenizer_utils.py Outdated

chat_template = chat_template[: where + len(chosen_end)] + after_endfor

elif after_endfor.strip() == "":
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[8/11 reviewers] High - The emptiness check after_endfor.strip() == "" runs on the raw after_endfor string, so a template ending with {% endfor %}{# trailing comment #} has after_endfor = "{# trailing comment #}", which is non-empty, and the new repair branch never fires. All 7 reviewer.py --slow workers constructed this exact reproducer independently, and hosted Gemini reached the same issue from a different angle. Fix: scrub Jinja comments from after_endfor before the emptiness check so comment-only suffixes enter the new branch.

Suggested change
elif after_endfor.strip() == "":
elif (
re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() == ""
):

"{%" + dash + " endif %}"
)
chat_template = (
chat_template[: where + len(chosen_end)] + generation_block
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/11 reviewers] Info - Hosted Gemini asked to preserve after_endfor by appending it to the generation block in the output. We are accepting the scrub-for-detection half of this suggestion but rejecting the preserve-in-output half: at this point after_endfor contains only whitespace and/or Jinja comments (verified by the scrubbed emptiness check above), and appending it would cause apply_chat_template(add_generation_prompt=True) to emit trailing whitespace after the generation prefix (e.g. ...<|im_start|>assistant\n \n\n ), which the model would see as part of the prompt. The first repair branch also does not preserve trailing whitespace, so this keeps the two branches symmetric. Keeping the output as-is.

# '<|im_sep|>' for user/assistant), prefer '<|im_sep|>' when
# it appears in the template, otherwise fall back to the
# ChatML newline default.
assistant_match = re.search(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/11 reviewers] Low - Sonnet A noted that assistant_match could match a '<|im_start|>assistant<sep>' literal stored in an unused {% set _helper = '...' %} block with a separator different from the actual assistant role branch. This is only reachable on synthetic templates that define a helper variable for documentation purposes; no known real tokenizer does this. The role_seps fallback path (which Sonnet B's real-tokenizer integration run confirmed fires correctly for mixed-separator role-branched templates) already handles the realistic cases. Not acted on in this round.

# by ChatML tokens or `add_generation_prompt` mentions that appear
# only inside a comment.
scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)
if (
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/11 reviewers] Low - Sonnet C noted that the Jinja comment scrub regex \{#.*?#\} would also match {# ... #} sequences that appear inside a Jinja string literal (e.g. {% set x = '{# fake #}' %}), silently stripping the string's content from the scrubbed copy. No real tokenizer template stores a literal {# inside a string, and Sonnet C explicitly classified this as unreachable theoretical. Not acted on.

Loop-3 review round (7/7 reviewer.py --slow workers + hosted Gemini)
independently surfaced a trailing-Jinja-comment case that bypassed the
new empty-suffix repair branch from PR unslothai#4426: templates ending like
`{% endfor %}{# trailing comment #}` have `after_endfor = "{# ... #}"`
which is non-empty, so the emptiness check fails and the template
falls through to the RuntimeError.

Fix: scrub Jinja `{# ... #}` comments from `after_endfor` before the
emptiness check, so comment-only suffixes enter the new repair path.

The body of the branch keeps its existing behavior of appending just
the generation block (without re-appending the scrubbed `after_endfor`)
so that `apply_chat_template(add_generation_prompt=True)` does not
emit stray trailing whitespace after the generation prefix. This
matches the semantics of the adjacent first-branch repair path.

Verified against 20 test cases: 10 original template shapes, 5 Bug
A/B/C reproducers from the loop-2 round, and 5 new trailing-comment
reproducers. All pass.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the _fix_chat_template function to support ChatML-style templates that end with a loop and lack an add_generation_prompt block. The implementation includes logic to infer assistant separators from the existing template structure. A review comment identifies an inconsistency where trailing whitespace and Jinja comments are discarded, suggesting they be preserved to match the behavior of other repair branches.

Comment on lines +732 to +739
# `after_endfor` contains only whitespace and/or Jinja comments
# at this point (verified above), so it carries no semantic
# content for rendering. Drop it and append the generation
# block directly after `endfor`, matching the behavior of the
# first repair branch.
chat_template = (
chat_template[: where + len(chosen_end)] + generation_block
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation discards after_endfor, which may contain trailing whitespace or Jinja comments (e.g., {# end of template #}). While these are not semantically functional, discarding them is inconsistent with the first repair branch (lines 669-673), which preserves after_endfor by wrapping it. To maintain the original template's structure and preserve user comments, consider appending after_endfor after the generation_block. Note that the comment on line 735 is currently misleading as it claims to match the behavior of the first branch while doing the opposite.

            # `after_endfor` contains only whitespace and/or Jinja comments
            # at this point (verified above). Append the generation block
            # and preserve the trailing content for consistency.
            chat_template = (
                chat_template[: where + len(chosen_end)] + generation_block + after_endfor
            )

Copy link
Copy Markdown
Member Author

@danielhanchen danielhanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loop 4 review: 11 reviewers (7 reviewer.py --slow + 3 Sonnet subagents + hosted Gemini).

Verdict breakdown:

  • reviewer.py: 5/7 APPROVE, 2/7 REQUEST_CHANGES
  • Sonnet A (final verification): MERGE, all 20 tests pass
  • Sonnet B (real transformers integration, 5 tokenizers): MERGE, 19/19 real-tokenizer tests pass including a stock unsloth/phi-4 backward-compat check
  • Sonnet C (adversarial regex hunt): MERGE with an optional 1-line defensive guard
  • Hosted Gemini: 1 Medium comment (duplicate of a loop-3 suggestion we already considered)
Reviewers Severity Finding
2/11 Low **Empty separator from bare `'<
1/11 Info Hosted Gemini: preserve after_endfor by appending it to the generation block. This is a re-raise of the loop-3 Gemini suggestion. We previously considered and rejected the preserve-in-output half because after_endfor at this point is whitespace-and-comments-only and appending it would cause apply_chat_template(add_generation_prompt=True) to emit stray trailing whitespace after the generation prefix. Gemini's secondary point that the existing code comment was misleading is valid (the first branch does preserve its after_endfor, but the content is semantic rather than whitespace). We are updating the comment to explain the asymmetry accurately without changing behavior.
1/7 Contradicted reviewer.py run 2 once again claimed generation_block emits invalid Jinja {{% ... %}}. Contradicted by 6/7 runs + Sonnet A + Sonnet B (real transformers.apply_chat_template end-to-end) which all show the block parses and renders cleanly. Not acted on; the source string is literally "{%" + dash + " if ...", which concatenates to {% if ..., not {{% if ....
1/7 Contradicted reviewer.py run 2 also claimed the comment scrub uses the wrong form ({{# ... #}} vs {# ... #}). Contradicted -- the source is r"\{#.*?#\}" which is the correct single-brace Jinja comment form. Verified by running the existing verify_trailing_comment.py test which explicitly constructs {# ... #} templates and confirms they are repaired.

Disagreements

  • Reviewer 2 has been consistently outvoted across loops on malformed-Jinja claims that are contradicted by every test that actually invokes Jinja2. We trust the Jinja2 parser over the single reviewer's static reading.
  • Hosted Gemini wants after_endfor preserved; the Sonnet subagents and the other reviewers show that preserving it produces semantically wrong output for whitespace-only suffixes. We are updating the explanatory comment rather than changing behavior.

Fix to be committed

  • Guard: if assistant_match captured an empty separator, fall through to the role-concat / <|im_sep|> / \n cascade rather than emitting a bare <|im_start|>assistant. One-line change: if assistant_match is not None and assistant_match.group(2):.
  • Reword the explanatory comment above chat_template = ... + generation_block to accurately describe why after_endfor is dropped in this branch while being wrapped in the first branch.

Verified against the full 23-case test matrix:

  • 10 original template shapes
  • 5 Bug A/B/C reproducers
  • 5 trailing-comment reproducers
  • 3 new empty-separator / bare-assistant-literal reproducers

All pass. Loop 4 complete. Ready to merge back to upstream.

See inline comments for per-finding detail.

for m in re.finditer(
r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
scrubbed,
)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[2/11 reviewers] Low - Empty separator from bare '<|im_start|>assistant' literal. When a template splits the assistant prefix across concatenated string literals ({{ '<|im_start|>assistant' }}{{ '\n' }} or {{ '<|im_start|>assistant' + '\n' + ... }}), the regex captures an empty group, and the injected generation block becomes {{ "<|im_start|>assistant" }} with no trailing newline -- add_generation_prompt=True then renders a bare <|im_start|>assistant token that the model will not interpret correctly. No known real-world tokenizer uses this split-literal pattern, but the defensive guard is one line.

Suggested change
)
if assistant_match is not None and assistant_match.group(2):

Comment thread unsloth/tokenizer_utils.py Outdated
# `after_endfor` contains only whitespace and/or Jinja comments
# at this point (verified above), so it carries no semantic
# content for rendering. Drop it and append the generation
# block directly after `endfor`, matching the behavior of the
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/11 reviewers] Info - Hosted Gemini re-raised the suggestion to append after_endfor to the output here to match the first repair branch. We previously considered this in loop 3 and rejected the output-preserve half because after_endfor is whitespace-and-comments-only at this point, and preserving it would cause apply_chat_template(add_generation_prompt=True) to emit stray trailing whitespace after the generation prefix. Gemini's secondary point that the existing comment was misleading is valid -- the first branch's after_endfor is semantic content (a {{ ... }} expression) rather than whitespace, so the two branches are intentionally asymmetric. Updating the comment to explain this without changing behavior.

assistant_prefix = "<|im_start|>assistant" + separator
generation_block = (
"{%" + dash + " if add_generation_prompt %}"
'{{ "' + assistant_prefix.replace('"', '\\"') + '" }}'
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1/7, contradicted] Not acted on - reviewer.py run 2 claimed this line emits invalid Jinja {{% ... %}}. This is contradicted by 6/7 reviewer.py runs + Sonnet A + Sonnet B's real transformers.AutoTokenizer.apply_chat_template round-trip, all of which successfully parse and render the generated block. Static read of the source string "{%" + dash + " if add_generation_prompt %}" concatenates to {% if add_generation_prompt %}, not {{% ... %}}. The 6/7 consensus and live Jinja2 parser evidence is authoritative here.

Follow-up to PR unslothai#4426: reviewer.py run 6 and Sonnet C (adversarial
regex hunt) independently found that the `assistant_match` regex can
capture an empty separator when a template splits the assistant prefix
across concatenated string literals, e.g.
`{{ '<|im_start|>assistant' }}{{ '\n' }}...`. The injected generation
block then becomes `{{ "<|im_start|>assistant" }}` with no trailing
newline and the generation prompt is malformed.

Fix: only accept a non-empty captured separator; otherwise fall through
to the role-concat / `<|im_sep|>` / `\n` cascade so the fallback path
produces a usable ChatML generation prefix in every case. One line.

Also rewrite the explanatory comment above the generation-block append
to accurately describe why `after_endfor` is dropped in this branch
while being wrapped in the first branch. The first branch's
`after_endfor` is a semantic `{{ ... }}` expression, whereas this
branch's `after_endfor` is verified to contain only whitespace and
comments that would render as stray trailing output.

Verified against the full 23-case test matrix:
- 10 original template shapes
- 5 Bug A/B/C reproducers
- 5 trailing Jinja comment reproducers
- 3 new empty-separator / bare-assistant-literal reproducers
All pass.
@danielhanchen
Copy link
Copy Markdown
Member Author

Fixes pushed to original PR unslothai#4426. Closing staging copy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Hermes Lora chat test bug: does not have a {% if add_generation_prompt %} for generation purposes

2 participants