BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150)#38
BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150)#38danielhanchen wants to merge 7 commits into
Conversation
…on_prompt (unslothai#4150) ChatML-style templates (Hermes, Magnum, etc.) saved after LoRA training may end with {% endfor %} and have no {% if add_generation_prompt %} block. The existing _fix_chat_template only handled the case where content ({{ ... }}) followed endfor, not the case where the template simply ended. Add an elif branch that detects ChatML templates (containing <|im_start|>) with an empty suffix after endfor, and appends the standard ChatML generation prompt block. Fixes unslothai#4150
…t prefix Address review feedback: Phi-4 uses <|im_sep|> instead of newline after role. Extract the separator from the template pattern (message['role'] + 'SEPARATOR') to generate the correct assistant prefix for each ChatML variant.
for more information, see https://pre-commit.ci
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates the _fix_chat_template function in unsloth/tokenizer_utils.py to support ChatML-style templates that lack an add_generation_prompt block. The implementation attempts to dynamically extract model-specific separators using regex. Review feedback identifies a redundant import re statement and points out that the current regex is too restrictive, failing to handle templates with double quotes, dot notation, or hardcoded role strings like those found in Phi-4.
| import re | ||
|
|
||
| sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template) | ||
| separator = sep_match.group(1) if sep_match else "\\n" |
There was a problem hiding this comment.
There are two issues in this block:
- Redundant Import:
import reis unnecessary here asreis already imported at the top of the file (line 18). - Regex Robustness: The current regex
message\['role'\]\s*\+\s*'([^']*)'is too restrictive. It fails to handle templates using double quotes (message["role"]), dot notation (message.role), or hardcoded role strings (common in ChatML templates like Phi-4, which the PR specifically aims to support).
For example, in phi4_template, the role is hardcoded as {{'<|im_start|>assistant<|im_sep|>' ... }}. The current regex will return None, causing the separator to default to \n, which is incorrect for Phi-4 (should be <|im_sep|>).
# Extract the model-specific separator after role (e.g. '\n' for
# Hermes, '<|im_sep|>' for Phi-4) to avoid hard-coding.
sep_match = re.search(r"<\|im_start\|>(?:assistant|user|system)([^'\"{}|]*)", chat_template)
if not sep_match:
sep_match = re.search(r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*['\"]([^'\"]*)['\"]", chat_template)
separator = sep_match.group(1) if sep_match else "\\n"
danielhanchen
left a comment
There was a problem hiding this comment.
Thank you for the PR! The goal of this PR is to fix a RuntimeError in _fix_chat_template() when ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training end at {% endfor %} with no {% if add_generation_prompt %} block (closes GH#4150). As a summary, this PR adds a new elif branch in _fix_chat_template() that detects empty-suffix templates containing <|im_start|>, extracts the role separator via regex, and appends a synthetic {% if add_generation_prompt %}{{ '<|im_start|>assistant<SEP>' }}{% endif %} block.
We ran 11 parallel reviewers (7 reviewer.py --slow workers + 3 Sonnet subagents + hosted Gemini). Below is the consolidated table, sorted by consensus. The core repair path is sound and fixes the Hermes case, but the separator inference is too narrow and the regex scope is too restrictive.
| Reviewers | Severity | Finding |
|---|---|---|
| 10/11 | High | Separator inference only matches message['role'] + '...' (single-quoted, + concatenation). Hardcoded-role ChatML families (e.g. Phi-4 style <|im_sep|>) fall back to \n, so the fix silently produces <|im_start|>assistant\n instead of the correct <|im_start|>assistant<|im_sep|>. The PR description explicitly names Phi-4 as a target, so this is a regression from a loud RuntimeError to a silently malformed prompt. |
| 5/11 | Medium | Regex only matches single-quoted templates (message['role']); templates using message["role"] or message.role silently fall back to \n, losing correct separators for double-quoted / dot-access variants. |
| 3/11 | Low | import re inside _fix_chat_template is redundant -- re is already imported at module top (line 18). Harmless but inconsistent with the rest of the file. |
| 2/11 | Low | The elif guard "<|im_start|>" in chat_template is overly broad and could false-positive on templates that only mention the token in a Jinja {# comment #} or a string comparison. Tightening to additionally require "<|im_end|>" in chat_template or an output-context match would be safer. |
| 1/11 | Low | Generated Jinja block uses single-quoted string literal '<|im_start|>assistant' + separator + '; if an adversarial template ever contained a single quote in the separator, the injected Jinja would fail to parse at render time. |
Disagreements
- Sonnet subagent 1 called the PR "safe to merge as-is" because no real-world model on HF currently ships a broken ChatML template without
add_generation_prompt. Sonnet subagent 2 confirmed that fetching the actualNousResearch/Hermes-3-Llama-3.1-8B,unsloth/phi-4, andQwen/Qwen2.5-7B-Instructtokenizers shows they all already have the generation prompt block, so the newelifbranch never fires for the stock tokenizers. However, the other 9 reviewers correctly note that the whole point of the PR is to handle saved-after-LoRA-training templates where the block can be missing, and the Phi-4 regression path is concrete: issue unslothai#4150 itself arose from a saved Hermes LoRA, not a stock tokenizer. - One reviewer proposed hard-failing instead of patching when the separator cannot be confidently inferred, to preserve the previous explicit
RuntimeError. The majority instead prefer extending the regex so Phi-4 etc. are also repaired correctly.
Concrete fix (derived from reviewer consensus)
Replace the elif body with a stronger extractor that checks the assistant literal first, then role-concatenation with both quote styles and .role attribute access, and only then falls back to \n:
elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:
# GH#4150: ChatML-style templates (Hermes, Phi-4, Magnum, etc.) that
# end at {% endfor %} with no add_generation_prompt block. Infer the
# model-specific separator after "assistant" from the template itself.
# Prefer an explicit assistant literal ("<|im_start|>assistant<sep>"),
# then fall back to `message['role'] + '<sep>'` in either quote style,
# then to the common ChatML newline.
assistant_match = (
re.search(r"'<\|im_start\|>assistant([^']*)'", chat_template)
or re.search(r'"<\|im_start\|>assistant([^"]*)"', chat_template)
)
role_match = re.search(
r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1",
chat_template,
)
if assistant_match is not None:
separator = assistant_match.group(1)
elif role_match is not None:
separator = role_match.group(2)
else:
separator = "\\n"
generation_block = (
"{%" + dash + " if add_generation_prompt %}"
"{{ '<|im_start|>assistant" + separator + "' }}"
"{%" + dash + " endif %}"
)
chat_template = chat_template[: where + len(chosen_end)] + generation_blockAlso drop the in-function import re (module-level re import already exists at line 18).
See inline comments for per-finding detail and suggestion blocks.
| import re | ||
|
|
||
| sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template) | ||
| separator = sep_match.group(1) if sep_match else "\\n" |
There was a problem hiding this comment.
[10/11 reviewers] High - The separator inference is too narrow and introduces a silent regression for Phi-4 / <|im_sep|> style ChatML templates. The current regex message\['role'\]\s*\+\s*'([^']*)' only matches the + concat + single-quote shape used by Hermes/Magnum. For templates that hardcode the role (Phi-4 uses literal '<|im_start|>assistant<|im_sep|>'), sep_match returns None, the separator defaults to \n, and the fix produces <|im_start|>assistant\n instead of <|im_start|>assistant<|im_sep|>. This converts a loud RuntimeError into a silently malformed prompt, which is a regression from the pre-PR state. The PR description explicitly names Phi-4 as a target, so this case needs to be handled. Suggested fix: prefer an explicit assistant literal, then role-concat with both quote styles and .role attribute, then fall back to \n.
| separator = sep_match.group(1) if sep_match else "\\n" | |
| # Prefer the explicit assistant literal in the template, then | |
| # role-concatenation in either quote style, then fall back to | |
| # the common ChatML newline separator. | |
| assistant_match = ( | |
| re.search(r"'<\|im_start\|>assistant([^']*)'", chat_template) | |
| or re.search(r'"<\|im_start\|>assistant([^"]*)"', chat_template) | |
| ) | |
| role_match = re.search( | |
| r"message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1", | |
| chat_template, | |
| ) | |
| if assistant_match is not None: | |
| separator = assistant_match.group(1) | |
| elif role_match is not None: | |
| separator = role_match.group(2) | |
| else: | |
| separator = "\\n" |
| # Extract the model-specific separator after role (e.g. '\n' for | ||
| # Hermes, '<|im_sep|>' for Phi-4) to avoid hard-coding. | ||
| import re | ||
|
|
There was a problem hiding this comment.
[3/11 reviewers] Low - import re inside the function body is redundant because re is already imported at module top (line 18 of this file). It does not cause a functional bug thanks to Python's module cache, but it is inconsistent with the rest of the file and adds noise. Please remove it.
|
|
||
| chat_template = chat_template[: where + len(chosen_end)] + after_endfor | ||
|
|
||
| elif after_endfor.strip() == "" and "<|im_start|>" in chat_template: |
There was a problem hiding this comment.
[2/11 reviewers] Low - The guard "<|im_start|>" in chat_template is overly broad: a template that merely references <|im_start|> inside a Jinja {# comment #} or in an equality check ({% if token == '<|im_start|>' %}) without actually emitting ChatML would also trigger this branch. A safer condition is to require both tokens and preferably that they appear in output {{ ... }} contexts.
| elif after_endfor.strip() == "" and "<|im_start|>" in chat_template: | |
| elif ( | |
| after_endfor.strip() == "" | |
| and "<|im_start|>" in chat_template | |
| and "<|im_end|>" in chat_template | |
| ): |
| assistant_prefix = "<|im_start|>assistant" + separator | ||
| generation_block = ( | ||
| "{%" + dash + " if add_generation_prompt %}" | ||
| "{{ '" + assistant_prefix + "' }}" |
There was a problem hiding this comment.
[1/11 reviewers] Low - The generated Jinja block wraps the assistant prefix in a single-quoted Jinja string literal. If the extracted separator ever contained a single quote, the injected block would produce invalid Jinja ({{ '<|im_start|>assistant'bad' }}), failing at inference time. None of the known ChatML separators (\n, <|im_sep|>, <|end_header_id|>) contain single quotes, but a minimal hardening is cheap: escape any literal single quotes in separator before interpolation, or use double quotes for the Jinja literal. This is a low-priority defensive hardening.
Broaden the empty-suffix ChatML repair branch so it correctly handles hardcoded-role templates (Phi-4 style) and double-quoted / dot-access variants, rather than silently falling back to '\n' for everything that is not `message['role'] + '...'`. - Prefer an explicit assistant literal (`'<|im_start|>assistant<sep>'`) so Phi-4 templates pick up `<|im_sep|>` instead of `\n`. - Accept both single- and double-quoted role concatenation, plus `message.role` attribute access. - Guard the branch on `<|im_end|> in chat_template` and `add_generation_prompt not in chat_template` to avoid false positives on templates that merely mention `<|im_start|>` in a comment, and to make the function idempotent if called on an already-fixed template. - Drop the redundant in-function `import re` (module-level import at line 18 already exists). - Emit the generated Jinja block with a double-quoted string literal so a separator containing a single quote cannot break the injected block.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates unsloth/tokenizer_utils.py to automatically append an add_generation_prompt block to ChatML-style templates when it is missing. The implementation uses regex to infer the assistant separator from the existing template structure. Review feedback suggests using a literal newline character for the fallback separator to ensure correct Jinja rendering and recommends refining the role-matching regex to specifically target the assistant role for better precision.
| elif role_match is not None: | ||
| separator = role_match.group(2) | ||
| else: | ||
| separator = "\\n" |
There was a problem hiding this comment.
The fallback separator is set to a literal string \\n (backslash followed by 'n'). While Jinja2 string literals do interpret \n as a newline character, it might be more robust and consistent with the regex matches (which likely return actual newline characters) to use a literal newline character \n as the fallback, or ensure that the constructed Jinja block handles the separator correctly in all edge cases. If the intention is to provide a newline, using "\n" is generally preferred for clarity in Python code constructing Jinja templates.
| separator = "\\n" | |
| separator = "\n" |
| chat_template, | ||
| ) | ||
| role_match = re.search( | ||
| r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""", |
There was a problem hiding this comment.
The regex for role_match is quite broad as it matches any role concatenation (e.g., user, system, or assistant). While ChatML templates typically use the same separator for all roles, there is a small risk of inferring the wrong separator if the template uses different formatting for different roles. Consider refining the regex to specifically look for the assistant role if possible, although the current heuristic is likely sufficient for most ChatML-style templates.
danielhanchen
left a comment
There was a problem hiding this comment.
Loop 2 review: 11 reviewers (7 reviewer.py --slow + 3 Sonnet subagents + hosted Gemini). Note: the local gemini_parallel_review.py (1d) still reports an empty review body due to a workspace path mismatch during its internal tool invocation, so it does not contribute findings this round.
Verdict breakdown: 7/7 reviewer.py APPROVE, Sonnet A MERGE, Sonnet B BLOCK (3 new bugs), Sonnet C MERGE, Gemini hosted 2 Medium comments (one duplicating Sonnet B's role-regex concern). The separator-inference robustness improvements are effective (Phi-4, .role, double-quoted all now work), but one substantive new issue emerged from Sonnet B that was independently reinforced by the hosted Gemini review.
| Reviewers | Severity | Finding |
|---|---|---|
| 2/11 | High | Bug B: role_match picks first occurrence, not the assistant-role separator. When a template branches on role (e.g. system uses \n but user/assistant use <|im_sep|>), re.search returns the system separator and the generated block produces <|im_start|>assistant\n instead of <|im_start|>assistant<|im_sep|>. Reproduced concretely with a Phi-4-mini-style if/else template that is now silently mispatched. |
| 1/11 | Medium | Bug A: assistant_match regex can match inside a {# Jinja comment #} block. A template that contains a documentation-style comment like {# example: '<|im_start|>assistant<WRONG>' #} will have WRONG extracted as the separator. Adversarial/unlikely but trivially fixable by stripping comments before regex. |
| 1/11 | Medium | Bug C: "add_generation_prompt" not in chat_template guard fires on comments. A template with {# ignores add_generation_prompt #} passes the substring check and the elif is skipped, leaving the template unrepaired. Also adversarial, also trivially fixable by scrubbing comments first. |
| 1/11 | Low | Fallback separator = "\\n" is a two-character Python string \n. Hosted Gemini asked whether this should be "\n" (real newline) for clarity. This is actually correct as-is: the separator is interpolated into a double-quoted Jinja literal, so "\\n" becomes \n in the template source, which Jinja2 then renders as an actual newline. Using a literal newline would change the generated template source but produce the same runtime output. Keeping as-is for source-readability. |
| 0/11 | -- | 7/7 reviewer.py --slow workers, Sonnet A, and Sonnet C independently verified no regressions on pre-existing paths, on {%- endfor %} dashed variants, on message.role attribute access, on double-quoted templates, and on 5 real transformers tokenizers (Hermes-3, Phi-4, Qwen2.5, Magnum-v2, Llama-3.1) which all already ship add_generation_prompt blocks and therefore never reach the new branch. |
Disagreements
- Sonnet A and Sonnet C rated the Jinja-comment edge cases as "purely theoretical, no known real tokenizer triggers them, safe to merge." Sonnet B rated the role-branch regression as "High / BLOCK." The hosted Gemini independently flagged the same
role_matchregex as too broad, which gives Bug B multiple-source support even though its reviewer count is 2/11. The fix for all three bugs is small and low-risk, so we are applying it in the same loop rather than deferring.
Fix to be committed
- Strip Jinja
{# ... #}comments before all regex / substring checks (fixes Bugs A and C). - Collect every
role + sepmatch withre.finditer; if they all agree, use that separator; if they disagree, prefer<|im_sep|>when it appears in the template, otherwise fall back to the ChatML\ndefault (fixes Bug B). - Keep the existing structure: explicit
<|im_start|>assistant<sep>literal is still preferred when present.
See inline comments for per-finding detail and suggestion blocks.
| assistant_match = re.search( | ||
| r"""(['"])<\|im_start\|>assistant([^'"]*)\1""", | ||
| chat_template, | ||
| ) |
There was a problem hiding this comment.
[2/11 reviewers] High - Bug B: re.search returns only the first message['role'] + '<sep>' match in the template. For templates that branch on role (e.g. system uses \n but user/assistant use <|im_sep|>), this picks the system separator, so the injected block produces <|im_start|>assistant\n instead of the correct <|im_start|>assistant<|im_sep|>. Reproduced with a Phi-4-mini-style role-branched template: pre-fix rendered <|im_start|>assistant\n, post-fix rendered <|im_start|>assistant<|im_sep|> (correct). The hosted Gemini reviewer also flagged the same concern independently as 'role regex is too broad'.
| ) | |
| role_seps = [ | |
| m.group(2) | |
| for m in re.finditer( | |
| r"""message(?:\[['\"]role['\"]\]|\.role)\s*\+\s*(['\"])([^'\"]*)\1""", | |
| scrubbed, | |
| ) | |
| ] | |
| unique_role_seps = list(dict.fromkeys(role_seps)) | |
| if assistant_match is not None: | |
| separator = assistant_match.group(2) | |
| elif len(unique_role_seps) == 1: | |
| separator = unique_role_seps[0] | |
| elif "<|im_sep|>" in scrubbed: | |
| separator = "<|im_sep|>" | |
| else: | |
| separator = "\\n" |
| # Infer the model-specific separator after "assistant" from the | ||
| # template itself. Prefer an explicit assistant literal | ||
| # (`'<|im_start|>assistant<sep>'`), then role-concatenation in either | ||
| # quote style and with either `message['role']` or `message.role` |
There was a problem hiding this comment.
[1/11 reviewers] Medium - Bug A: the assistant_match regex runs over the raw template string, so a Jinja {# comment #} that happens to contain an example like {# example: '<|im_start|>assistant<WRONG>' #} will have WRONG extracted as the separator and injected into the generated block. Strip Jinja comments before regex / substring checks to close this (and Bug C below) cleanly.
|
|
||
| chat_template = chat_template[: where + len(chosen_end)] + after_endfor | ||
|
|
||
| elif ( |
There was a problem hiding this comment.
[1/11 reviewers] Medium - Bug C: the guard "add_generation_prompt" not in chat_template is a raw substring check. A template with a comment like {# This template ignores add_generation_prompt #} will match the substring and the elif will be skipped, leaving the template unrepaired even though the real Jinja block is absent. Fix by scrubbing comments first and checking against the scrubbed string.
| elif ( | |
| elif after_endfor.strip() == "": | |
| # Strip Jinja `{# ... #}` comments before any regex / | |
| # substring check so the guard and separator inference | |
| # cannot be fooled by tokens that appear only in comments. | |
| scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL) | |
| if ( | |
| "<|im_start|>" in scrubbed | |
| and "<|im_end|>" in scrubbed | |
| and "add_generation_prompt" not in scrubbed | |
| ): |
| role_match = re.search( | ||
| r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""", | ||
| chat_template, | ||
| ) |
There was a problem hiding this comment.
[1/11 reviewers] Low - Hosted Gemini suggested changing the fallback separator = "\\n" to separator = "\n". These are not equivalent here: the separator is interpolated into a Jinja string literal, so "\\n" becomes \n in the emitted template source, which Jinja renders as a real newline. Using a literal \n would embed a raw newline in the template source, which is uglier but functionally identical. No change needed; noting here for transparency.
…olution Follow-up to the earlier PR unslothai#4426 fix for ChatML empty-suffix templates. Second round of review surfaced three additional issues in the new elif branch; this commit addresses all three. - Bug B (High): re.search returned the first `message['role'] + '<sep>'` match only, so templates that branch per-role (e.g. Phi-4-mini style where `system` uses `\n` but `user`/`assistant` use `<|im_sep|>`) got the wrong separator and the injected block became `<|im_start|>assistant\n` instead of `<|im_start|>assistant<|im_sep|>`. Fix: scan every role concatenation with re.finditer, use the single separator if all agree, otherwise prefer `<|im_sep|>` when present in the template, otherwise fall back to `\n`. - Bug A (Medium): the assistant-literal regex could match inside a Jinja `{# comment #}` block, so a template containing documentation like `{# example: '<|im_start|>assistant<WRONG>' #}` would silently extract `WRONG` as the separator. - Bug C (Medium): the guard `"add_generation_prompt" not in chat_template` is a raw substring check, so a template with a comment like `{# This template ignores add_generation_prompt #}` would match the substring and the elif would be skipped, leaving the template unrepaired. Fix for Bugs A and C: strip Jinja `{# ... #}` comments once before any regex or substring check in this branch. Verified against: - 10-case template simulation (Hermes, Phi-4, double-quoted, dot-access, dashed, trailing-expr, Llama-3 control, already-fixed, pre-PR compare) - 5 new bug reproducers (Bug A, Bug B, Bug C, idempotency, Hermes basic) All 15 cases pass.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates unsloth/tokenizer_utils.py to support ChatML-style templates that lack an add_generation_prompt block. It introduces logic to infer the assistant separator from the template while ignoring Jinja comments. A review comment suggests appending the original trailing whitespace when inserting the new generation block to maintain consistent formatting.
| chat_template = ( | ||
| chat_template[: where + len(chosen_end)] + generation_block | ||
| ) |
There was a problem hiding this comment.
When appending the generation_block, it is recommended to include the original after_endfor content (which is verified to be only whitespace in this branch). This preserves any trailing whitespace or newlines that were present at the end of the original template, maintaining consistent file formatting and avoiding unnecessary changes to the template's suffix.
| chat_template = ( | |
| chat_template[: where + len(chosen_end)] + generation_block | |
| ) | |
| chat_template = ( | |
| chat_template[: where + len(chosen_end)] + generation_block + after_endfor | |
| ) |
danielhanchen
left a comment
There was a problem hiding this comment.
Loop 3 review: 11 reviewers (7 reviewer.py --slow workers + 3 Sonnet subagents + hosted Gemini). Local gemini_parallel_review.py (1d) still returns an empty review body (tool path issue unrelated to this PR) and does not contribute.
Verdict breakdown:
reviewer.py: 6/7 REQUEST_CHANGES, 1/7 APPROVE- Sonnet A (final verification pass): MERGE
- Sonnet B (end-to-end
transformersintegration): MERGE (all 5 real-tokenizer tests pass) - Sonnet C (adversarial regex hunt): MERGE (only theoretical edge cases, not reachable)
- Hosted Gemini: 1 Medium comment requesting
after_endforpreservation
| Reviewers | Severity | Finding |
|---|---|---|
| 8/11 | High | Trailing Jinja comment after {% endfor %} still bypasses the new branch. All 7 reviewer.py --slow workers independently constructed and reproduced a template ending like {% endfor %}{# trailing comment #}: after_endfor.strip() == "" returns False because the comment is non-whitespace, so the elif is skipped and fix_chat_template still raises. Hosted Gemini reached the same issue from a different angle (asking for after_endfor preservation in the output). Fix: scrub Jinja comments from after_endfor before the emptiness check. |
| 1/11 | Low | Sonnet A: assistant_match can still match an assistant literal inside a {% set _helper = '...' %} block. Triggers only when a template stores an unused helper variable with a different separator than the actual assistant branch. No known tokenizer uses this pattern; the role_seps fallback path (which Sonnet B confirmed fires correctly in practice) already handles mixed-separator role-branched templates. Not acted on. |
| 1/11 | Low | Sonnet C: \{#.*?#\} regex would match {# ... #} inside a Jinja string literal. E.g. {% set x = '{# fake #}' %} would have its content scrubbed. No real tokenizer stores a literal {# inside a string; Sonnet C marked this as unreachable theoretical. Not acted on. |
| 1/11 | Info | Hosted Gemini: preserve after_endfor by appending it after the generation block. Applied partially: we use the scrubbed after_endfor for the emptiness check (which is what mattered for the 7/7 finding), but we drop the (whitespace-and-comments-only) trailing content from the output so that apply_chat_template(add_generation_prompt=True) does not end with stray trailing whitespace. This matches the semantics of the adjacent first-branch repair path. |
| 1/7 | Contradicted | reviewer.py run 1 flagged a syntax regression in generation_block string construction, claiming it produced {{% ... %}} / doubled delimiters and triggered TemplateSyntaxError. Contradicted by the other 6/7 reviewers and by both Sonnet B's real-tokenizer integration pass and Sonnet C's adversarial parse test, which all confirmed the generated Jinja compiles and renders cleanly. Not acted on. |
Disagreements
reviewer.pyrun 1 claimed the generated Jinja is malformed; the other 6 runs plus Sonnet B (realtransformers.AutoTokenizer.apply_chat_template) and Sonnet C (directjinja2.Environment().from_string(...)compile check) all showed clean parse/render. We trust the independent parse evidence over a single-reviewer report.- Sonnet A, B, and C all voted MERGE even with the trailing-comment edge case present. The
reviewer.pypool was unanimous that the trailing-comment case should be fixed. The fix is small (one regex), so we are applying it in this loop rather than deferring.
Fix to be committed
- Rewrite the elif guard from
after_endfor.strip() == ""to
re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() == "", so
a template ending with{% endfor %}{# trailing comment #}enters the new
repair path rather than falling through toRuntimeError. - Keep the output
chat_template = chat_template[: where + len(chosen_end)] + generation_block
(drop trailing whitespace/comments) so thatapply_chat_template(add_generation_prompt=True)does not
emit stray trailing whitespace after the generation prefix. This matches the adjacent first-branch behavior.
Verified against the full 20-case test matrix (10 original + 5 Bugs A/B/C + 5 new trailing-comment cases), all passing.
See inline comments for per-finding detail.
|
|
||
| chat_template = chat_template[: where + len(chosen_end)] + after_endfor | ||
|
|
||
| elif after_endfor.strip() == "": |
There was a problem hiding this comment.
[8/11 reviewers] High - The emptiness check after_endfor.strip() == "" runs on the raw after_endfor string, so a template ending with {% endfor %}{# trailing comment #} has after_endfor = "{# trailing comment #}", which is non-empty, and the new repair branch never fires. All 7 reviewer.py --slow workers constructed this exact reproducer independently, and hosted Gemini reached the same issue from a different angle. Fix: scrub Jinja comments from after_endfor before the emptiness check so comment-only suffixes enter the new branch.
| elif after_endfor.strip() == "": | |
| elif ( | |
| re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() == "" | |
| ): |
| "{%" + dash + " endif %}" | ||
| ) | ||
| chat_template = ( | ||
| chat_template[: where + len(chosen_end)] + generation_block |
There was a problem hiding this comment.
[1/11 reviewers] Info - Hosted Gemini asked to preserve after_endfor by appending it to the generation block in the output. We are accepting the scrub-for-detection half of this suggestion but rejecting the preserve-in-output half: at this point after_endfor contains only whitespace and/or Jinja comments (verified by the scrubbed emptiness check above), and appending it would cause apply_chat_template(add_generation_prompt=True) to emit trailing whitespace after the generation prefix (e.g. ...<|im_start|>assistant\n \n\n ), which the model would see as part of the prompt. The first repair branch also does not preserve trailing whitespace, so this keeps the two branches symmetric. Keeping the output as-is.
| # '<|im_sep|>' for user/assistant), prefer '<|im_sep|>' when | ||
| # it appears in the template, otherwise fall back to the | ||
| # ChatML newline default. | ||
| assistant_match = re.search( |
There was a problem hiding this comment.
[1/11 reviewers] Low - Sonnet A noted that assistant_match could match a '<|im_start|>assistant<sep>' literal stored in an unused {% set _helper = '...' %} block with a separator different from the actual assistant role branch. This is only reachable on synthetic templates that define a helper variable for documentation purposes; no known real tokenizer does this. The role_seps fallback path (which Sonnet B's real-tokenizer integration run confirmed fires correctly for mixed-separator role-branched templates) already handles the realistic cases. Not acted on in this round.
| # by ChatML tokens or `add_generation_prompt` mentions that appear | ||
| # only inside a comment. | ||
| scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL) | ||
| if ( |
There was a problem hiding this comment.
[1/11 reviewers] Low - Sonnet C noted that the Jinja comment scrub regex \{#.*?#\} would also match {# ... #} sequences that appear inside a Jinja string literal (e.g. {% set x = '{# fake #}' %}), silently stripping the string's content from the scrubbed copy. No real tokenizer template stores a literal {# inside a string, and Sonnet C explicitly classified this as unreachable theoretical. Not acted on.
Loop-3 review round (7/7 reviewer.py --slow workers + hosted Gemini) independently surfaced a trailing-Jinja-comment case that bypassed the new empty-suffix repair branch from PR unslothai#4426: templates ending like `{% endfor %}{# trailing comment #}` have `after_endfor = "{# ... #}"` which is non-empty, so the emptiness check fails and the template falls through to the RuntimeError. Fix: scrub Jinja `{# ... #}` comments from `after_endfor` before the emptiness check, so comment-only suffixes enter the new repair path. The body of the branch keeps its existing behavior of appending just the generation block (without re-appending the scrubbed `after_endfor`) so that `apply_chat_template(add_generation_prompt=True)` does not emit stray trailing whitespace after the generation prefix. This matches the semantics of the adjacent first-branch repair path. Verified against 20 test cases: 10 original template shapes, 5 Bug A/B/C reproducers from the loop-2 round, and 5 new trailing-comment reproducers. All pass.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enhances the _fix_chat_template function to support ChatML-style templates that end with a loop and lack an add_generation_prompt block. The implementation includes logic to infer assistant separators from the existing template structure. A review comment identifies an inconsistency where trailing whitespace and Jinja comments are discarded, suggesting they be preserved to match the behavior of other repair branches.
| # `after_endfor` contains only whitespace and/or Jinja comments | ||
| # at this point (verified above), so it carries no semantic | ||
| # content for rendering. Drop it and append the generation | ||
| # block directly after `endfor`, matching the behavior of the | ||
| # first repair branch. | ||
| chat_template = ( | ||
| chat_template[: where + len(chosen_end)] + generation_block | ||
| ) |
There was a problem hiding this comment.
The current implementation discards after_endfor, which may contain trailing whitespace or Jinja comments (e.g., {# end of template #}). While these are not semantically functional, discarding them is inconsistent with the first repair branch (lines 669-673), which preserves after_endfor by wrapping it. To maintain the original template's structure and preserve user comments, consider appending after_endfor after the generation_block. Note that the comment on line 735 is currently misleading as it claims to match the behavior of the first branch while doing the opposite.
# `after_endfor` contains only whitespace and/or Jinja comments
# at this point (verified above). Append the generation block
# and preserve the trailing content for consistency.
chat_template = (
chat_template[: where + len(chosen_end)] + generation_block + after_endfor
)
danielhanchen
left a comment
There was a problem hiding this comment.
Loop 4 review: 11 reviewers (7 reviewer.py --slow + 3 Sonnet subagents + hosted Gemini).
Verdict breakdown:
reviewer.py: 5/7 APPROVE, 2/7 REQUEST_CHANGES- Sonnet A (final verification): MERGE, all 20 tests pass
- Sonnet B (real
transformersintegration, 5 tokenizers): MERGE, 19/19 real-tokenizer tests pass including a stockunsloth/phi-4backward-compat check - Sonnet C (adversarial regex hunt): MERGE with an optional 1-line defensive guard
- Hosted Gemini: 1 Medium comment (duplicate of a loop-3 suggestion we already considered)
| Reviewers | Severity | Finding |
|---|---|---|
| 2/11 | Low | **Empty separator from bare `'< |
| 1/11 | Info | Hosted Gemini: preserve after_endfor by appending it to the generation block. This is a re-raise of the loop-3 Gemini suggestion. We previously considered and rejected the preserve-in-output half because after_endfor at this point is whitespace-and-comments-only and appending it would cause apply_chat_template(add_generation_prompt=True) to emit stray trailing whitespace after the generation prefix. Gemini's secondary point that the existing code comment was misleading is valid (the first branch does preserve its after_endfor, but the content is semantic rather than whitespace). We are updating the comment to explain the asymmetry accurately without changing behavior. |
| 1/7 | Contradicted | reviewer.py run 2 once again claimed generation_block emits invalid Jinja {{% ... %}}. Contradicted by 6/7 runs + Sonnet A + Sonnet B (real transformers.apply_chat_template end-to-end) which all show the block parses and renders cleanly. Not acted on; the source string is literally "{%" + dash + " if ...", which concatenates to {% if ..., not {{% if .... |
| 1/7 | Contradicted | reviewer.py run 2 also claimed the comment scrub uses the wrong form ({{# ... #}} vs {# ... #}). Contradicted -- the source is r"\{#.*?#\}" which is the correct single-brace Jinja comment form. Verified by running the existing verify_trailing_comment.py test which explicitly constructs {# ... #} templates and confirms they are repaired. |
Disagreements
- Reviewer 2 has been consistently outvoted across loops on malformed-Jinja claims that are contradicted by every test that actually invokes Jinja2. We trust the Jinja2 parser over the single reviewer's static reading.
- Hosted Gemini wants
after_endforpreserved; the Sonnet subagents and the other reviewers show that preserving it produces semantically wrong output for whitespace-only suffixes. We are updating the explanatory comment rather than changing behavior.
Fix to be committed
- Guard: if
assistant_matchcaptured an empty separator, fall through to the role-concat /<|im_sep|>/\ncascade rather than emitting a bare<|im_start|>assistant. One-line change:if assistant_match is not None and assistant_match.group(2):. - Reword the explanatory comment above
chat_template = ... + generation_blockto accurately describe whyafter_endforis dropped in this branch while being wrapped in the first branch.
Verified against the full 23-case test matrix:
- 10 original template shapes
- 5 Bug A/B/C reproducers
- 5 trailing-comment reproducers
- 3 new empty-separator / bare-assistant-literal reproducers
All pass. Loop 4 complete. Ready to merge back to upstream.
See inline comments for per-finding detail.
| for m in re.finditer( | ||
| r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""", | ||
| scrubbed, | ||
| ) |
There was a problem hiding this comment.
[2/11 reviewers] Low - Empty separator from bare '<|im_start|>assistant' literal. When a template splits the assistant prefix across concatenated string literals ({{ '<|im_start|>assistant' }}{{ '\n' }} or {{ '<|im_start|>assistant' + '\n' + ... }}), the regex captures an empty group, and the injected generation block becomes {{ "<|im_start|>assistant" }} with no trailing newline -- add_generation_prompt=True then renders a bare <|im_start|>assistant token that the model will not interpret correctly. No known real-world tokenizer uses this split-literal pattern, but the defensive guard is one line.
| ) | |
| if assistant_match is not None and assistant_match.group(2): |
| # `after_endfor` contains only whitespace and/or Jinja comments | ||
| # at this point (verified above), so it carries no semantic | ||
| # content for rendering. Drop it and append the generation | ||
| # block directly after `endfor`, matching the behavior of the |
There was a problem hiding this comment.
[1/11 reviewers] Info - Hosted Gemini re-raised the suggestion to append after_endfor to the output here to match the first repair branch. We previously considered this in loop 3 and rejected the output-preserve half because after_endfor is whitespace-and-comments-only at this point, and preserving it would cause apply_chat_template(add_generation_prompt=True) to emit stray trailing whitespace after the generation prefix. Gemini's secondary point that the existing comment was misleading is valid -- the first branch's after_endfor is semantic content (a {{ ... }} expression) rather than whitespace, so the two branches are intentionally asymmetric. Updating the comment to explain this without changing behavior.
| assistant_prefix = "<|im_start|>assistant" + separator | ||
| generation_block = ( | ||
| "{%" + dash + " if add_generation_prompt %}" | ||
| '{{ "' + assistant_prefix.replace('"', '\\"') + '" }}' |
There was a problem hiding this comment.
[1/7, contradicted] Not acted on - reviewer.py run 2 claimed this line emits invalid Jinja {{% ... %}}. This is contradicted by 6/7 reviewer.py runs + Sonnet A + Sonnet B's real transformers.AutoTokenizer.apply_chat_template round-trip, all of which successfully parse and render the generated block. Static read of the source string "{%" + dash + " if add_generation_prompt %}" concatenates to {% if add_generation_prompt %}, not {{% ... %}}. The 6/7 consensus and live Jinja2 parser evidence is authoritative here.
Follow-up to PR unslothai#4426: reviewer.py run 6 and Sonnet C (adversarial regex hunt) independently found that the `assistant_match` regex can capture an empty separator when a template splits the assistant prefix across concatenated string literals, e.g. `{{ '<|im_start|>assistant' }}{{ '\n' }}...`. The injected generation block then becomes `{{ "<|im_start|>assistant" }}` with no trailing newline and the generation prompt is malformed. Fix: only accept a non-empty captured separator; otherwise fall through to the role-concat / `<|im_sep|>` / `\n` cascade so the fallback path produces a usable ChatML generation prefix in every case. One line. Also rewrite the explanatory comment above the generation-block append to accurately describe why `after_endfor` is dropped in this branch while being wrapped in the first branch. The first branch's `after_endfor` is a semantic `{{ ... }}` expression, whereas this branch's `after_endfor` is verified to contain only whitespace and comments that would render as stray trailing output. Verified against the full 23-case test matrix: - 10 original template shapes - 5 Bug A/B/C reproducers - 5 trailing Jinja comment reproducers - 3 new empty-separator / bare-assistant-literal reproducers All pass.
|
Fixes pushed to original PR unslothai#4426. Closing staging copy. |
Staging mirror of unslothai#4426
Original PR: unslothai#4426
Author: kimimgo
This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.
Original description
Fixes unslothai#4150
Problem
ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training may end
with
{% endfor %}and have no{% if add_generation_prompt %}block. This causesfix_chat_template()to raise:Root Cause
_fix_chat_template()only patches templates where content ({{ ... }}) follows{% endfor %}. When the template simply ends after{% endfor %}(empty suffix),the function returns the template unchanged, and the caller raises RuntimeError.
Fix
Add an
elifbranch in_fix_chat_template()that detects ChatML templates(containing
<|im_start|>) with an empty suffix afterendfor, and appendsthe standard ChatML generation prompt block:
This is a general fix (not model-name-based bypass) that handles any ChatML-style
template missing the generation prompt block.