fix(tokenizer): add add_generation_prompt (fixes #4150)#4298
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical bug in the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request addresses an issue where chat templates with leading or trailing whitespace around the final generation prompt were not being correctly patched. The change to strip the string before checking conditions and then preserving the original whitespace is a good fix. I've added one suggestion to improve the efficiency of how the prefix and suffix whitespace is extracted.
| prefix = after_endfor[: len(after_endfor) - len(after_endfor.lstrip())] | ||
| suffix = after_endfor[len(after_endfor.rstrip()) :] | ||
| inner = after_stripped |
There was a problem hiding this comment.
The current implementation uses lstrip() and rstrip() to extract the prefix and suffix whitespace. This can be made more efficient by using str.find() on the already-stripped string (after_stripped) to determine the boundaries of the content, and then slicing to get the prefix and suffix. This avoids creating extra temporary strings from lstrip() and rstrip().
prefix_len = after_endfor.find(after_stripped)
prefix = after_endfor[:prefix_len]
suffix = after_endfor[prefix_len + len(after_stripped):]
inner = after_stripped|
Hi @mmathew23 @danielhanchen, This PR addresses #4150: ChatML-style templates with leading/trailing whitespace after |
ad5472e to
9988f66
Compare
for more information, see https://pre-commit.ci
fix(tokenizer): add add_generation_prompt to templates with whitespace after endfor
Fixes #4150
What
In
_fix_chat_template, the substring after{% endfor %}or{% endif %}is now normalized by stripping leading/trailing whitespace for the condition check. When we insert the{% if add_generation_prompt %}...{% endif %}wrapper, we preserve the original prefix and suffix whitespace so the patched template string keeps the same formatting.Before: We required
after_endfor.startswith("{{"). Templates like{% endfor %}\n{{ '<|im_start|>assistant\n' }}failed the check and were not patched →RuntimeErrorwhen loading LoRA for chat.After: We use
after_endfor = after_endfor_raw.strip()for the condition and reassemble aschosen_end + prefix_ws + wrapped + suffix_ws, so ChatML-style templates with newlines or spaces after the loop are correctly patched.Why
When loading a LoRA adapter (e.g. Hermes fine-tuned with Llama Factory) from a local path, the saved tokenizer's chat template can have a single
{{ ... }}block after{% endfor %}with leading/trailing whitespace. The previous logic only patched when the content immediately after the loop started with{{, so these templates were left withoutadd_generation_promptandfix_chat_templateraised:Stripping for the check and preserving whitespace when rebuilding the template fixes this without changing behaviour for templates that already have no surrounding whitespace.
Impact
RuntimeErrorwhen the saved tokenizer template has newlines or spaces after{% endfor %}.