[docs] Add chat templates page to web docs #5581
Member
Nice! I'd suggest a few modifications, flagging the rationale:
Reading the original intro cold, I felt it opened mid-thought: it jumped straight into "they serve two purposes: identity comparison and training patches", which is framed from the implementer's side (i.e., why these files exist internally) rather than the reader's (user's) side: "why do I, a TRL user, care?". I reworked it along this arc:
Collapsed "Original templates" into "Supported model families". The per-file stubs ("Original Qwen3 chat template.", ...) weren't user-facing: those originals exist for TRL's internal identity comparison, and a user would never use them directly, so I collapsed them to a one-line family list.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # Chat Templates | ||
| TRL ships a small collection of Jinja2 chat templates under [`trl/chat_templates/`](https://github.com/huggingface/trl/tree/main/trl/chat_templates). They serve two purposes: | ||
| 1. **Identity comparison**: detecting which model is being used (by comparing `processing_class.chat_template` against known templates) to add the appropriate response schema ([`add_response_schema`]) or swap in a training template ([`get_training_chat_template`]). | ||
Member
- ([`add_response_schema`])
+ (`add_response_schema`)
now that it's not in the doc anymore.
| 2. **Training patches**: modified templates that fix training-specific issues (prefix-preservation for GRPO, `{% generation %}` markers for SFT assistant-only loss). | ||
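The identity-comparison idea in the list above can be sketched as an exact string match against a registry of known templates, followed by a lookup of the patched variant. This is only an illustration: the registry contents, the toy template strings, and the names `identify_family` and `pick_training_template` are hypothetical, and TRL's real helpers load the shipped `.jinja` files and may behave differently (for example, when no match is found).

```python
# Toy stand-ins for the contents of the shipped .jinja files (hypothetical strings).
KNOWN_TEMPLATES = {
    "qwen3": "{% for m in messages %}<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n{% endfor %}",
    "llama3": "{% for m in messages %}<|start_header_id|>{{ m.role }}<|end_header_id|>\n\n{{ m.content }}<|eot_id|>{% endfor %}",
}

# Hypothetical patched variants, keyed by the same family name.
TRAINING_TEMPLATES = {
    "qwen3": "{# patched for training #}" + KNOWN_TEMPLATES["qwen3"],
}


def identify_family(chat_template):
    """Return the family whose original template equals chat_template, else None."""
    for family, original in KNOWN_TEMPLATES.items():
        if chat_template == original:  # exact identity, no fuzzy matching
            return family
    return None


def pick_training_template(chat_template):
    """Return a patched template for a recognized family, else None (a choice made for this sketch)."""
    family = identify_family(chat_template)
    return TRAINING_TEMPLATES.get(family)
```

Exact comparison means a user-modified template is simply not recognized, so the sketch never patches a template it has not seen before.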
| **Why prefix-preserving?** The GRPO tool call loop extracts tool response formatting tokens by comparing tokenizations with and without tool messages appended (`_get_tool_suffix_ids`). This requires the chat template to be *prefix-preserving*: appending messages must not change how earlier messages are rendered. | ||
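To make the prefix-preservation property concrete, here is a minimal sketch with two toy renderers written in plain Python instead of Jinja. It works on strings, not token IDs, and every name in it (`render_conditional`, `render_always`, `check_prefix_preserving`, `tool_suffix`) is hypothetical; it is not how `_get_tool_suffix_ids` is implemented.

```python
def _turn(role, body):
    """Format one chat turn in a ChatML-like layout."""
    return f"<|im_start|>{role}\n{body}<|im_end|>\n"


def render_conditional(messages):
    """Toy renderer: shows the <think> block only when the assistant turn is last."""
    parts = []
    for i, m in enumerate(messages):
        body = m["content"]
        if m["role"] == "assistant" and i == len(messages) - 1:
            body = f"<think>\n{m.get('reasoning', '')}\n</think>\n\n{body}"
        parts.append(_turn(m["role"], body))
    return "".join(parts)


def render_always(messages):
    """Toy renderer: shows the <think> block on every assistant turn."""
    parts = []
    for m in messages:
        body = m["content"]
        if m["role"] == "assistant":
            body = f"<think>\n{m.get('reasoning', '')}\n</think>\n\n{body}"
        parts.append(_turn(m["role"], body))
    return "".join(parts)


def check_prefix_preserving(render, messages, extra):
    """True if appending `extra` leaves the rendering of `messages` unchanged."""
    return render(messages + [extra]).startswith(render(messages))


def tool_suffix(render, messages, extra):
    """Return the text contributed by `extra`; only meaningful for prefix-preserving renderers."""
    before = render(messages)
    after = render(messages + [extra])
    if not after.startswith(before):
        raise ValueError("renderer is not prefix-preserving")
    return after[len(before):]


conversation = [
    {"role": "user", "content": "What is 6 * 7?"},
    {"role": "assistant", "reasoning": "Use the calculator.", "content": "Calling the tool."},
]
tool_message = {"role": "tool", "content": "42"}
```

With `render_conditional`, appending the tool message drops the thinking block from the earlier assistant turn, so the check fails. With `render_always`, the earlier text is unchanged and the suffix is exactly the tool turn.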
| **Why generation-tagged?** SFT with `assistant_only_loss=True` requires the chat template to include `{% generation %}` / `{% endgeneration %}` markers around assistant output, so `return_assistant_tokens_mask=True` can produce correct masks. Most model templates don't include these markers natively. | ||
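The masking idea can be illustrated with a toy, character-level analogue. In the real pipeline the markers are Jinja tags handled at render time and the mask is over tokens; in this sketch they are treated as literal sentinel strings, and `assistant_char_mask` is a hypothetical helper, not part of TRL or transformers.

```python
START = "{% generation %}"
END = "{% endgeneration %}"


def assistant_char_mask(marked_text):
    """Strip the markers and return (plain_text, mask); mask[i] is 1 inside a marked span."""
    plain, mask = [], []
    inside = False
    i = 0
    while i < len(marked_text):
        if marked_text.startswith(START, i):
            inside = True
            i += len(START)
        elif marked_text.startswith(END, i):
            inside = False
            i += len(END)
        else:
            plain.append(marked_text[i])
            mask.append(1 if inside else 0)
            i += 1
    return "".join(plain), mask


marked = "<user>hi</user>" + START + "hello" + END + "<user>bye</user>"
text, mask = assistant_char_mask(marked)
# Characters selected by the mask are the ones that would contribute to the loss.
selected = "".join(ch for ch, keep in zip(text, mask) if keep)
```

Without the markers there is nothing to delimit the assistant span, so the mask would be all zeros; that is the situation the training templates are meant to avoid.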
| ## Original templates | ||
| Used for identity comparison only. | ||
| ### `deepseekv3.jinja` | ||
| Original DeepSeek-V3 chat template. | ||
| ### `glm4moe.jinja` | ||
| Original GLM-4-MoE chat template. | ||
| ### `gptoss.jinja` | ||
| Original GPT-OSS chat template. | ||
| ### `llama3.jinja` | ||
| Original Llama 3 chat template. | ||
| ### `llama3_1.jinja` / `llama3_2.jinja` | ||
| Original Llama 3.1 / 3.2 chat templates. Both render tool calls as a single bare JSON object using the key `parameters` (instead of `arguments`) and support at most one tool call per assistant turn. | ||
| ### `qwen2_5.jinja` | ||
| Original Qwen2.5 chat template. | ||
| ### `qwen3.jinja` | ||
| Original Qwen3 chat template. | ||
| ### `qwen3_vl.jinja` | ||
| Original Qwen3-VL chat template. Unlike text-only Qwen3, this template is already prefix-preserving (no conditional thinking blocks), so no training patch is needed. | ||
| ### `qwen3_5_2b_and_below.jinja` / `qwen3_5_4b_and_above.jinja` | ||
| Original Qwen3.5 chat templates. | ||
sergiopaniego marked this conversation as resolved.
Outdated
| ## Training templates | ||
| Patched templates that fix training-specific issues. Swapped in at init when tools are enabled (GRPO) or when `assistant_only_loss=True` (SFT). | ||
| ### `deepseekv3_training.jinja` | ||
| Patched DeepSeek-V3 template. Diff vs `deepseekv3.jinja`: | ||
| - Uses `| tojson` on `tool['function']['arguments']` so that `arguments` can be passed as a `dict` (the documented format per [transformers docs](https://huggingface.co/docs/transformers/en/chat_extras#tool-calling-example)). The original template uses raw string concatenation, which crashes on dict inputs. | ||
| - Wraps assistant message output with `{% generation %}` / `{% endgeneration %}` markers for SFT assistant-only loss. | ||
| ### `qwen3_training.jinja` | ||
| Patched Qwen3 template. Diff vs `qwen3.jinja`: | ||
| Require both `<think>` and `</think>` to be present before parsing, to avoid incorrect splitting when the model generates only one tag: | ||
| ```diff | ||
| - {%- if '</think>' in content %} | ||
| + {%- if '<think>' in content and '</think>' in content %} | ||
| ``` | ||
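A Python analogue of the condition change shows why requiring both tags matters. Only the `if` condition is taken from the diff above; the split expressions and the `split_reasoning` helper are assumptions made for this sketch about how the two parts are extracted.

```python
def split_reasoning(content, require_both):
    """Return (reasoning, answer), splitting on the think tags.

    require_both=False mimics the original condition ('</think>' in content);
    require_both=True mimics the patched one (both tags present).
    """
    has_open = "<think>" in content
    has_close = "</think>" in content
    should_split = (has_open and has_close) if require_both else has_close
    if not should_split:
        return "", content
    # Assumed extraction logic: text between the tags is reasoning, the rest is the answer.
    reasoning = content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    answer = content.split("</think>")[-1].lstrip("\n")
    return reasoning, answer


well_formed = "<think>\nplan\n</think>\n\nanswer"
stray_close = "use </think> to close the block"
```

For well-formed output both conditions give the same result. With a lone closing tag, the original condition treats everything before it as reasoning, while the patched condition leaves the content untouched.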
| Always include the thinking block regardless of message position. The original conditionally omits it based on message position (`loop.last` and `ns.last_query_index`), which changes the assistant rendering when a tool message is appended, breaking prefix-preservation: | ||
| ```diff | ||
| - {%- if loop.index0 > ns.last_query_index %} | ||
| - {%- if loop.last or (not loop.last and reasoning_content) %} | ||
| - {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }} | ||
| - {%- else %} | ||
| - {{- '<|im_start|>' + message.role + '\n' + content }} | ||
| - {%- endif %} | ||
| - {%- else %} | ||
| - {{- '<|im_start|>' + message.role + '\n' + content }} | ||
| - {%- endif %} | ||
| + {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }} | ||
| ``` | ||
| Wrap assistant message output with `{% generation %}` / `{% endgeneration %}` so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss. | ||
| ### `gptoss_training.jinja` | ||
| Patched GPT-OSS template. Diff vs `gptoss.jinja`: | ||
| Wrap assistant message output with `{% generation %}` / `{% endgeneration %}` so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss. | ||
| ### `llama3_training.jinja` | ||
| Patched Llama 3 template. Diff vs `llama3.jinja`: | ||
| Wrap assistant message output with `{% generation %}` / `{% endgeneration %}` so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss. | ||
| ### `qwen2_5_training.jinja` | ||
| Patched Qwen2.5 template. Diff vs `qwen2_5.jinja`: | ||
| Wrap assistant message output with `{% generation %}` / `{% endgeneration %}` so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss. | ||
| ## Related utilities | ||
| See [Chat Template Utilities](chat_template_utils) for the helper functions (`clone_chat_template`, `is_chat_template_prefix_preserving`, `get_training_chat_template`) that operate on these templates. | ||
sergiopaniego marked this conversation as resolved.
Outdated