2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -7,6 +7,8 @@
    title: Quickstart
  title: Getting started
- sections:
  - local: chat_templates
    title: Chat Templates
  - local: dataset_formats
    title: Dataset Formats
  - local: paper_index
6 changes: 6 additions & 0 deletions docs/source/chat_template_utils.md
@@ -1,5 +1,11 @@
# Chat template utilities

For an overview of the chat templates bundled with TRL and the rationale behind the training patches, see [Chat Templates](chat_templates).

## add_response_schema

[[autodoc]] add_response_schema

## clone_chat_template

[[autodoc]] clone_chat_template
111 changes: 111 additions & 0 deletions docs/source/chat_templates.md

Nice!

I'd suggest a few modifications, flagging the rationale:

Reading the original intro cold, I felt it opened mid-thought: it jumped straight into "they serve two purposes: identity comparison and training patches", which is framed from the implementer's side (i.e., why these files exist internally) rather than the reader's (i.e., the user's) side: "why do I, a TRL user, care?". I reworked it along this arc:

  • Define what a chat template is (+ link to transformers docs, tiny apply_chat_template example).
  • Reassure: most users never touch them; TRL handles it transparently.
  • Motivate the page with the two user-facing scenarios (SFT with `assistant_only_loss=True`, GRPO tool calls).
  • Say what to do: TRL auto-patches supported families; for others, you patch yourself.

Collapsed "Original templates" into "Supported model families". The per-file stubs ("Original Qwen3 chat template.", ...) weren't user-facing: those originals exist for TRL's internal identity comparison, and a user would never use them directly, so I collapsed them into a one-line family list.

@@ -0,0 +1,111 @@
# Chat Templates

TRL ships a small collection of Jinja2 chat templates under [`trl/chat_templates/`](https://github.com/huggingface/trl/tree/main/trl/chat_templates). They serve two purposes:

1. **Identity comparison**: detecting which model is being used (by comparing `processing_class.chat_template` against known templates) to add the appropriate response schema ([`add_response_schema`]) or swap in a training template ([`get_training_chat_template`]).

@qgallouedec, Apr 17, 2026

- ([`add_response_schema`])
+ (`add_response_schema`)

now that it's not in the doc anymore

2. **Training patches**: modified templates that fix training-specific issues (prefix-preservation for GRPO, `{% generation %}` markers for SFT assistant-only loss).
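
As a rough illustration of the identity-comparison idea, the sketch below assumes the match is an exact string comparison against the bundled templates. `KNOWN_TEMPLATES`, its shortened placeholder template strings, and `identify_family` are hypothetical names used only for this example; they are not TRL's API.

```python
from typing import Optional

# Hypothetical, heavily shortened stand-ins for the bundled template files.
KNOWN_TEMPLATES = {
    "qwen3": "{%- for m in messages %}<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n{%- endfor %}",
    "llama3": "{%- for m in messages %}<|start_header_id|>{{ m.role }}<|end_header_id|>\n\n{{ m.content }}<|eot_id|>{%- endfor %}",
}


def identify_family(chat_template: str) -> Optional[str]:
    """Return the family whose known template matches exactly, else None."""
    for family, known in KNOWN_TEMPLATES.items():
        if chat_template == known:
            return family
    return None


matched = identify_family(KNOWN_TEMPLATES["qwen3"])
# Any modification, even trailing whitespace, defeats an exact match.
unmatched = identify_family(KNOWN_TEMPLATES["qwen3"] + " ")
```

Under this assumption, a customized template would not be recognized, which is the case where you would patch the template yourself.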

**Why prefix-preserving?** The GRPO tool call loop extracts tool response formatting tokens by comparing tokenizations with and without tool messages appended (`_get_tool_suffix_ids`). This requires the chat template to be *prefix-preserving*: appending messages must not change how earlier messages are rendered.
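
The property can be illustrated with two toy renderers written in plain Python. This is a sketch only: `render_original`, `render_patched`, and `is_prefix_preserving` are hypothetical helpers that mimic the behavior described here, not the real Jinja templates or TRL's own check.

```python
def render_original(messages):
    """Toy renderer: keeps the thinking block only on the final assistant turn."""
    parts = []
    for i, message in enumerate(messages):
        body = message["content"]
        if message["role"] == "assistant" and i == len(messages) - 1:
            body = "<think>\n" + message.get("reasoning_content", "") + "\n</think>\n\n" + body
        parts.append("<|im_start|>" + message["role"] + "\n" + body + "<|im_end|>\n")
    return "".join(parts)


def render_patched(messages):
    """Toy renderer: always keeps the thinking block on assistant turns."""
    parts = []
    for message in messages:
        body = message["content"]
        if message["role"] == "assistant":
            body = "<think>\n" + message.get("reasoning_content", "") + "\n</think>\n\n" + body
        parts.append("<|im_start|>" + message["role"] + "\n" + body + "<|im_end|>\n")
    return "".join(parts)


def is_prefix_preserving(render, messages, extra):
    """True if appending `extra` leaves the rendering of `messages` unchanged."""
    return render(messages + [extra]).startswith(render(messages))


conversation = [
    {"role": "user", "content": "Weather in Paris?"},
    {"role": "assistant", "content": "Let me check.", "reasoning_content": "Need the tool."},
]
tool_message = {"role": "tool", "content": "18C, sunny"}

original_ok = is_prefix_preserving(render_original, conversation, tool_message)
patched_ok = is_prefix_preserving(render_patched, conversation, tool_message)
```

With the toy original renderer, appending the tool message removes the thinking block from the earlier assistant turn, so the check fails; the toy patched renderer passes.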

**Why generation-tagged?** SFT with `assistant_only_loss=True` requires the chat template to include `{% generation %}` / `{% endgeneration %}` markers around assistant output, so `return_assistant_tokens_mask=True` can produce correct masks. Most model templates don't include these markers natively.
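
The toy sketch below shows what the markers make possible: once the assistant spans in the rendered text are known, a per-token mask follows directly. `render_with_spans` and `assistant_mask` are hypothetical helpers, the role tags are invented, and whitespace splitting stands in for a real tokenizer; none of this is the transformers implementation.

```python
def render_with_spans(messages):
    """Render messages and record the character span of each assistant body.

    The recorded spans play the role that the generation markers play in a real template.
    """
    text, spans = "", []
    for message in messages:
        text += "<|" + message["role"] + "|> "
        start = len(text)
        text += message["content"]
        if message["role"] == "assistant":
            spans.append((start, len(text)))
        text += " "
    return text, spans


def assistant_mask(text, spans):
    """Whitespace-tokenize `text` and mark tokens lying fully inside an assistant span."""
    tokens, mask, position = [], [], 0
    for token in text.split():
        start = text.index(token, position)
        end = start + len(token)
        position = end
        tokens.append(token)
        mask.append(int(any(s <= start and end <= e for s, e in spans)))
    return tokens, mask


messages = [
    {"role": "user", "content": "Hi there"},
    {"role": "assistant", "content": "Hello friend"},
]
text, spans = render_with_spans(messages)
tokens, mask = assistant_mask(text, spans)
```

Only the tokens of the assistant reply are marked, which is the set of positions an assistant-only loss would train on.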

## Original templates

Used for identity comparison only.

### `deepseekv3.jinja`

Original DeepSeek-V3 chat template.

### `glm4moe.jinja`

Original GLM-4-MoE chat template.

### `gptoss.jinja`

Original GPT-OSS chat template.

### `llama3.jinja`

Original Llama 3 chat template.

### `llama3_1.jinja` / `llama3_2.jinja`

Original Llama 3.1 / 3.2 chat templates. Both render tool calls as a single bare JSON object using the key `parameters` (instead of `arguments`) and support at most one tool call per assistant turn.
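
A sketch of the resulting shape, assuming the common `{"function": {"name": ..., "arguments": ...}}` input layout for tool calls. The helper `to_llama31_tool_call` is hypothetical, and the `name` key and exact whitespace of the output are assumptions; only the `parameters` key and the one-call limit come from the description above.

```python
import json


def to_llama31_tool_call(tool_calls):
    """Serialize a single tool call as one bare JSON object keyed by `parameters`."""
    if len(tool_calls) != 1:
        raise ValueError("this format supports exactly one tool call per assistant turn")
    function = tool_calls[0]["function"]
    # `arguments` from the input is emitted under the key `parameters`.
    return json.dumps({"name": function["name"], "parameters": function["arguments"]})


call = {"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}
rendered = to_llama31_tool_call([call])

try:
    to_llama31_tool_call([call, call])
    rejected_two_calls = False
except ValueError:
    rejected_two_calls = True
```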

### `qwen2_5.jinja`

Original Qwen2.5 chat template.

### `qwen3.jinja`

Original Qwen3 chat template.

### `qwen3_vl.jinja`

Original Qwen3-VL chat template. Unlike text-only Qwen3, this template is already prefix-preserving (no conditional thinking blocks), so no training patch is needed.

### `qwen3_5_2b_and_below.jinja` / `qwen3_5_4b_and_above.jinja`

Original Qwen3.5 chat templates.

## Training templates

Patched templates that fix training-specific issues. They are swapped in at initialization when tools are enabled (GRPO) or when `assistant_only_loss=True` (SFT).

### `deepseekv3_training.jinja`

Patched DeepSeek-V3 template. Diff vs `deepseekv3.jinja`:

- Uses `| tojson` on `tool['function']['arguments']` so that `arguments` can be passed as a `dict` (the documented format per [transformers docs](https://huggingface.co/docs/transformers/en/chat_extras#tool-calling-example)). The original template uses raw string concatenation, which crashes on dict inputs.
- Wraps assistant message output with `{% generation %}` / `{% endgeneration %}` markers for SFT assistant-only loss.
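
The first point can be reproduced in plain Python, where concatenating a string with a dict fails in the same way. In this sketch `json.dumps` stands in for the `| tojson` filter, and the `"call: "` prefix is a placeholder rather than the template's real tool-call tokens.

```python
import json

arguments = {"city": "Paris"}  # dict form of the tool call arguments

# Raw string concatenation: raises TypeError because `arguments` is not a str.
try:
    rendered_raw = "call: " + arguments
except TypeError:
    rendered_raw = None

# Serializing first works for dict inputs.
rendered_json = "call: " + json.dumps(arguments)
```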

### `qwen3_training.jinja`

Patched Qwen3 template. Diff vs `qwen3.jinja`:

Require both `<think>` and `</think>` to be present before parsing, to avoid incorrect splitting when the model generates only one tag:

```diff
- {%- if '</think>' in content %}
+ {%- if '<think>' in content and '</think>' in content %}
```
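
The effect of the stricter condition can be sketched in Python. `split_thinking` is a hypothetical helper that only mirrors the guard in the diff above; the splitting logic around it is an assumption about how the template separates reasoning from the answer.

```python
def split_thinking(content):
    """Split `content` into (reasoning, answer), parsing only when both tags are present."""
    if "<think>" in content and "</think>" in content:
        reasoning = content.split("</think>")[0].split("<think>")[-1].strip("\n")
        answer = content.split("</think>")[-1].lstrip("\n")
        return reasoning, answer
    # Only one tag (or none): leave the content untouched.
    return "", content


both_tags = split_thinking("<think>\nplan\n</think>\n\nanswer")
closing_only = split_thinking("plan</think>answer")
opening_only = split_thinking("<think>plan")
```

With only one tag present, the content is passed through unchanged instead of being split at the stray tag.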

Always include the thinking block regardless of message position. The original conditionally omits it based on `loop.last`, which changes the assistant rendering when a tool message is appended, breaking prefix-preservation:

```diff
- {%- if loop.index0 > ns.last_query_index %}
- {%- if loop.last or (not loop.last and reasoning_content) %}
- {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
- {%- else %}
- {{- '<|im_start|>' + message.role + '\n' + content }}
- {%- endif %}
- {%- else %}
- {{- '<|im_start|>' + message.role + '\n' + content }}
- {%- endif %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
```

Wrap assistant message output with `&#123;% generation %&#125;` / `&#123;% endgeneration %&#125;` so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss.

### `gptoss_training.jinja`

Patched GPT-OSS template. Diff vs `gptoss.jinja`:

Wrap assistant message output with `&#123;% generation %&#125;` / `&#123;% endgeneration %&#125;` so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss.

### `llama3_training.jinja`

Patched Llama 3 template. Diff vs `llama3.jinja`:

Wrap assistant message output with `&#123;% generation %&#125;` / `&#123;% endgeneration %&#125;` so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss.

### `qwen2_5_training.jinja`

Patched Qwen2.5 template. Diff vs `qwen2_5.jinja`:

Wrap assistant message output with `&#123;% generation %&#125;` / `&#123;% endgeneration %&#125;` so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss.

## Related utilities

See [Chat Template Utilities](chat_template_utils) for the helper functions (`clone_chat_template`, `is_chat_template_prefix_preserving`, `get_training_chat_template`) that operate on these templates.
3 changes: 3 additions & 0 deletions docs/source/grpo_trainer.md
@@ -632,6 +632,9 @@ trainer = GRPOTrainer(
Each tool must be a standard Python function with **type-hinted arguments and return types**, along with a **Google-style docstring** describing its purpose, arguments, and return value.
For more details, see the [Passing tools guide](https://huggingface.co/docs/transformers/en/chat_extras#passing-tools).

> [!TIP]
> The GRPO tool call loop requires the chat template to be *prefix-preserving* (appending messages must not change how earlier messages are rendered). For known model families (e.g. Qwen3, DeepSeek-V3), TRL automatically swaps in a patched training template when tools are enabled. See [Chat Templates](chat_templates#training-templates) for the full list.

Example:

```python
# ... (rest of the example lies outside this diff hunk)
```
2 changes: 1 addition & 1 deletion docs/source/sft_trainer.md
@@ -169,7 +169,7 @@ training_args = SFTConfig(assistant_only_loss=True)
![train_on_assistant](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/train_on_assistant.png)

> [!WARNING]
> This functionality requires the chat template to include `&#123;% generation %&#125;` and `&#123;% endgeneration %&#125;` keywords. For known model families (e.g. Qwen3), TRL automatically patches the template when `assistant_only_loss=True`. For other models, check that your chat template includes these keywords — see [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B/blob/main/chat_template.jinja#L76-L82) for an example.
> This functionality requires the chat template to include `&#123;% generation %&#125;` and `&#123;% endgeneration %&#125;` keywords. For known model families (e.g. Qwen3), TRL automatically patches the template when `assistant_only_loss=True`. See [Chat Templates](chat_templates#training-templates) for the full list of bundled training templates. For other models, check that your chat template includes these keywords. See [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B/blob/main/chat_template.jinja#L76-L82) for an example.

### Train on completion only
