fix: Remove chat template setting from non-SFT trainer scripts by behroozazarkhalili · Pull Request #4437 · huggingface/trl

behroozazarkhalili · 2025-11-03T03:39:21Z

Summary

Resolves #4404

This PR removes the SIMPLE_CHAT_TEMPLATE import and chat template setting from all non-SFT trainer scripts. Setting chat templates only makes sense for SFT (supervised fine-tuning/instruction tuning), not for preference optimization or reward-based training methods.

Changes

Removed chat template setting from:

examples/scripts/online_dpo.py - Online DPO (preference optimization)
examples/scripts/orpo.py - ORPO (preference optimization)
examples/scripts/cpo.py - CPO (preference optimization)
examples/scripts/nash_md.py - Nash-MD (multi-objective RL)
examples/scripts/xpo.py - XPO (preference optimization)
examples/scripts/ppo/ppo.py - PPO (reward-based training)
examples/scripts/ppo/ppo_tldr.py - PPO TLDR (reward-based training)
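
A quick way to confirm that none of the scripts listed above still mention the template is to scan them for the identifier. This is a minimal sketch, not part of the PR: the helper name find_leftovers and the repo_root argument are hypothetical, and it assumes the paths above are relative to a local checkout of the repository.

```python
from pathlib import Path

# Script paths as listed in this PR, relative to the repository root.
SCRIPTS = [
    "examples/scripts/online_dpo.py",
    "examples/scripts/orpo.py",
    "examples/scripts/cpo.py",
    "examples/scripts/nash_md.py",
    "examples/scripts/xpo.py",
    "examples/scripts/ppo/ppo.py",
    "examples/scripts/ppo/ppo_tldr.py",
]


def find_leftovers(repo_root, scripts=SCRIPTS, needle="SIMPLE_CHAT_TEMPLATE"):
    """Return (relative_path, line_number, line) for every line containing `needle`.

    Hypothetical helper for illustration; files that do not exist are skipped.
    """
    hits = []
    root = Path(repo_root)
    for rel in scripts:
        path = root / rel
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                hits.append((rel, lineno, line.strip()))
    return hits
```

Calling find_leftovers(".") from the root of a checkout should return an empty list once the change is applied, assuming the scripts contain no other mention of the identifier.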

For each script:

Removed the import line: from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE
Removed the conditional block that sets tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE
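
To illustrate the two removals, here is a minimal sketch of the pattern in question. It rests on assumptions: the PR only says "conditional block", so the "is None" check is a guess at its form; StubTokenizer is a hypothetical stand-in for a real tokenizer; and the template string is a placeholder, not the actual value defined in trl.trainer.utils.

```python
# Placeholder only; the real SIMPLE_CHAT_TEMPLATE is defined in trl.trainer.utils.
SIMPLE_CHAT_TEMPLATE = "{% for message in messages %}{{ message['content'] }}{% endfor %}"


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer with a chat_template attribute."""

    def __init__(self, chat_template=None):
        self.chat_template = chat_template


def apply_fallback_template(tokenizer):
    """Assumed form of the removed block: set a default only when none exists."""
    if tokenizer.chat_template is None:
        tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE
    return tokenizer


# Before the change: a tokenizer without a template silently received the default.
before = apply_fallback_template(StubTokenizer())

# After the change: the script leaves the tokenizer exactly as it was loaded.
after = StubTokenizer()
```

Under these assumptions, a tokenizer that already carries its own template is unaffected either way; only tokenizers without one behave differently after the removal.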

Rationale

Chat templates are used to format conversational data for instruction tuning (SFT). They don't apply to:

Preference optimization (DPO, CPO, ORPO, XPO): These methods optimize based on preference pairs, not conversational format
Reward-based training (PPO, Nash-MD): These use reward signals, not chat formatting

Setting chat templates in these contexts was unnecessary and could cause confusion about the expected data format.

HuggingFaceDocBuilderDev · 2025-11-03T03:42:08Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec

lgtm!

commit 4677cf2
Author: Harras Mansoor <98635627+Harras3@users.noreply.github.com>
Date: Wed Nov 5 04:06:13 2025 +0500
Removed Sentiment Tuning Examples (#4424)

commit 7a9592b
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Nov 4 14:32:04 2025 -0700
🐍 Drop Python 3.9 (#4183)

commit 7f15a7f
Author: Harras Mansoor <98635627+Harras3@users.noreply.github.com>
Date: Wed Nov 5 02:06:31 2025 +0500
Removed outdated warning about batch contamination (#4423)

commit 8b0a3ce
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Nov 4 21:37:39 2025 +0100
Update tokenizer apply_chat_template with return_dict=True default (#4448)

commit d9f9e2b
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date: Tue Nov 4 19:56:58 2025 +0000
Support casting to fp32 when word embeddings are tied to lm_head (#4446)

commit 4e138ab
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Tue Nov 4 15:15:23 2025 +0100
Upload notebook with T4 selected (#4449)

commit 43253b2
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date: Mon Nov 3 21:07:31 2025 +0000
Add On-Policy Distillation from thinking labs to paper index. (#4410)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 6f41b18
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Mon Nov 3 10:57:51 2025 -0800
fix: Remove chat template setting from non-SFT trainer scripts (#4437)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

qgallouedec and others added 3 commits November 3, 2025 15:26

same for test

ee5827c

Merge branch 'main' into fix/remove-chat-template-non-sft

fb925a3

Merge branch 'main' into fix/remove-chat-template-non-sft

2a05f11

qgallouedec approved these changes Nov 3, 2025

Merge branch 'main' into fix/remove-chat-template-non-sft

e5fcb01

behroozazarkhalili merged commit 6f41b18 into main Nov 3, 2025
11 of 12 checks passed

behroozazarkhalili deleted the fix/remove-chat-template-non-sft branch November 3, 2025 18:57

behroozazarkhalili merged 5 commits into main from fix/remove-chat-template-non-sft