Fix SFT loss type rewards being overwritten in dpo_loss() by Mr-Neutr0n · Pull Request #5079 · huggingface/trl

Mr-Neutr0n · 2026-02-11T18:35:55Z

Summary

In dpo_loss(), the "sft" loss branch correctly sets chosen_rewards and rejected_rewards to zeros (since SFT has no preference rewards), but the unconditional reward computation at the end of the method overwrites these with the standard DPO implicit rewards.
This inflates reward metrics when SFT is used as part of a multi-loss configuration (e.g., MPO with ["sigmoid", "bco_pair", "sft"]), because the SFT component contributes DPO-style rewards instead of the intended zeros.
The fix guards the reward computation so that it runs only for non-SFT loss types, preserving the correct zero rewards for SFT.
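
The overwrite and the guard described above can be illustrated with a small, dependency-free sketch. This is not the TRL implementation: `toy_rewards`, `implicit_rewards`, and the `guarded` flag are hypothetical names introduced only for illustration, and plain Python lists stand in for tensors. The only assumption about DPO itself is that the implicit reward is beta times the difference between policy and reference log-probabilities.

```python
def implicit_rewards(policy_logps, ref_logps, beta):
    # DPO-style implicit reward: beta * (log pi(y|x) - log pi_ref(y|x)).
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def toy_rewards(loss_type, chosen_logps, rejected_logps,
                ref_chosen_logps, ref_rejected_logps,
                beta=0.1, guarded=True):
    # Hypothetical stand-in for the reward handling inside dpo_loss().
    chosen_rewards = rejected_rewards = None

    if loss_type == "sft":
        # SFT has no preference signal, so its rewards are zeros.
        chosen_rewards = [0.0] * len(chosen_logps)
        rejected_rewards = [0.0] * len(rejected_logps)

    # guarded=False mimics the bug: the computation below always runs
    # and overwrites the zeros set in the "sft" branch.
    # guarded=True mimics the fix: it is skipped for "sft".
    if not guarded or loss_type != "sft":
        chosen_rewards = implicit_rewards(chosen_logps, ref_chosen_logps, beta)
        rejected_rewards = implicit_rewards(rejected_logps, ref_rejected_logps, beta)

    return chosen_rewards, rejected_rewards


args = ([-1.0], [-2.0], [-1.5], [-1.5])
fixed_sft = toy_rewards("sft", *args, guarded=True)
buggy_sft = toy_rewards("sft", *args, guarded=False)
sigmoid = toy_rewards("sigmoid", *args, guarded=True)
```

With the guard, `fixed_sft` stays at zeros, while `buggy_sft` ends up identical to the rewards of the `"sigmoid"` call, which is the overwrite this PR removes.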

Test plan

Existing test_train_with_multiple_loss_types test continues to pass
Run MPO training with loss_type=["sigmoid", "sft"] and verify reward metrics are no longer inflated by the SFT component
Verify single loss type training (e.g., loss_type="sigmoid") is unaffected

In dpo_loss(), the SFT loss branch correctly sets chosen_rewards and rejected_rewards to zeros since SFT has no preference rewards. However, the unconditional reward computation at the end of the method overwrites these zeros with standard DPO implicit rewards. This causes incorrect reward metrics when using SFT as part of a multi-loss configuration (e.g., MPO with ["sigmoid", "bco_pair", "sft"]). The accumulated rewards are inflated because the SFT component contributes DPO-style rewards instead of zeros, leading to misleading reward_accuracies and reward margin metrics. The fix guards the reward computation so it only applies to non-SFT loss types, preserving the intended zero rewards for SFT.
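
To make the effect on the accumulated metrics concrete, here is a minimal sketch of how a reward margin could be inflated. It assumes per-loss rewards are summed with per-loss weights; the weighting scheme, the function `accumulate_margin`, and the numeric values are illustrative assumptions, not taken from the PR.

```python
def accumulate_margin(per_loss_rewards, weights):
    # Sum weighted (chosen, rejected) rewards over all loss types,
    # then return the reward margin: chosen minus rejected.
    chosen_total = 0.0
    rejected_total = 0.0
    for (chosen, rejected), weight in zip(per_loss_rewards, weights):
        chosen_total += weight * chosen
        rejected_total += weight * rejected
    return chosen_total - rejected_total


dpo_style = (0.5, -0.5)   # illustrative implicit rewards for one example
sft_zeros = (0.0, 0.0)    # what the "sft" branch is meant to contribute

# loss_type=["sigmoid", "sft"] with equal weights (hypothetical values).
fixed_margin = accumulate_margin([dpo_style, sft_zeros], [1.0, 1.0])
buggy_margin = accumulate_margin([dpo_style, dpo_style], [1.0, 1.0])
```

Under these assumptions the margin doubles when the SFT component wrongly reports DPO-style rewards (`buggy_margin` versus `fixed_margin`), even though the model's behavior is unchanged.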

qgallouedec

Thanks, this makes sense. Note that DPO is being refactored in #3906; I'm checking that we're not making the same mistake there.

HuggingFaceDocBuilderDev · 2026-02-16T17:18:36Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec self-assigned this Feb 11, 2026

qgallouedec approved these changes Feb 16, 2026

qgallouedec merged commit 8c232f6 into huggingface:main Feb 16, 2026
10 checks passed

qgallouedec merged 1 commit into huggingface:main from Mr-Neutr0n:fix/dpo-sft-reward-overwrite