
Remove post-collation truncation from DPO #5350

Merged
albertvillanova merged 6 commits into huggingface:main from albertvillanova:fu-5305 on Mar 24, 2026

@albertvillanova (Member) commented Mar 23, 2026

Remove post-collation truncation from DPO.

This PR removes the internal truncation logic from the DPO trainer and requires custom data collators to handle truncation themselves. This simplifies the trainer code and clarifies the contract for custom collators. The PR also updates the documentation and error handling to reflect these changes.

Follow-up to:

Motivation

Both built-in DPO collators (DataCollatorForPreference and DataCollatorForVisionPreference) already truncate sequences internally before padding. The only reason _truncate_inputs still existed in the trainer was as a silent safety net for custom collators, which is arguably worse than no safety net, because it hid the fact that the collator wasn't doing its job.

This PR makes the contract explicit and removes the silent fix-up.

Changes

Data Collation and Truncation Handling:

  • Removed the _truncate_inputs method and all related calls, shifting responsibility for truncation entirely to the data collator. Now, if a custom data collator is provided, it must handle truncation before padding.
  • Updated the documentation for the data_collator argument to clearly state that custom collators must truncate sequences before padding, as the trainer will not apply truncation after collation.
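
The truncate-before-pad contract described above can be illustrated with a minimal, framework-free sketch. It is not TRL's implementation: `truncate_then_pad`, its parameters, and the use of plain Python lists instead of tensors are assumptions made for illustration, and a real custom collator would also build whatever other fields the trainer expects.

```python
def truncate_then_pad(sequences, max_length, pad_token_id, keep_end=False):
    """Truncate every sequence to max_length, then right-pad to the batch width.

    Hypothetical helper: names and list-based types are illustrative only.
    """
    if max_length <= 0:
        raise ValueError("max_length must be a positive integer")

    # Step 1: truncate each sequence on its own, before any padding exists.
    truncated = [
        seq[-max_length:] if keep_end else seq[:max_length]
        for seq in sequences
    ]

    # Step 2: pad to the longest *truncated* sequence in the batch.
    width = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_token_id] * (width - len(seq)) for seq in truncated]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = truncate_then_pad([[1, 2, 3, 4, 5], [6, 7]], max_length=3, pad_token_id=0)
print(batch["input_ids"])  # [[1, 2, 3], [6, 7, 0]]
```

Because truncation happens first, the padding width is computed from the already-shortened sequences, so no row exceeds `max_length` and nothing needs to be realigned afterwards.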

Code Simplification:

  • Removed the unused flush_right import and related logic, further simplifying the codebase.

Model Input Handling:

  • Updated how model input arguments are constructed in compute_ref_log_probs and _compute_loss, now directly including all relevant keys from the input dictionary without truncation logic.
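
As a rough sketch of what pass-through kwargs assembly can look like, the snippet below copies the required fields and any optional ones that are present, with no length fix-ups. The function name `build_model_kwargs` and the exact key list are assumptions for illustration (only `token_type_ids` and "multimodal image inputs" are named in this PR); the actual trainer code may differ.

```python
# Hypothetical list of optional keys; "pixel_values" stands in for
# multimodal image inputs and is an assumption, not taken from the PR.
OPTIONAL_KEYS = ("token_type_ids", "pixel_values")


def build_model_kwargs(inputs):
    """Assemble model kwargs from collator output without truncating anything."""
    kwargs = {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
    }
    # Forward optional fields exactly as the collator produced them.
    for key in OPTIONAL_KEYS:
        if key in inputs:
            kwargs[key] = inputs[key]
    return kwargs


collated = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "token_type_ids": [[0, 0, 1]],
    "completion_mask": [[0, 0, 1]],  # used for the loss, not a model input
}
print(sorted(build_model_kwargs(collated)))
```

The design point is that the trainer no longer reshapes anything here, so every field must already have consistent lengths when it leaves the collator.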

Note

Medium Risk
Removes a silent safety net that truncated/padded batches inside DPOTrainer, so custom data_collators that relied on that behavior may now produce overlong or misaligned tensors and fail at runtime.
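
One way to surface this risk early is to validate a custom collator's output before training. The check below is a hypothetical sketch, not part of TRL: `check_collator_output` and its list-based batch format are assumptions for illustration.

```python
def check_collator_output(batch, max_length):
    """Raise ValueError if a collated batch is ragged, misaligned, or overlong.

    Hypothetical pre-flight check for custom collators; operates on nested lists.
    """
    rows = batch["input_ids"]
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"input_ids rows have unequal lengths: {sorted(widths)}")

    width = widths.pop() if widths else 0
    if width > max_length:
        raise ValueError(f"batch width {width} exceeds max_length {max_length}")

    # The attention mask, when present, must line up with input_ids row by row.
    mask = batch.get("attention_mask")
    if mask is not None and [len(row) for row in mask] != [len(row) for row in rows]:
        raise ValueError("attention_mask is misaligned with input_ids")


# Passes silently for a well-formed batch.
check_collator_output(
    {"input_ids": [[1, 2, 3], [6, 7, 0]], "attention_mask": [[1, 1, 1], [1, 1, 0]]},
    max_length=3,
)
```

Running such a check on a few batches makes an overlong or misaligned output fail loudly at the collator boundary instead of deep inside the forward pass.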

Overview
Removes post-collation truncation from DPOTrainer. The internal _truncate_inputs path (including keep_end flush/realign logic) is deleted, and both compute_ref_log_probs and loss computation now consume collator outputs as-is.

Updates the trainer/collator contract. The data_collator docstring now explicitly requires custom collators to truncate sequences before padding, and model kwargs assembly is simplified to pass through optional fields (e.g., token_type_ids, multimodal image inputs) directly without length fix-ups.

Written by Cursor Bugbot for commit 2983422.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread trl/trainer/dpo_trainer.py Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread trl/trainer/dpo_trainer.py
Function to use to form a batch from a list of elements of the processed `train_dataset` or `eval_dataset`.
Will default to [`~trainer.dpo_trainer.DataCollatorForPreference`] if the model is a language model and
[`~trainer.dpo_trainer.DataCollatorForVisionPreference`] if the model is a vision-language model.
[`~trainer.dpo_trainer.DataCollatorForVisionPreference`] if the model is a vision-language model. Custom

Member

can you please add the same comment in SFTTrainer.data_collator

@albertvillanova (Member, Author) commented Mar 24, 2026

I am planning to add the same comment to SFT in my subsequent PR, when I remove post-collation truncation from SFT as well.

@qgallouedec (Member)

thanks! I think we might be able to remove flush_right from this repo in a next PR

@qgallouedec (Member)

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2983422516

Comment thread trl/trainer/dpo_trainer.py
@albertvillanova albertvillanova merged commit ec1802e into huggingface:main Mar 24, 2026
12 checks passed
3 participants