Support max_length in DPO VLM training by albertvillanova · Pull Request #5284 · huggingface/trl

albertvillanova · 2026-03-13T07:16:46Z

Support max_length in DPO VLM training.

Truncate sequence-aligned side-inputs (token_type_ids, mm_token_type_ids) with input_ids in DPO VLM training

This PR addresses a regression affecting vision-language model (VLM) training when using sequence truncation. The main fix ensures that auxiliary token fields (mm_token_type_ids and token_type_ids) are truncated in sync with input_ids, preventing shape mismatches and crashes during the model's forward pass. Additionally, a regression test is added to verify this behavior.

Changes

Bug fix for sequence truncation in VLMs:

- Ensured that token_type_ids and mm_token_type_ids are truncated to match the length of input_ids in both the compute_ref_log_probs and _compute_loss methods of DPOTrainer, preventing shape mismatch errors during training.
- Note that pixel_values, image_grid_thw, image_sizes, and pixel_attention_mask are patch-level or image-level tensors and should not be truncated.
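The idea behind the fix can be sketched in plain Python. This is only an illustration under stated assumptions: `truncate_aligned`, `truncate_batch`, and `SEQUENCE_ALIGNED` are hypothetical names, nested lists stand in for tensors, and the real `_truncate_inputs` in TRL may differ (for example in how it handles padding in "keep_end" mode).

```python
# Hypothetical sketch; not TRL's actual _truncate_inputs implementation.
# Nested lists of shape (batch, seq_len) stand in for tensors.

# Fields assumed to share the sequence dimension with input_ids.
SEQUENCE_ALIGNED = ("input_ids", "attention_mask", "token_type_ids", "mm_token_type_ids")


def truncate_aligned(max_length, truncation_mode, *fields):
    """Apply the same truncation to every sequence-aligned field."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if truncation_mode == "keep_start":
        return tuple([row[:max_length] for row in field] for field in fields)
    if truncation_mode == "keep_end":
        return tuple([row[-max_length:] for row in field] for field in fields)
    raise ValueError(f"unknown truncation_mode: {truncation_mode!r}")


def truncate_batch(batch, max_length, truncation_mode="keep_start"):
    """Truncate sequence-aligned fields; leave patch/image-level fields untouched."""
    keys = [key for key in SEQUENCE_ALIGNED if key in batch]
    truncated = truncate_aligned(max_length, truncation_mode, *(batch[key] for key in keys))
    result = dict(batch)
    result.update(zip(keys, truncated))
    return result


batch = {
    "input_ids": [[101, 32000, 32000, 7, 8, 9]],
    "attention_mask": [[1, 1, 1, 1, 1, 1]],
    "token_type_ids": [[0, 1, 1, 0, 0, 0]],
    "pixel_values": [[0.1, 0.2, 0.3, 0.4]],  # patch-level: must not be truncated
}
truncated = truncate_batch(batch, max_length=4)
```

After the call, `input_ids`, `attention_mask`, and `token_type_ids` all have length 4 per row, while `pixel_values` is passed through unchanged.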

Testing improvements:

- Added a regression test test_train_vlm_with_max_length in tests/test_dpo_trainer.py to verify that truncation with max_length does not crash the model and that image tokens are handled correctly.
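The kind of failure the regression test guards against can be reproduced with a toy stand-in. The sketch below is not the actual test_train_vlm_with_max_length: `toy_forward` and `check_truncation_regression` are hypothetical, and nested lists replace tensors. It only shows that truncating input_ids alone produces a length mismatch, while truncating the side input together with it does not.

```python
# Hypothetical sketch; the real test trains a VLM through DPOTrainer.

def toy_forward(input_ids, token_type_ids):
    """Stand-in for a forward pass that requires matching sequence lengths."""
    for ids_row, type_row in zip(input_ids, token_type_ids):
        if len(ids_row) != len(type_row):
            raise ValueError(f"shape mismatch: {len(ids_row)} vs {len(type_row)}")
    # Return the number of image-typed positions per row.
    return [sum(type_row) for type_row in token_type_ids]


def check_truncation_regression(max_length=4):
    input_ids = [[1, 900, 900, 5, 6, 7]]
    token_type_ids = [[0, 1, 1, 0, 0, 0]]

    # Behavior before the fix: only input_ids is truncated.
    truncated_ids = [row[:max_length] for row in input_ids]
    try:
        toy_forward(truncated_ids, token_type_ids)
        crashed = False
    except ValueError:
        crashed = True

    # Behavior after the fix: the side input is truncated in sync.
    truncated_types = [row[:max_length] for row in token_type_ids]
    image_positions = toy_forward(truncated_ids, truncated_types)
    return crashed, image_positions


crashed_before_fix, image_positions_after_fix = check_truncation_regression()
```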

Follow-up

If this approach is approved, I will implement a similar fix for other trainers.

-        input_ids, attention_mask, completion_mask = self._truncate_inputs(input_ids, attention_mask, completion_mask)
+        # token_type_ids is sequence-length-aligned: truncate to match input_ids
+        # in keep_end mode, token_type_ids participates in flush_right/flush_left
+        extra = (inputs["token_type_ids"],) if "token_type_ids" in inputs else ()


qgallouedec commented Mar 13, 2026

Why not include mm_token_type_ids in extra?

albertvillanova commented Mar 16, 2026

The fields in extra can be truncated with both "keep_start" and "keep_end", and I think it is semantically wrong to use "keep_end" on VLM mm_token_type_ids, but I'm addressing that in a follow-up PR:

Prevent corruption of DPO VLM training if "keep_end" truncation_mode #5286

So, let's treat mm_token_type_ids and token_type_ids symmetrically to be internally consistent, and leave the semantic correction to the other PR.

qgallouedec commented Mar 16, 2026

OK, I think you should align compute_ref_log_probs with _compute_loss, i.e. have mm_token_type_ids in extra in both cases.

albertvillanova commented Mar 17, 2026

Sure, thanks! 😅
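One possible reading of the "keep_end" concern raised in this thread, offered as an assumption rather than as the reasoning behind #5286: if image placeholder tokens sit at the start of the sequence, keeping only the end can drop some of them, while the image-level tensors (which are not truncated) still describe the full image. The toy example below uses a hypothetical `count_image_tokens` helper and made-up values to show the resulting count mismatch.

```python
# Hypothetical illustration; values and helper names are made up.

def count_image_tokens(mm_token_type_ids):
    """Count positions marked as image tokens (type 1) in each row."""
    return [sum(1 for t in row if t == 1) for row in mm_token_type_ids]


# Four image placeholders followed by four text tokens.
mm_token_type_ids = [[1, 1, 1, 1, 0, 0, 0, 0]]
# Assumed number of image features derived from the untruncated pixel_values.
num_image_features = 4
max_length = 6

keep_start = [row[:max_length] for row in mm_token_type_ids]
keep_end = [row[-max_length:] for row in mm_token_type_ids]

start_counts = count_image_tokens(keep_start)  # all placeholders survive
end_counts = count_image_tokens(keep_end)      # leading placeholders are dropped

start_consistent = start_counts == [num_image_features]
end_consistent = end_counts == [num_image_features]
```

In this toy case "keep_start" preserves all four placeholders, whereas "keep_end" keeps only two, so the placeholder count no longer matches the assumed number of image features.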

commit 3972d66 Author: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> Date: Wed Mar 18 22:26:44 2026 +0100 Suggest the `Json()` type for tool calling dataset format (#5307) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 5c6e915 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 14:55:19 2026 -0600 Update `RewardFunc` type annotation to allow `None` values in reward list (#5297)
commit ee96845 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Mar 18 17:03:54 2026 +0100 Fix DPOTrainer collators to truncate sequences before padding (#5305)
commit 435c2ae Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 08:09:42 2026 -0600 Add guidance to avoid `hasattr` and `getattr` with defaults in `AGENTS.md` (#5294)
commit 26ce6a3 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 00:44:12 2026 -0600 Apply docstyle (#5296)
commit 52cd0cc Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:31:26 2026 +0100 Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model (#5295)
commit 7b42fc4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:29:11 2026 +0100 Prevent corruption of DPO VLM training if "keep_end" truncation_mode (#5286)
commit 3acb8e8 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:27:10 2026 +0100 Support max_length in DPO VLM training (#5284)
commit ee339a0 Author: Carlos Miguel Patiño <carlos.patino@huggingface.co> Date: Tue Mar 17 14:01:44 2026 +0100 [GKD] Buffer Implementation for Distillation Trainer (#5137) Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
commit d46131f Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:27:19 2026 +0100 Remove custom get_train/eval_dataloader from OnlineDPO (#5291)
commit 85cf8f4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:24:24 2026 +0100 Remove TrainingArguments import from experimental trainers (#5290)
commit 91e3da0 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Mar 16 07:19:51 2026 -0600 Fix `accuracy_reward` crash when called from non-main thread (#5281)
commit 4996631 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:44:28 2026 +0100 Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string (#5274)
commit 5fceaa7 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:43:34 2026 +0100 Simplify structured outputs logic across vLLM versions in scripts/vllm_serve (#5273)
commit 406d406 Author: casinca <47400729+casinca@users.noreply.github.com> Date: Sat Mar 14 04:12:49 2026 +0100 feat(`grpo_trainer.py`): Variational Sequence-Level Soft Policy Optimization (VESPO) (#5199)
commit d0ac7ef Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Date: Sat Mar 14 02:53:33 2026 +0100 Allow nullable logprobs in vLLM serve responses (#5203) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit c0eabc4 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Fri Mar 13 18:19:15 2026 -0600 Change default `vllm_mode` to `"colocate"` and add v0→v1 migration guide (#5255) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
commit 6c0fccd Author: Mario Šaško <mariosasko777@gmail.com> Date: Sat Mar 14 00:19:38 2026 +0100 35% faster packing + rename `bfd-requeue` to `bfd_split` (#5189) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

albertvillanova added 2 commits March 13, 2026 08:10

Test train VLM with max_length

b53b665

Truncate token_type_ids and mm_token_type_ids

9fc6abf

albertvillanova changed the title ~~Truncate token_type_ids and mm_token_type_ids with input_ids in DPO VLM training~~ Support max_length in DPO VLM training Mar 13, 2026

cursor Bot reviewed Mar 13, 2026

Comment thread trl/trainer/dpo_trainer.py Outdated

albertvillanova added 3 commits March 13, 2026 09:34

Extended _truncate_inputs with *extra

4ff5cb2

Pass token_type_ids as extra into _truncate_inputs

face162

Merge remote-tracking branch 'upstream/main' into fix-5283

6859d76

qgallouedec reviewed Mar 13, 2026

albertvillanova added 5 commits March 16, 2026 08:14

Align mm_token_type_ids with token_type_ids

3eb6711

Merge remote-tracking branch 'upstream/main' into fix-5283

0e4056b

Merge remote-tracking branch 'upstream/main' into fix-5283

aba2c14

Merge remote-tracking branch 'upstream/main' into fix-5283

79d28c4

Align mm_token_type_ids with token_type_ids in _compute_loss

38b4bc6

qgallouedec approved these changes Mar 17, 2026

albertvillanova merged commit 3acb8e8 into huggingface:main Mar 17, 2026
12 checks passed

songhappy pushed a commit to songhappy/trl that referenced this pull request Apr 20, 2026

Support max_length in DPO VLM training (huggingface#5284)

58961c2

albertvillanova merged 10 commits into huggingface:main from albertvillanova:fix-5283
