Refactor KTO coordinated with DPO [a/N]: Remove encoder-decoder support by albertvillanova · Pull Request #4792 · huggingface/trl

albertvillanova · 2026-01-08T15:24:09Z

Refactor KTO coordinated with DPO [a/N]: Remove encoder-decoder support.

This cleanup significantly simplifies the KTO trainer and makes subsequent refactoring much easier.

Part of:

KTO refactoring #4786 (comment)

Coordinated with DPO refactoring, as discussed with @qgallouedec :

Refactor DPO #3906

Key Changes

KTOConfig
- Removed is_encoder_decoder parameter and documentation
- Removed max_completion_length parameter (because it is specific to encoder-decoder models) and documentation
KTOTrainer

Initialization:
- Added clear error message when user tries to use encoder-decoder model
- Removed self.is_encoder_decoder attribute initialization
- Removed self.max_completion_length attribute setup
- Hardcoded is_encoder_decoder=False in DPODataCollatorWithPadding call
Data Processing:
- Simplified _process_tokens() function - removed entire encoder-decoder branch (~90 lines)
- Kept only causal LM tokenization logic
Model Forward Pass:
- Simplified get_batch_logps(): removed is_encoder_decoder parameter
- Always shift labels/logits by one position (causal LM only)
- Updated all 4 calling sites to remove the parameter
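
As an illustration of the "always shift by one position" point above, here is a minimal, dependency-free sketch of a causal-LM log-probability sum. It is not the code from `get_batch_logps()`: the names `sequence_logp` and `log_softmax` are hypothetical, plain Python lists stand in for tensors, and `-100` is assumed as the label padding id.

```python
import math

LABEL_PAD_TOKEN_ID = -100  # assumption: the commonly used ignore index


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def sequence_logp(logits, labels, label_pad_token_id=LABEL_PAD_TOKEN_ID):
    """Sum of per-token log-probabilities for one sequence (causal LM only).

    logits: list of T rows, each a list of V floats.
    labels: list of T token ids; masked positions hold the pad id.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same sequence length")
    # Causal shift: the logits at position t predict the token at position t + 1,
    # so drop the first label and the last row of logits.
    shifted_labels = labels[1:]
    shifted_logits = logits[:-1]
    total = 0.0
    for row, label in zip(shifted_logits, shifted_labels):
        if label == label_pad_token_id:
            continue  # masked positions do not contribute
        total += log_softmax(row)[label]
    return total
```

With encoder-decoder support gone, there is no branch that skips the shift, which is why the `is_encoder_decoder` parameter can be dropped from every call site.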
Reference Model Computation:
- Simplified compute_reference_log_probs() - removed encoder-decoder branches
- Simplified _compute_kl_logps() - removed encoder-decoder conditional
- Simplified forward() - removed encoder-decoder model_kwargs
- Simplified _compute_loss_liger() - removed encoder-decoder branches for hidden states
Error Handling:
- Users attempting to use encoder-decoder models will now receive a clear error
Tests
- Updated the existing test
- Removed commented-out encoder-decoder tests
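
The encoder-decoder guard mentioned under Initialization and Error Handling could look roughly like the sketch below. The exception type, the message wording, and the helper name `check_causal_lm_only` are assumptions, since the PR description does not quote the actual check. The only thing relied on is that Hugging Face model configs expose an `is_encoder_decoder` flag; `SimpleNamespace` objects stand in for real models.

```python
from types import SimpleNamespace


def check_causal_lm_only(model):
    """Raise if the model's config marks it as encoder-decoder."""
    config = getattr(model, "config", None)
    if getattr(config, "is_encoder_decoder", False):
        raise ValueError(
            "Encoder-decoder models are not supported by this trainer; "
            "use a causal (decoder-only) language model instead."
        )


# Stand-in objects for illustration only.
causal_model = SimpleNamespace(config=SimpleNamespace(is_encoder_decoder=False))
seq2seq_model = SimpleNamespace(config=SimpleNamespace(is_encoder_decoder=True))

check_causal_lm_only(causal_model)  # passes silently
```

Failing early at initialization, rather than deep inside tokenization or the forward pass, is what makes it safe to delete the encoder-decoder branches elsewhere.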

HuggingFaceDocBuilderDev · 2026-01-08T15:29:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-01-08T16:37:19Z

Yes, as @albertvillanova says, I advocated for this change in #3906. It is an important change, so I would be interested to hear the opinions of @lewtun, @edbeeching, and @kashif, and to ensure we are aligned on this decision for both KTO and DPO.

qgallouedec · 2026-01-08T17:06:50Z

trl/experimental/kto/kto_trainer.py

                pad_token_id=processing_class.pad_token_id,
                label_pad_token_id=args.label_pad_token_id,
-                is_encoder_decoder=self.is_encoder_decoder,
+                is_encoder_decoder=False,


you could probably remove this

Done here and in other places.

edbeeching · 2026-01-09T08:09:17Z

Yes, as @albertvillanova says, I advocated for this change in #3906. It is an important change, so I would be interested to hear the opinions of @lewtun, @edbeeching, and @kashif, and to ensure we are aligned on this decision for both KTO and DPO.

Yes, I agree: it is best to streamline the repo and cut features that are (presumably) not widely used.

qgallouedec

a few suggestions, otherwise lgtm!

trl/experimental/kto/kto_trainer.py

tests/experimental/test_kto_trainer.py

albertvillanova added 4 commits January 8, 2026 16:17

Remove is_encoder_decoder from KTOConfig

02ba4c2

Remove max_completion_length from KTOConfig

d4a5e96

Remove encoder-decoder from KTOTrainer

c526524

Fix style

252d527

albertvillanova mentioned this pull request Jan 8, 2026

KTO refactoring #4786

Open

6 tasks

albertvillanova added 3 commits January 8, 2026 16:33

Fix test

8c18b8f

Remove commented encoder-decoder tests

fb91f6e

Merge remote-tracking branch 'upstream/main' into refactor-kto-coordi…

6862b6b

…nated-with-dpo-a

qgallouedec requested review from edbeeching, kashif and lewtun January 8, 2026 16:57

qgallouedec reviewed Jan 8, 2026

albertvillanova added 2 commits January 8, 2026 19:16

Remove unused is_encoder_decoder kwarg

2beec4d

Merge remote-tracking branch 'upstream/main' into refactor-kto-coordi…

22379be

…nated-with-dpo-a

qgallouedec approved these changes Jan 9, 2026

trl/experimental/kto/kto_trainer.py (outdated, resolved)

tests/experimental/test_kto_trainer.py (outdated, resolved)

albertvillanova added 2 commits January 12, 2026 14:11

Merge branch 'main' into refactor-kto-coordinated-with-dpo-a

c73bc76

Remove dead code lines with max_completion_length

3c0c4f2

albertvillanova merged commit 936fd7e into huggingface:main Jan 12, 2026
2 of 3 checks passed

albertvillanova mentioned this pull request Jan 12, 2026

Refactor KTO coordinated with DPO [b/N]: Simplify truncation logic #4808

Merged

albertvillanova merged 11 commits into huggingface:main from albertvillanova:refactor-kto-coordinated-with-dpo-a
