Refactor KTO [1/N]: Modernize model initialization by albertvillanova · Pull Request #4783 · huggingface/trl

albertvillanova · 2026-01-07T17:34:41Z

Refactor KTO [1/N]: Modernize model initialization.

This PR modernizes KTOTrainer's model initialization to align with SFTTrainer's clean and maintainable patterns. It replaces manual model loading with the create_model_from_path() helper function.

Part of:

KTO refactoring #4786

Problem

Before (KTO):

Manual handling of model_init_kwargs and ref_model_init_kwargs (43 lines)
Manual dtype conversion with getattr(torch, dtype)
Manual device_map setting
Duplicate code for model and ref_model
Direct calls to AutoModelForCausalLM.from_pretrained
Hard errors instead of warnings for already-instantiated models

After (Aligned with SFT):

Clean kwargs handling with or {} pattern
Automatic dtype conversion via helper
DeepSpeed/MULTI_GPU device_map handling
Single call to create_model_from_path helper
User-friendly warnings
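As a rough illustration of the "after" pattern listed above, the kwargs handling can be sketched in isolation. This is a minimal sketch, not the code in the PR: `prepare_init_kwargs` and `DISTRIBUTED_TYPES_WITHOUT_DEVICE_MAP` are hypothetical names introduced here, and the distributed type is passed as a plain string rather than read from `args.distributed_state`.

```python
from typing import Any, Optional

# Hypothetical constant: the two distributed types named in the PR description.
DISTRIBUTED_TYPES_WITHOUT_DEVICE_MAP = {"MULTI_GPU", "DEEPSPEED"}


def prepare_init_kwargs(
    init_kwargs: Optional[dict[str, Any]], distributed_type: str
) -> dict[str, Any]:
    """Normalize optional init kwargs before handing them to a model loader."""
    # `or {}` maps None to an empty dict; dict(...) makes a shallow copy so the
    # caller's dict is left untouched (an assumption of this sketch only).
    kwargs = dict(init_kwargs or {})
    # Distributed training requires device_map=None.
    if distributed_type in DISTRIBUTED_TYPES_WITHOUT_DEVICE_MAP:
        kwargs["device_map"] = None
    return kwargs


user_kwargs = {"attn_implementation": "sdpa"}
single_device = prepare_init_kwargs(None, "NO")
deepspeed = prepare_init_kwargs(user_kwargs, "DEEPSPEED")
```

Copying the dict is a choice made for this sketch so that the config object is not modified as a side effect; the resulting kwargs would then be forwarded to the loader in a single call.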

HuggingFaceDocBuilderDev · 2026-01-07T17:37:36Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-01-07T18:23:16Z

Nice! If it's easier for you, I think it's fine to have a big refactoring PR like in #3906

qgallouedec · 2026-01-07T18:30:26Z

trl/experimental/kto/kto_trainer.py

+        # Reference model initialization
        if isinstance(ref_model, str):
-            ref_model = AutoModelForCausalLM.from_pretrained(ref_model, **ref_model_init_kwargs)
+            ref_model_init_kwargs = args.ref_model_init_kwargs or {}
+            # Distributed training requires device_map=None
+            if args.distributed_state.distributed_type in ["MULTI_GPU", "DEEPSPEED"]:
+                ref_model_init_kwargs["device_map"] = None
+            ref_model = create_model_from_path(ref_model, **ref_model_init_kwargs)
+        else:
+            if ref_model is not None and args.ref_model_init_kwargs is not None:
+                logger.warning(
+                    "You passed `ref_model_init_kwargs` to the KTOConfig, but your ref_model is already instantiated. "
+                    "The `ref_model_init_kwargs` will be ignored."
+                )


In the refactored GRPO/RLOO/DPO trainers, the ref model is loaded after super().__init__(...), but we can still align this later.

Thank you for catching this: that's indeed a better architecture.

I agree we can align this later, as I was planning to do in phase 3 of the refactoring plan (Reference Model Handling), which is specifically planned for ref_model improvements.
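For readers following the thread, the reference-model branch in the diff above (load from a path, or warn when init kwargs are passed alongside an already-instantiated model) can be exercised on its own. This is a hedged sketch: `resolve_ref_model` and its `load_fn` parameter are hypothetical, with `load_fn` standing in for the `create_model_from_path` helper so that the example runs without loading a real model.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def resolve_ref_model(
    ref_model: Any,
    ref_model_init_kwargs: Optional[dict[str, Any]],
    load_fn: Callable[..., Any],
) -> Any:
    """Return a usable reference model, loading it only when given a path."""
    if isinstance(ref_model, str):
        # A string is treated as a model path or ID and loaded with the kwargs.
        return load_fn(ref_model, **(ref_model_init_kwargs or {}))
    if ref_model is not None and ref_model_init_kwargs is not None:
        # Warn instead of raising: the kwargs cannot apply to a built model.
        logger.warning(
            "You passed `ref_model_init_kwargs`, but your ref_model is already "
            "instantiated. The `ref_model_init_kwargs` will be ignored."
        )
    return ref_model


def fake_loader(path: str, **kwargs: Any) -> tuple:
    # Hypothetical loader that just records what it was called with.
    return (path, kwargs)


already_built = object()
loaded = resolve_ref_model("some/model-path", None, fake_loader)
kept = resolve_ref_model(already_built, {"dtype": "bfloat16"}, fake_loader)
```

With a warning rather than a hard error, a config that carries leftover init kwargs still trains, and the log makes it clear that those kwargs had no effect.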

albertvillanova · 2026-01-08T06:00:03Z

If it's easier for you, I think it's fine to have a big refactoring PR like in #3906

Thanks for your suggestion, but I would prefer to keep the PRs small for review quality and risk management. Each PR is independently valuable and can be reviewed in 15-30 minutes.

While a big refactoring PR sounds efficient, I think it creates high risk, poor review quality, slower iteration, and harder debugging. Indeed, I am already finding it difficult to resolve conflicts each time I merge the main branch into this other PR: #4700.

IMO, small PRs are better for quality, speed, and maintainability.

Happy to discuss if you have concerns about the granularity! 😅

Modernize KTO model initialization

b721292

qgallouedec reviewed Jan 7, 2026


qgallouedec approved these changes Jan 7, 2026


albertvillanova mentioned this pull request Jan 8, 2026

KTO refactoring #4786

Open

6 tasks

albertvillanova merged commit 1a93971 into huggingface:main Jan 8, 2026
2 of 3 checks passed

albertvillanova merged 1 commit into huggingface:main from albertvillanova:refactor-kto-1c
