Align KTO with DPO: Align _precompute_ref_logps by albertvillanova · Pull Request #5714 · huggingface/trl

albertvillanova · 2026-05-06T14:28:48Z

Align KTO with DPO: Align _precompute_ref_logps.

This PR adds a cache for the precomputed reference log probabilities in KTOTrainer. Repeated runs on the same dataset and reference model then reuse earlier results instead of recomputing them. The main changes are new imports, a hash-keyed cache stored as a NumPy .npz file, and updating the dataset with the cached results.

Part of:

KTO refactoring #4786

Changes

Caching and efficiency improvements:

Added logic to cache computed reference log probabilities (reference_logps and reference_KL_logps) to a numpy .npz file, identified by a hash of the dataset and model. On subsequent runs, the code loads from cache if available, reducing redundant computation.
Used hash_module and datasets.fingerprint.Hasher to generate unique cache fingerprints based on the model and dataset, ensuring cache correctness.
Updated dataset columns with cached or newly computed log probabilities, and ensured new fingerprints are set for reproducibility.
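The fingerprinting idea above can be sketched as follows. This is a minimal sketch under stated assumptions: the model weights are given as a dict of NumPy arrays, and `hash_weights` and `cache_fingerprint` are hypothetical helpers standing in for hash_module and datasets.fingerprint.Hasher. SHA-256 is used for illustration only and does not describe what those utilities do internally.

```python
import hashlib

import numpy as np


def hash_weights(state_dict: dict[str, np.ndarray]) -> str:
    """Hypothetical stand-in for hash_module: digest every parameter tensor."""
    h = hashlib.sha256()
    # Iterate in sorted order so the digest does not depend on dict ordering.
    for name in sorted(state_dict):
        array = np.ascontiguousarray(state_dict[name])
        h.update(name.encode("utf-8"))
        h.update(str(array.dtype).encode("utf-8"))
        h.update(str(array.shape).encode("utf-8"))
        h.update(array.tobytes())
    return h.hexdigest()


def cache_fingerprint(dataset_fingerprint: str, model_hash: str) -> str:
    """Hypothetical helper: combine dataset and model identities into one cache key."""
    payload = f"{dataset_fingerprint}|{model_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to a weight value, dtype, or shape, or to the dataset fingerprint, yields a different key, so a stale cache file is never picked up for a different model or dataset.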

Dependency and import updates:

Added imports for os, numpy, and hash_module to support caching and hashing functionality.

Together, these changes avoid recomputing reference log probabilities across runs and keep the dataset fingerprints consistent with the cached content.

Note

Medium Risk
Adds cross-run caching and fingerprint manipulation in KTOTrainer._precompute_ref_logps. This introduces file I/O in distributed runs and can affect training correctness if the cache key or the synchronization between processes is wrong.
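One common way to limit the distributed file-I/O risk is to let a single process write the cache atomically, so no process ever reads a half-written file. The sketch below shows only the atomic write; `save_cache_atomically` is a hypothetical helper, and this discussion does not state that the PR uses a temporary file and rename.

```python
import os
import tempfile

import numpy as np


def save_cache_atomically(path: str, **arrays: np.ndarray) -> None:
    """Write an .npz cache so that readers never see a partially written file.

    Hypothetical helper. In a distributed run it would be called on the main
    process only, with the other processes waiting at a barrier before loading.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".npz")
    try:
        with os.fdopen(fd, "wb") as f:
            # Passing a file object (not a str path) stops NumPy from appending ".npz".
            np.savez_compressed(f, **arrays)
        # Atomic on POSIX: readers see either no file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```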

Overview
KTOTrainer._precompute_ref_logps now caches precomputed reference log probabilities to a compressed .npz file keyed by a fingerprint of the dataset, the (reference) model weights (via hash_module), and whether KL is computed.
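Why the KL flag belongs in the key can be illustrated with a minimal sketch. `ref_logps_cache_key` is a hypothetical helper, and its string inputs stand in for whatever fingerprint TRL actually derives with Hasher and hash_module.

```python
import hashlib


def ref_logps_cache_key(dataset_fingerprint: str, model_hash: str, calculate_KL: bool) -> str:
    """Hypothetical cache key covering every input that changes what gets cached."""
    # A run with calculate_KL=False stores no reference_KL_logps, so it must not
    # share a cache entry with a run that needs them.
    payload = f"{dataset_fingerprint}|{model_hash}|calculate_KL={calculate_KL}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```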

On subsequent runs it loads reference_logps (and reference_KL_logps when enabled) from cache instead of recomputing, and updates dataset columns while setting new_fingerprint to reflect the cached content for reproducibility across runs/processes.
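The load-or-compute step can be sketched with NumPy alone. This is a minimal sketch, not the trainer's code: `load_or_compute_ref_logps` and `compute_fn` are hypothetical, and the sketch returns arrays instead of adding them as dataset columns with a new fingerprint.

```python
import os
from typing import Callable, Optional

import numpy as np

Arrays = tuple[np.ndarray, Optional[np.ndarray]]


def load_or_compute_ref_logps(
    cache_path: str,
    compute_fn: Callable[[], Arrays],
    calculate_KL: bool,
) -> Arrays:
    """Return (reference_logps, reference_KL_logps), reusing an .npz cache when present.

    Hypothetical helper: compute_fn stands in for the forward passes over the
    dataset with the reference model. cache_path should end in ".npz", because
    np.savez_compressed appends that suffix to string paths that lack it.
    """
    if os.path.exists(cache_path):
        with np.load(cache_path) as cached:
            # Indexing an NpzFile reads the array into memory before the file closes.
            ref = cached["reference_logps"]
            ref_kl = cached["reference_KL_logps"] if calculate_KL else None
        return ref, ref_kl

    ref, ref_kl = compute_fn()
    arrays = {"reference_logps": ref}
    if calculate_KL:
        arrays["reference_KL_logps"] = ref_kl
    np.savez_compressed(cache_path, **arrays)
    return ref, ref_kl
```

If the key ignored whether KL is computed, a file written by a run without KL would be reused by a run with KL, and reading reference_KL_logps from it would raise a KeyError.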

Reviewed by Cursor Bugbot for commit a396005. Bugbot is set up for automated code reviews on this repo.


HuggingFaceDocBuilderDev · 2026-05-06T14:31:48Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit d4484bc.

qgallouedec

It looks good!


albertvillanova added 4 commits May 6, 2026 16:08

Rename variables to align with column names

64ef6c2

Add cache mechanism

ac002f9

Align logic

a7c43a7

Merge remote-tracking branch 'upstream/main' into align-kto-dpo-preco…

c2af3d4


Use gather_for_metrics with tuple

d4484bc

cursor Bot reviewed May 6, 2026


Comment thread trl/experimental/kto/kto_trainer.py Outdated

Comment thread trl/experimental/kto/kto_trainer.py Outdated

albertvillanova added 2 commits May 6, 2026 16:43

Use np.savez_compressed

9180b1c

Add self.calculate_KL to hash fingerprint

3ba521a

qgallouedec approved these changes May 7, 2026


Merge remote-tracking branch 'upstream/main' into align-kto-dpo-preco…

a396005


albertvillanova merged commit acbd53f into main May 7, 2026
5 checks passed

albertvillanova deleted the align-kto-dpo-precompute_ref_logps branch May 7, 2026 06:08

himanshushukla12 mentioned this pull request May 7, 2026

Sync fork with upstream huggingface/trl (1307 commits) himanshushukla12/trl#1

Merged

3 tasks

albertvillanova merged 8 commits into main from align-kto-dpo-precompute_ref_logps


3 participants
