Enable saving and loading precomputed reference log probabilities in … by ginkyenglee · Pull Request #3986 · huggingface/trl

ginkyenglee · 2025-09-01T07:41:21Z

Added two new DPOConfig arguments: load_ref_logps_dir and save_ref_logps_dir
Allows DPOTrainer to reuse precomputed ref_chosen_logps and ref_rejected_logps from disk
Saves new log probabilities during training/evaluation when configured
Added safeguards for distributed training (save only on main process, synchronize with wait_for_everyone)
Improves reproducibility and avoids expensive recomputation across runs

What does this PR do?

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[] Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link
to it if that's the case.
[] Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

…DPOTrainer - Added two new DPOConfig arguments: `load_ref_logps_dir` and `save_ref_logps_dir` - Allows DPOTrainer to reuse precomputed `ref_chosen_logps` and `ref_rejected_logps` from disk - Saves new log probabilities during training/evaluation when configured - Added safeguards for distributed training (save only on main process, synchronize with wait_for_everyone) - Improves reproducibility and avoids expensive recomputation across runs

qgallouedec · 2025-09-03T00:16:48Z

Thanks for the PR — I completely understand the need.

datasets already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment. I’m a bit hesitant to support the feature in its current form because, while useful, it depends on a workaround rather than properly leveraging the caching system in datasets.

That said, it could still bring value to the community. While we wait for a proper fix, I see two options:

Merge it, but include a warning that this argument will be removed at any time without notice.
Leave the PR open in the meantime, and reference it in the documentation.

what do you think @albertvillanova @lewtun @kashif @edbeeching?

edbeeching · 2025-09-09T07:09:24Z

Thanks for the PR — I completely understand the need.

datasets already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment. I’m a bit hesitant to support the feature in its current form because, while useful, it depends on a workaround rather than properly leveraging the caching system in datasets.

That said, it could still bring value to the community. While we wait for a proper fix, I see two options:

Merge it, but include a warning that this argument will be removed at any time without notice.

Leave the PR open in the meantime, and reference it in the documentation.

what do you think @albertvillanova @lewtun @kashif @edbeeching?

I think it would be ok as long as there is a warning. Perhaps you could just have a precomputed_logprob_dir rather than both the save/load args. Then if that file exists, load the ref_logprobs. If it does not exist, compute the ref_logprobs and save them.

ginkyenglee · 2025-09-10T02:14:16Z

Thanks for the PR — I completely understand the need.
datasets already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment. I’m a bit hesitant to support the feature in its current form because, while useful, it depends on a workaround rather than properly leveraging the caching system in datasets.
That said, it could still bring value to the community. While we wait for a proper fix, I see two options:

Merge it, but include a warning that this argument will be removed at any time without notice.

Leave the PR open in the meantime, and reference it in the documentation.

what do you think @albertvillanova @lewtun @kashif @edbeeching?

I think it would be ok as long as there is a warning. Perhaps you could just have a precomputed_logprob_dir rather than both the save/load args. Then if that file exists, load the ref_logprobs. If it does not exist, compute the ref_logprobs and save them.

Thanks for the comments!

Proposal: make precomputed_logprob_dir the single switch.

If None → no precompute.
If set → load cache if present; otherwise compute+save in the same run.
I’ll deprecate (and ignore when dir is set) precompute_ref_log_probs.

Rationale: simpler, single source of truth and aligns with the earlier “one dir + warning” suggestion.

Does this look good?

albertvillanova

datasets already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment.

Hi, if you think there is an underlying issue with datasets for this use case, I could have a look. I can try to identify whether the fix should happen upstream in the datasets lib, or on our side (if it is a misuse).

qgallouedec · 2026-02-16T04:37:34Z

closing in favour of #3906

ginkyenglee added 2 commits September 1, 2025 06:59

Merge branch 'main' into feature/dpo-cached-ref-logps

e04450c

Merge branch 'main' into feature/dpo-cached-ref-logps

7e0613f

Merge main into feature/dpo-cached-ref-logps (sync on 2025-09-10)

09f7bff

albertvillanova reviewed Sep 10, 2025

View reviewed changes

qgallouedec closed this Feb 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable saving and loading precomputed reference log probabilities in …#3986

Enable saving and loading precomputed reference log probabilities in …#3986
ginkyenglee wants to merge 4 commits intohuggingface:mainfrom
ginkyenglee:feature/dpo-cached-ref-logps

ginkyenglee commented Sep 1, 2025

Uh oh!

qgallouedec commented Sep 3, 2025

Uh oh!

edbeeching commented Sep 9, 2025

Uh oh!

ginkyenglee commented Sep 10, 2025

Uh oh!

albertvillanova left a comment

Uh oh!

qgallouedec commented Feb 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Comments

Conversation

ginkyenglee commented Sep 1, 2025

What does this PR do?

Before submitting

Who can review?

Uh oh!

qgallouedec commented Sep 3, 2025

Uh oh!

edbeeching commented Sep 9, 2025

Uh oh!

ginkyenglee commented Sep 10, 2025

Uh oh!

albertvillanova left a comment

Choose a reason for hiding this comment

Uh oh!

qgallouedec commented Feb 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Comments