Skip to content

Enable saving and loading precomputed reference log probabilities in …#3986

Closed
ginkyenglee wants to merge 4 commits intohuggingface:mainfrom
ginkyenglee:feature/dpo-cached-ref-logps
Closed

Enable saving and loading precomputed reference log probabilities in …#3986
ginkyenglee wants to merge 4 commits intohuggingface:mainfrom
ginkyenglee:feature/dpo-cached-ref-logps

Conversation

@ginkyenglee
Copy link
Contributor

  • Added two new DPOConfig arguments: load_ref_logps_dir and save_ref_logps_dir
  • Allows DPOTrainer to reuse precomputed ref_chosen_logps and ref_rejected_logps from disk
  • Saves new log probabilities during training/evaluation when configured
  • Added safeguards for distributed training (save only on main process, synchronize with wait_for_everyone)
  • Improves reproducibility and avoids expensive recomputation across runs

What does this PR do?

Fixes #3985

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [] Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • [] Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

…DPOTrainer

- Added two new DPOConfig arguments: `load_ref_logps_dir` and `save_ref_logps_dir`
- Allows DPOTrainer to reuse precomputed `ref_chosen_logps` and `ref_rejected_logps` from disk
- Saves new log probabilities during training/evaluation when configured
- Added safeguards for distributed training (save only on main process, synchronize with wait_for_everyone)
- Improves reproducibility and avoids expensive recomputation across runs
@qgallouedec
Copy link
Member

Thanks for the PR — I completely understand the need.

datasets already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment. I’m a bit hesitant to support the feature in its current form because, while useful, it depends on a workaround rather than properly leveraging the caching system in datasets.

That said, it could still bring value to the community. While we wait for a proper fix, I see two options:

  • Merge it, but include a warning that this argument will be removed at any time without notice.
  • Leave the PR open in the meantime, and reference it in the documentation.

what do you think @albertvillanova @lewtun @kashif @edbeeching?

@edbeeching
Copy link
Collaborator

Thanks for the PR — I completely understand the need.

datasets already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment. I’m a bit hesitant to support the feature in its current form because, while useful, it depends on a workaround rather than properly leveraging the caching system in datasets.

That said, it could still bring value to the community. While we wait for a proper fix, I see two options:

  • Merge it, but include a warning that this argument will be removed at any time without notice.
  • Leave the PR open in the meantime, and reference it in the documentation.

what do you think @albertvillanova @lewtun @kashif @edbeeching?

I think it would be ok as long as there is a warning. Perhaps you could just have a precomputed_logprob_dir rather than both the save/load args. Then if that file exists, load the ref_logprobs. If it does not exist, compute the ref_logprobs and save them.

@ginkyenglee
Copy link
Contributor Author

Thanks for the PR — I completely understand the need.
datasets already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment. I’m a bit hesitant to support the feature in its current form because, while useful, it depends on a workaround rather than properly leveraging the caching system in datasets.
That said, it could still bring value to the community. While we wait for a proper fix, I see two options:

  • Merge it, but include a warning that this argument will be removed at any time without notice.
  • Leave the PR open in the meantime, and reference it in the documentation.

what do you think @albertvillanova @lewtun @kashif @edbeeching?

I think it would be ok as long as there is a warning. Perhaps you could just have a precomputed_logprob_dir rather than both the save/load args. Then if that file exists, load the ref_logprobs. If it does not exist, compute the ref_logprobs and save them.

Thanks for the comments!

Proposal: make precomputed_logprob_dir the single switch.

  • If None → no precompute.
  • If set → load cache if present; otherwise compute+save in the same run.
    I’ll deprecate (and ignore when dir is set) precompute_ref_log_probs.

Rationale: simpler, single source of truth and aligns with the earlier “one dir + warning” suggestion.

Does this look good?

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datasets already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment.

Hi, if you think there is an underlying issue with datasets for this use case, I could have a look. I can try to identify whether the fix should happen upstream in the datasets lib, or on our side (if it is a misuse).

@qgallouedec
Copy link
Member

closing in favour of #3906

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Save/Load Precomputed Ref Log-Probabilities in DPOTrainer

4 participants

Comments