Enable saving and loading precomputed reference log probabilities in … #3986
ginkyenglee wants to merge 4 commits into huggingface:main from …
…DPOTrainer

- Added two new `DPOConfig` arguments: `load_ref_logps_dir` and `save_ref_logps_dir`.
- `DPOTrainer` can reuse precomputed `ref_chosen_logps` and `ref_rejected_logps` from disk.
- Newly computed log probabilities are saved during training/evaluation when configured.
- Added safeguards for distributed training (save only on the main process; synchronize with `wait_for_everyone`).
- Improves reproducibility and avoids expensive recomputation across runs.
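The load/compute/save flow described in the commit message can be sketched without any framework as below. This is a minimal illustration under stated assumptions, not the PR's implementation: `get_ref_logps`, `compute_fn`, the `ref_logps.json` file name, and the JSON layout are hypothetical, and the `is_main_process` / `wait_for_everyone` parameters are stand-ins for what a distributed launcher would supply.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical per-example record layout:
# {"ref_chosen_logps": float, "ref_rejected_logps": float}
RefLogps = Dict[str, float]


def get_ref_logps(
    examples: List[dict],
    compute_fn: Callable[[dict], RefLogps],
    load_dir: Optional[str] = None,
    save_dir: Optional[str] = None,
    is_main_process: bool = True,
    wait_for_everyone: Callable[[], None] = lambda: None,
    filename: str = "ref_logps.json",  # hypothetical cache file name
) -> List[RefLogps]:
    """Load reference log-probs from disk if available, else compute (and optionally save) them."""
    # 1. Reuse precomputed values when the load directory holds a cache file.
    if load_dir is not None:
        path = Path(load_dir) / filename
        if path.exists():
            cached = json.loads(path.read_text())
            # Cheap sanity check: the cache must line up with the dataset.
            if len(cached) != len(examples):
                raise ValueError(
                    f"{path} has {len(cached)} entries but the dataset has {len(examples)}"
                )
            return cached

    # 2. Otherwise run the expensive reference-model pass, one example at a time.
    logps = [compute_fn(ex) for ex in examples]

    # 3. Persist: only the main process writes; every process then waits so that
    #    no rank reads the file while it is still being written.
    if save_dir is not None:
        if is_main_process:
            out_dir = Path(save_dir)
            out_dir.mkdir(parents=True, exist_ok=True)
            (out_dir / filename).write_text(json.dumps(logps))
        wait_for_everyone()
    return logps
```

In a multi-GPU run, one would presumably pass `accelerator.is_main_process` and `accelerator.wait_for_everyone` from an Accelerate `Accelerator`; the defaults above make the single-process case work unchanged.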
Thanks for the PR — I completely understand the need.
That said, it could still bring value to the community. While we wait for a proper fix, I see two options:
What do you think @albertvillanova @lewtun @kashif @edbeeching?
I think it would be ok as long as there is a warning. Perhaps you could just have a
Thanks for the comments! Proposal: make
Rationale: it is simpler, provides a single source of truth, and aligns with the earlier “one dir + warning” suggestion. Does this look good?
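The “one dir + warning” idea can be sketched as a single optional directory that is read when a cache file exists and written otherwise, with a warning whenever cached values are reused. This is a minimal sketch under assumptions: the exact option name proposed in the thread is not preserved here, so `ref_logps_dir`, `resolve_ref_logps`, and the `ref_logps.json` file name are hypothetical.

```python
import json
import warnings
from pathlib import Path
from typing import Callable, List, Optional


def resolve_ref_logps(
    examples: List[dict],
    compute_fn: Callable[[dict], dict],
    ref_logps_dir: Optional[str],  # hypothetical single option: load if present, else save
    filename: str = "ref_logps.json",
) -> List[dict]:
    # No directory configured: always recompute, nothing touches the disk.
    if ref_logps_dir is None:
        return [compute_fn(ex) for ex in examples]

    path = Path(ref_logps_dir) / filename
    if path.exists():
        # The cache is not re-validated, so make its reuse loud.
        warnings.warn(
            f"Loading reference log probabilities from {path}. They are not "
            "checked against the current model, tokenizer, or dataset; delete "
            "the file if any of these changed.",
            UserWarning,
        )
        return json.loads(path.read_text())

    # First run: compute once, then store for later runs.
    logps = [compute_fn(ex) for ex in examples]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(logps))
    return logps
```

With one option there is no way to load from one location and save to another, which is the trade-off the rationale above accepts in exchange for a single source of truth.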
albertvillanova left a comment:
> `datasets` already provides a caching mechanism intended to handle this exact use case. However, it doesn’t seem to be working as expected here, and I don’t see an obvious fix at the moment.
Hi, if you think there is an underlying issue with datasets for this use case, I could have a look. I can try to identify whether the fix should happen upstream in the datasets lib, or on our side (if it is a misuse).
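For context, `datasets` decides whether a cached result can be reused by fingerprinting the transform and its inputs. The sketch below illustrates that general idea with the standard library only; it is not the `datasets` implementation, and `fingerprint`, `cached_ref_logps`, and `model_id` are hypothetical. The cache file name is derived from a hash of everything the cached values depend on, so changing the model or the data forces recomputation instead of silently reusing stale values.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def fingerprint(model_id: str, examples: List[dict]) -> str:
    # Hash everything the cached values depend on; any change yields a new key.
    payload = json.dumps({"model": model_id, "examples": examples}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cached_ref_logps(
    cache_root: str,
    model_id: str,
    examples: List[dict],
    compute_fn: Callable[[dict], dict],
) -> List[dict]:
    # One file per (model, data) combination, named by its fingerprint.
    path = Path(cache_root) / f"ref_logps-{fingerprint(model_id, examples)}.json"
    if path.exists():
        return json.loads(path.read_text())
    logps = [compute_fn(ex) for ex in examples]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(logps))
    return logps
```

Compared with a plain directory, a fingerprint-keyed cache trades a little bookkeeping for protection against reusing log probabilities computed with a different reference model or dataset.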
Closing in favour of #3906.
What does this PR do?
Fixes #3985
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.