Skip to content

Comments

[DPO] add reference log-prob outputs in DPO#521

Merged
Tcc0403 merged 6 commits intolinkedin:mainfrom
kashif:dpo-fix
Jan 30, 2025
Merged

[DPO] add reference log-prob outputs in DPO#521
Tcc0403 merged 6 commits intolinkedin:mainfrom
kashif:dpo-fix

Conversation

@kashif
Copy link
Contributor

@kashif kashif commented Jan 14, 2025

Summary

Since the DPO uses a reference model we also need to return the reference logprobs in DPO

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

austin362667
austin362667 previously approved these changes Jan 18, 2025
Copy link
Contributor

@austin362667 austin362667 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

Copy link
Collaborator

@Tcc0403 Tcc0403 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Tcc0403 Tcc0403 merged commit eed8af3 into linkedin:main Jan 30, 2025
2 of 5 checks passed
@Tcc0403 Tcc0403 mentioned this pull request Feb 1, 2025
3 tasks
austin362667 added a commit that referenced this pull request Feb 9, 2025
## Summary
<!--- This is a required section; please describe the main purpose of
this proposed code change. --->
#521 introduced a unit test failure caused by the numerical issue in
bf16.

This PR relaxes the atol for `chosen_rewards` and `rejected_rewards`
only.
Note: It's just a temporary fix for ci. 

This PR also removes duplicated calculations on both terms.

<!---
## Details
This is an optional section; is there anything specific that reviewers
should be aware of?
--->

## Testing Done
<!--- This is a required section; please describe how this change was
tested. --->

<!-- 
Replace BLANK with your device type. For example, A100-80G-PCIe

Complete the following tasks before sending your PR, and replace `[ ]`
with
`[x]` to indicate you have done them. 
-->

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

---------

Co-authored-by: Austin Liu <austin362667@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants