feat: preference datasets #673
Merged
59 commits
a38c104 adding support for Bradley-Terry reward model training (jveronvialard)
ede515b Merge branch 'main' of github.com:NVIDIA-NeMo/RL into jveronvialard/b… (jveronvialard)
5b9e976 update docs (jveronvialard)
68e96ea add separate run_rm.py and unit tests (jveronvialard)
21d67a0 fix small typos and nit changes (jveronvialard)
0aff450 adding generic preference dataset class and support for multiple vali… (jveronvialard)
8a28af7 rewards tensor shape (jveronvialard)
7de3b93 adding unit tests (jveronvialard)
63dd1f3 updating docs (jveronvialard)
e914087 Merge branch 'main' of github.com:NVIDIA-NeMo/RL into jveronvialard/b… (jveronvialard)
8fb280b update config and skip is_tied_lm_head for RM (jveronvialard)
3e3b03a use tokenizer.pad_token_id if model.config.pad_token_id is not defined (jveronvialard)
ed24aea nit (jveronvialard)
af17314 update functional test and cicd (jveronvialard)
1034634 nit docs (jveronvialard)
02687ce keep individual metrics then aggregate on the entire dataset (jveronvialard)
24c5fd0 nit code and doc changes (jveronvialard)
8788ec2 Merge branch 'main' of github.com:NVIDIA-NeMo/RL into jveronvialard/b… (jveronvialard)
24807c3 split sft.py and rm.py (jveronvialard)
5c76465 nit code and doc changes (jveronvialard)
00363a2 pull from target branch (jveronvialard)
d3b6272 Merge branch 'main' of github.com:NVIDIA-NeMo/RL into jveronvialard/b… (jveronvialard)
0aaf296 pull from main (jveronvialard)
5b3f1ad Update docs/guides/rm.md (odelalleau)
6534c7c Remove the `-RAY_DEDUP_LOGS=0` examples in the README (odelalleau)
b79d0ee Refactor RM config to include a dedicated `reward_model_cfg` section (odelalleau)
51cc9f8 Provide user-friendly error message regarding unsupported RMs in mcore (odelalleau)
597d5eb Simplify code and guard against enabling sequence packing in RMs (odelalleau)
ba2e4b6 Fix likely crash with Reward Models introduced in previous commit (odelalleau)
4733717 Fix linting issues (odelalleau)
3297cd1 Fix a typing issue (odelalleau)
179767e Quick fix to typing issue (with TODO item for better fix) (odelalleau)
a86b6c7 Merge branch 'main' of github.com:NVIDIA-NeMo/RL into jveronvialard/b… (jveronvialard)
2e6ef71 Merge branch 'jveronvialard/bt-rm-training' of github.com:NVIDIA-NeMo… (jveronvialard)
76f77d8 unify data logic between DPO and RM training (jveronvialard)
97d1c46 pull from main (jveronvialard)
6ca4287 nit code and docs (jveronvialard)
1894caf put data processing in collate_fn (jveronvialard)
74eb553 updates to val metrics and save state (jveronvialard)
b3e848e pull from main (jveronvialard)
5aba6d6 pull from main (jveronvialard)
2449bba squash unsigned commits resolving previous feedback (jveronvialard)
f602042 pull from main (jveronvialard)
8ad7565 nit docs + lint (jveronvialard)
9efd72a nit code and docs (jveronvialard)
8129c23 better jsonc (jveronvialard)
c4e3bda adding overall val time (jveronvialard)
578441f aggregate metrics at the batch level first (jveronvialard)
52a20bc lint (jveronvialard)
f97ee6d nit (jveronvialard)
c4b4e6a fix tulu3 (jveronvialard)
bd16f9b adding tulu3 unit test (jveronvialard)
5f6cc52 nit (jveronvialard)
0dc7a6f validation metrics (jveronvialard)
1137407 nit code and docs (jveronvialard)
ba4b539 Merge branch 'main' of github.com:NVIDIA-NeMo/RL into jveronvialard/p… (jveronvialard)
62fc8f1 Merge branch 'main' of github.com:NVIDIA-NeMo/RL into jveronvialard/p… (jveronvialard)
30571c2 adding DPOValMetrics (jveronvialard)
e58d9ee revert jsonc to json since sphinx didn't like (terrykong)