feat: preference datasets #673

Merged: terrykong merged 59 commits into main from jveronvialard/preference-datasets on Aug 30, 2025

Conversation

@jveronvialard jveronvialard commented Jul 15, 2025

What does this PR do?

This PR adds a more generic preference dataset class that can be used for both RM and DPO training. It also aligns the RM and DPO training implementations more closely and adds support for multiple validation preference datasets.
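To illustrate why one preference format can serve both trainers, here is a minimal sketch. It is not the PR's implementation: the `PreferencePair` record, the function names, and the `beta` default are all assumptions made for illustration. The losses are the standard Bradley-Terry reward-model loss and the standard DPO loss, both of which consume the same (chosen, rejected) pair.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical record: one prompt with a preferred and a dispreferred reply."""
    prompt: str
    chosen: str
    rejected: str


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def rm_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry loss: lower when the chosen reply scores above the rejected one."""
    return -log_sigmoid(reward_chosen - reward_rejected)


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value, for illustration only
) -> float:
    """DPO loss on sequence log-probabilities from the policy and a frozen reference."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)
```

Both functions reduce to log(2) when the model expresses no preference, which is a handy sanity check at the start of training.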

Usage

You can specify multiple validation preference datasets in your RM or DPO training configuration:

data:
  dataset_name: PreferenceDataset
  train_data_path: <LocalPathToTrainingDataset>
  val_data_paths:
    <NameOfValidationDataset1>: <LocalPathToValidationDataset1>
    <NameOfValidationDataset2>: <LocalPathToValidationDataset2>
    ...
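As a rough sketch of how a name-to-path mapping like `val_data_paths` could be consumed, the snippet below computes one accuracy per named validation set. It does not use the NeMo RL API: the dictionary stands in for the parsed `data` section, and `load_pairs`, `score`, and the placeholder names and paths are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (chosen response, rejected response)

# Stand-in for the parsed `data` section above; names and paths are placeholders.
data_cfg = {
    "dataset_name": "PreferenceDataset",
    "train_data_path": "train.jsonl",
    "val_data_paths": {"val_a": "val_a.jsonl", "val_b": "val_b.jsonl"},
}


def validation_accuracy(pairs: List[Pair], score: Callable[[str], float]) -> float:
    """Fraction of pairs where the chosen response outscores the rejected one."""
    if not pairs:
        return 0.0
    correct = sum(1 for chosen, rejected in pairs if score(chosen) > score(rejected))
    return correct / len(pairs)


def evaluate_all(
    cfg: dict,
    load_pairs: Callable[[str], List[Pair]],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Return one accuracy per named validation dataset."""
    return {
        name: validation_accuracy(load_pairs(path), score)
        for name, path in cfg["val_data_paths"].items()
    }
```

Keying the results by dataset name is what lets each validation set be reported as its own metric rather than being averaged into a single number.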

For example, using local preference datasets based on HelpSteer2 and HelpSteer3 from which ties have been filtered out:
[3 images]
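Filtering ties could look roughly like the sketch below. The record layout is an assumption, not something stated in this PR: each record is taken to carry a signed integer under a hypothetical `overall_preference` key, where 0 means a tie, a negative value prefers `response1`, and a positive value prefers `response2`.

```python
from typing import Dict, List


def filter_ties(records: List[dict], preference_key: str = "overall_preference") -> List[dict]:
    """Drop records whose preference value is 0 (a tie)."""
    return [r for r in records if r[preference_key] != 0]


def to_pair(record: dict, preference_key: str = "overall_preference") -> Dict[str, str]:
    """Turn a non-tied record into a chosen/rejected pair (assumed sign convention)."""
    if record[preference_key] == 0:
        raise ValueError("tied records have no chosen/rejected ordering")
    if record[preference_key] < 0:
        chosen, rejected = record["response1"], record["response2"]
    else:
        chosen, rejected = record["response2"], record["response1"]
    return {"chosen": chosen, "rejected": rejected}
```

Dropping ties first matters because a pairwise loss needs a strict ordering between the two responses.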

RM convergence plots before and after this PR, using the default data.dataset_name: HelpSteer3:
[2 images]

DPO convergence plots before and after this PR, using the default data.dataset_name: HelpSteer3:
[4 images]

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
…dation preference datasets

@jveronvialard jveronvialard requested a review from odelalleau July 15, 2025 23:26
@jveronvialard jveronvialard added the enhancement, training, and algorithm labels Jul 15, 2025
@github-actions github-actions bot added the documentation label Jul 15, 2025
@jveronvialard jveronvialard marked this pull request as draft July 15, 2025 23:30
@odelalleau odelalleau left a comment

Thanks! At a high level LGTM, just a bunch of mostly minor questions and polish suggestions, except for one point that may require more work (extending this new dataset format to DPO as well).

@odelalleau odelalleau left a comment

A few minor remarks on latest changes

@github-actions

⚠️ File Consistency Check

Check based on commit: 52a20bc (PR #673 from jveronvialard/preference-datasets)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker_v2.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker_v2.py
  • Update nemo_rl/models/policy/dtensor_policy_worker_v2.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/dtensor_policy_worker.py
  • Not modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
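A check of this kind can be sketched as follows. This is not the repository's actual workflow: the function name and the pairing list are hypothetical, and the input is assumed to be the list of file paths changed in the PR.

```python
from typing import Iterable, List, Tuple

# Assumed pairing, taken from the two paths named in the warning above.
PAIRED_FILES = [
    (
        "nemo_rl/models/policy/dtensor_policy_worker.py",
        "nemo_rl/models/policy/dtensor_policy_worker_v2.py",
    ),
]


def find_unsynced(
    changed_files: Iterable[str],
    paired_files: List[Tuple[str, str]] = PAIRED_FILES,
) -> List[Tuple[str, str]]:
    """Return (modified, not_modified) for each pair where only one side changed."""
    changed = set(changed_files)
    warnings = []
    for first, second in paired_files:
        if first in changed and second not in changed:
            warnings.append((first, second))
        elif second in changed and first not in changed:
            warnings.append((second, first))
    return warnings
```

The sketch flags either direction of drift; a pair where both files or neither file changed produces no warning.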

@terrykong

@jveronvialard and I synced up offline and came to an agreement on the changes. After those changes and a convergence metric comparison, we should be good to go.

terrykong previously approved these changes Aug 29, 2025
@terrykong terrykong added this pull request to the merge queue Aug 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Aug 30, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added this pull request to the merge queue Aug 30, 2025
Merged via the queue into main with commit cbd4b93 Aug 30, 2025
21 checks passed
@terrykong terrykong deleted the jveronvialard/preference-datasets branch August 30, 2025 21:35
PocketDocLabs referenced this pull request in arcee-ai/NeMo-RL Aug 31, 2025
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>