
Conversation

@yuki-97
Contributor

@yuki-97 yuki-97 commented Nov 11, 2025

Follow-up of #1472. Thanks @nv-mmanohara for adding this!

  1. Add GRPO support for HelpSteer3 on LlamaNemotron 49B.
  2. Add SFT support for Tulu3 on LlamaNemotron 49B.
  3. Add the CodeJaccard environment.
  4. Refactor the environment and data processor.
  5. Introduce run_grpo.py; run_grpo_math.py and run_grpo_rm.py will be cleaned up in a subsequent PR ([Refactor] Clear run_grpo_math.py and run_grpo_rm.py #1572).
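For context, a CodeJaccard-style environment typically scores a generated code snippet against a reference by token-set Jaccard similarity. A minimal sketch of the idea (whitespace tokenization and the function name are illustrative assumptions, not this PR's actual implementation):

```python
def code_jaccard(candidate: str, reference: str) -> float:
    """Token-set Jaccard similarity between two code snippets.

    Hypothetical sketch: a real implementation would likely use a proper
    code tokenizer rather than whitespace splitting.
    """
    cand_tokens = set(candidate.split())
    ref_tokens = set(reference.split())
    if not cand_tokens and not ref_tokens:
        return 1.0  # two empty snippets are trivially identical
    # |intersection| / |union| of the two token sets
    return len(cand_tokens & ref_tokens) / len(cand_tokens | ref_tokens)
```

A score like this can then be mapped directly to a scalar reward for GRPO rollouts.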

Test Result

grpo math before and after refactor

(screenshot)

nemotron 49B

(screenshot)

Known Issue

  1. nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 cannot load from Hugging Face ([BUG] #1571).
  2. The GRPO Nemotron HelpSteer3 recipe has a very high logprob error ([BUG] #1570).

Summary by CodeRabbit

  • New Features

    • Added HelpSteer3 and Tulu3 datasets for training and evaluation.
    • Introduced CodeJaccard environment for code-based similarity scoring.
    • Enhanced data processor registry system with improved task flexibility.
    • Added new GRPO training script and SFT configurations for Nemotron-49B models.
    • Added support for dynamic task naming across datasets.
  • Documentation

    • Updated custom parallel plan paths.
  • Tests

    • Added new GRPO HelpSteer3 and SFT test suites.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 11, 2025
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Nov 11, 2025
@yuki-97 yuki-97 force-pushed the yukih/pr-1472 branch 2 times, most recently from c9335d4 to a872ed6 Compare November 11, 2025 09:27
@yuki-97 yuki-97 removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 11, 2025
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025
@RayenTian RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025
@RayenTian RayenTian force-pushed the yukih/pr-1472 branch 2 times, most recently from b7fedb9 to 9078e33 Compare November 16, 2025 03:37
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 16, 2025
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 17, 2025
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 17, 2025
… processors. Added raw_dataset.py and path.py for improved dataset processing. Updated project-includes in pyrefly.toml and modified grpo.md to reflect new task-dataset mapping. Cleaned up unused code and configurations in various YAML files.

Signed-off-by: ruit <[email protected]>
…or handling

- Introduced documentation for the new Code Jaccard Environment, detailing its functionality, usage, and configuration.
- Updated the RawDataset class to provide a default processor if none is specified in the data configuration.
- Enhanced test coverage for the helpsteer3 data processor to ensure correct functionality and output.

Signed-off-by: ruit <[email protected]>

Signed-off-by: ruit <[email protected]>
- Updated CLEVRCoGenTDataset, OpenAIFormatDataset, and SquadDataset to inherit from the RawDataset class for improved dataset handling.
- Added necessary imports for RawDataset in the respective files.

Signed-off-by: ruit <[email protected]>
…up for vlm grpo

- Added `env_name` to `vlm_grpo_3B_megatron.yaml` and `vlm_grpo_3B.yaml` for environment specification.
- Modified `setup_data` function in `run_vlm_grpo.py` to use `env_name` for environment configuration, enhancing flexibility in dataset processing.

Signed-off-by: ruit <[email protected]>
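The env_name plumbing described in the commit above usually amounts to a small registry lookup inside setup_data. A hypothetical sketch of that pattern (the registry contents, key names, and defaults here are illustrative assumptions, not the PR's actual code):

```python
# Hypothetical environment registry; the real PR wires up actual env classes.
ENV_REGISTRY = {
    "math": lambda cfg: ("MathEnvironment", cfg),
    "code_jaccard": lambda cfg: ("CodeJaccardEnvironment", cfg),
}


def make_env(data_config: dict):
    """Pick an environment constructor based on the config's env_name key."""
    env_name = data_config.get("env_name", "math")  # fallback mirrors old fixed behavior
    if env_name not in ENV_REGISTRY:
        raise ValueError(f"unknown env_name: {env_name!r}")
    return ENV_REGISTRY[env_name](data_config)
```

Reading `env_name` from the YAML config this way is what lets the same setup_data serve multiple environments.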
@github-actions

⚠️ File Consistency Check

Check based on commit: 7e1566c (PR #1506 from yukih/pr-1472)

⚠️ Parallel Plans Synchronization Warning

The file nemo_rl/models/dtensor/parallelize.py was modified in this PR, but neither 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py nor 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py was updated.

Why this matters:
These files contain similar parallel plan implementations that should be kept synchronized to ensure consistency across the codebase.

Action required:

  • Please review if the changes in nemo_rl/models/dtensor/parallelize.py should also be applied to 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py or 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py
  • Update the appropriate related file(s) if necessary to maintain functional consistency
  • Request access to the NVIDIA-NeMo/Automodel repository, create a PR against the nemo-rl-submodule branch, and update the Automodel submodule in the nemo-rl index
  • Add @ffrujeri as a reviewer of this PR if you have any questions about the consistency requirements
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/dtensor/parallelize.py
  • Not modified: 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py
  • Not modified: 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 26, 2025

@RayenTian RayenTian added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 26, 2025
@RayenTian RayenTian removed the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Nov 26, 2025
Signed-off-by: ruit <[email protected]>

Signed-off-by: Yuki Huang <[email protected]>

Comment on lines +20 to +25
    def __init__(self, data_config: dict, seed: int = 42):
        self.data_config: dict = data_config
        self.seed: int = seed
        self.processor: TaskDataProcessFnCallable | None = None
        self.task_spec: TaskDataSpec | None = None
        raise NotImplementedError("__init__ is not implemented")
Contributor Author

Why not use something like this, and then call super().__init__(...) in the subclass?

Then self.data_config = data_config at line 39, and set_processor and set_task_spec in nemo_rl/data/datasets/response_datasets/__init__.py, can be removed.

Suggested change — from:

    def __init__(self, data_config: dict, seed: int = 42):
        self.data_config: dict = data_config
        self.seed: int = seed
        self.processor: TaskDataProcessFnCallable | None = None
        self.task_spec: TaskDataSpec | None = None
        raise NotImplementedError("__init__ is not implemented")

to:

    def __init__(self, data_config: dict, seed: int = 42):
        self.data_config: dict = data_config
        self.seed: int = seed
        self.processor: TaskDataProcessFnCallable | None = self.get_processor()
        self.task_spec: TaskDataSpec | None = self.get_task_spec()
Contributor

@RayenTian RayenTian Nov 27, 2025

If we use super().__init__(...), we need to pass data_config when initializing the dataset, so all the unit tests and other places that use datasets would need to be modified, leading to a much larger PR. This part will be done in a future PR for issue #1552.
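For readers following the thread, the two designs differ only in where initialization happens. A self-contained sketch of the suggested hook-based base class (class names and the placeholder return values are simplified stand-ins for the real RawDataset API, not this repo's code):

```python
class RawDatasetSketch:
    """Base class: __init__ fills processor/task spec via overridable hooks."""

    def __init__(self, data_config: dict, seed: int = 42):
        self.data_config = data_config
        self.seed = seed
        # Subclass hooks run during construction, so no separate
        # set_processor/set_task_spec calls are needed at the call site.
        self.processor = self.get_processor()
        self.task_spec = self.get_task_spec()

    def get_processor(self):
        return None  # subclasses override with a real processor callable

    def get_task_spec(self):
        return None  # subclasses override with a real task spec


class HelpSteer3Sketch(RawDatasetSketch):
    def get_processor(self):
        return "helpsteer3_processor"  # placeholder for a real callable
```

As the reviewer notes, this pattern requires every call site to pass data_config at construction time, which is why it was deferred to a later PR.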

from nemo_rl.utils.logger import get_next_experiment_dir

OmegaConf.register_new_resolver("mul", lambda a, b: a * b)
OmegaConf.register_new_resolver("max", lambda a, b: max(a, b))
Member

P1: we should move these customized ops into a common place in the future.

Contributor

ACK. I can move them to a new place in a follow-up PR to resolve #1552.
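For context, the mul and max resolvers registered in the snippet above let YAML configs derive values from other keys via OmegaConf interpolation. A hypothetical fragment showing the usage (these key names are illustrative, not from this PR's configs):

```yaml
grpo:
  num_prompts_per_step: 32
  num_generations_per_prompt: 16
  # resolves to 32 * 16 = 512 via the custom "mul" resolver
  train_global_batch_size: ${mul:${grpo.num_prompts_per_step},${grpo.num_generations_per_prompt}}
```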
