
Conversation

@yuki-97
Contributor

@yuki-97 yuki-97 commented Nov 11, 2025

Follow-up of #1472. Thanks @nv-mmanohara for adding this!

  1. Add GRPO support for HelpSteer3 on LlamaNemotron 49B.
  2. Add SFT support for Tulu3 on LlamaNemotron 49B.
  3. Add the CodeJaccard environment.
  4. Refactor the environment and data processor.
  5. Introduce run_grpo.py; run_grpo_math.py and run_grpo_rm.py will be cleaned up in a subsequent PR ([Refactor] Clear run_grpo_math.py and run_grpo_rm.py #1572).
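For context, a CodeJaccard-style environment typically scores a generated code snippet against a reference by token-set Jaccard similarity. A minimal sketch of the idea (whitespace tokenization and the function name are illustrative assumptions, not this PR's actual implementation):

```python
def code_jaccard(candidate: str, reference: str) -> float:
    """Token-set Jaccard similarity between two code snippets.

    Hypothetical sketch: a real implementation would likely use a proper
    code tokenizer rather than whitespace splitting.
    """
    cand_tokens = set(candidate.split())
    ref_tokens = set(reference.split())
    if not cand_tokens and not ref_tokens:
        return 1.0  # two empty snippets are trivially identical
    # |intersection| / |union| of the two token sets
    return len(cand_tokens & ref_tokens) / len(cand_tokens | ref_tokens)
```

A score like this can then be mapped directly to a scalar reward for GRPO rollouts.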

Test Result

grpo math before and after refactor

(screenshot)

nemotron 49B

(screenshot)

Known Issue

  1. nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 cannot load from Hugging Face ([BUG] #1571).
  2. The GRPO Nemotron HelpSteer3 recipe has a very high logprob error ([BUG] #1570).

Summary by CodeRabbit

  • New Features

    • Added HelpSteer3 and Tulu3 datasets for training and evaluation.
    • Introduced CodeJaccard environment for code-based similarity scoring.
    • Enhanced data processor registry system with improved task flexibility.
    • Added new GRPO training script and SFT configurations for Nemotron-49B models.
    • Added support for dynamic task naming across datasets.
  • Documentation

    • Updated custom parallel plan paths.
  • Tests

    • Added new GRPO HelpSteer3 and SFT test suites.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 11, 2025
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Nov 11, 2025
@yuki-97 yuki-97 force-pushed the yukih/pr-1472 branch 2 times, most recently from c9335d4 to a872ed6 Compare November 11, 2025 09:27
@yuki-97 yuki-97 removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 11, 2025
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025
@RayenTian RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025
@RayenTian RayenTian force-pushed the yukih/pr-1472 branch 2 times, most recently from b7fedb9 to 9078e33 Compare November 16, 2025 03:37
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 16, 2025
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 17, 2025
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 17, 2025
… processors. Added raw_dataset.py and path.py for improved dataset processing. Updated project-includes in pyrefly.toml and modified grpo.md to reflect new task-dataset mapping. Cleaned up unused code and configurations in various YAML files.

Signed-off-by: ruit <[email protected]>
…or handling

- Introduced documentation for the new Code Jaccard Environment, detailing its functionality, usage, and configuration.
- Updated the RawDataset class to provide a default processor if none is specified in the data configuration.
- Enhanced test coverage for the helpsteer3 data processor to ensure correct functionality and output.

Signed-off-by: ruit <[email protected]>

Signed-off-by: ruit <[email protected]>
- Updated CLEVRCoGenTDataset, OpenAIFormatDataset, and SquadDataset to inherit from the RawDataset class for improved dataset handling.
- Added necessary imports for RawDataset in the respective files.

Signed-off-by: ruit <[email protected]>
…up for vlm grpo

- Added `env_name` to `vlm_grpo_3B_megatron.yaml` and `vlm_grpo_3B.yaml` for environment specification.
- Modified `setup_data` function in `run_vlm_grpo.py` to use `env_name` for environment configuration, enhancing flexibility in dataset processing.

Signed-off-by: ruit <[email protected]>
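The env_name plumbing described in the commit above usually amounts to a small registry lookup inside setup_data. A hypothetical sketch of that pattern (the registry contents, key names, and defaults here are illustrative assumptions, not the PR's actual code):

```python
# Hypothetical environment registry; the real PR wires up actual env classes.
ENV_REGISTRY = {
    "math": lambda cfg: ("MathEnvironment", cfg),
    "code_jaccard": lambda cfg: ("CodeJaccardEnvironment", cfg),
}


def make_env(data_config: dict):
    """Pick an environment constructor based on the config's env_name key."""
    env_name = data_config.get("env_name", "math")  # fallback mirrors old fixed behavior
    if env_name not in ENV_REGISTRY:
        raise ValueError(f"unknown env_name: {env_name!r}")
    return ENV_REGISTRY[env_name](data_config)
```

Reading `env_name` from the YAML config this way is what lets the same setup_data serve multiple environments.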
@github-actions

⚠️ File Consistency Check

Check based on commit: 7e1566c (PR #1506 from yukih/pr-1472)

⚠️ Parallel Plans Synchronization Warning

The file nemo_rl/models/dtensor/parallelize.py was modified in this PR, but neither 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py nor 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py was updated.

Why this matters:
These files contain similar parallel plan implementations that should be kept synchronized to ensure consistency across the codebase.

Action required:

  • Please review if the changes in nemo_rl/models/dtensor/parallelize.py should also be applied to 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py or 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py
  • Update the appropriate related file(s) if necessary to maintain functional consistency
  • Request access to the NVIDIA-NeMo/Automodel repository, create a PR against the nemo-rl-submodule branch, and update the Automodel submodule in the nemo-rl index
  • Add @ffrujeri as a reviewer of this PR if you have any questions about the consistency requirements
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/dtensor/parallelize.py
  • Not modified: 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py
  • Not modified: 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 26, 2025

@RayenTian RayenTian added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 26, 2025
@RayenTian RayenTian removed the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Nov 26, 2025
Signed-off-by: ruit <[email protected]>

Signed-off-by: Yuki Huang <[email protected]>

Comment on lines +20 to +25
    def __init__(self, data_config: dict, seed: int = 42):
        self.data_config: dict = data_config
        self.seed: int = seed
        self.processor: TaskDataProcessFnCallable | None = None
        self.task_spec: TaskDataSpec | None = None
        raise NotImplementedError("__init__ is not implemented")
Contributor Author

Why not use something like this, and then call super().__init__(...) in the subclass?

Then self.data_config = data_config at line 39, and set_processor and set_task_spec in nemo_rl/data/datasets/response_datasets/__init__.py, can be removed.

Suggested change — from:

    def __init__(self, data_config: dict, seed: int = 42):
        self.data_config: dict = data_config
        self.seed: int = seed
        self.processor: TaskDataProcessFnCallable | None = None
        self.task_spec: TaskDataSpec | None = None
        raise NotImplementedError("__init__ is not implemented")

to:

    def __init__(self, data_config: dict, seed: int = 42):
        self.data_config: dict = data_config
        self.seed: int = seed
        self.processor: TaskDataProcessFnCallable | None = self.get_processor()
        self.task_spec: TaskDataSpec | None = self.get_task_spec()
Contributor

@RayenTian RayenTian Nov 27, 2025

If we use super().__init__(...), we need to pass data_config when initializing the dataset, so all the unit tests and other places that use datasets would need to be modified, leading to a much larger PR. This part will be done in a future PR for issue #1552.
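For readers following the thread, the two designs differ only in where initialization happens. A self-contained sketch of the suggested hook-based base class (class names and the placeholder return values are simplified stand-ins for the real RawDataset API, not this repo's code):

```python
class RawDatasetSketch:
    """Base class: __init__ fills processor/task spec via overridable hooks."""

    def __init__(self, data_config: dict, seed: int = 42):
        self.data_config = data_config
        self.seed = seed
        # Subclass hooks run during construction, so no separate
        # set_processor/set_task_spec calls are needed at the call site.
        self.processor = self.get_processor()
        self.task_spec = self.get_task_spec()

    def get_processor(self):
        return None  # subclasses override with a real processor callable

    def get_task_spec(self):
        return None  # subclasses override with a real task spec


class HelpSteer3Sketch(RawDatasetSketch):
    def get_processor(self):
        return "helpsteer3_processor"  # placeholder for a real callable
```

As the reviewer notes, this pattern requires every call site to pass data_config at construction time, which is why it was deferred to a later PR.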

from nemo_rl.utils.logger import get_next_experiment_dir

OmegaConf.register_new_resolver("mul", lambda a, b: a * b)
OmegaConf.register_new_resolver("max", lambda a, b: max(a, b))
Member

P1: we should move these customized ops into a common place in the future.

Contributor

ACK. I can move them to a new place in a follow-up PR to resolve #1552.
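For context, the mul and max resolvers registered in the snippet above let YAML configs derive values from other keys via OmegaConf interpolation. A hypothetical fragment showing the usage (these key names are illustrative, not from this PR's configs):

```yaml
grpo:
  num_prompts_per_step: 32
  num_generations_per_prompt: 16
  # resolves to 32 * 16 = 512 via the custom "mul" resolver
  train_global_batch_size: ${mul:${grpo.num_prompts_per_step},${grpo.num_generations_per_prompt}}
```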
