feat: Add Nemotron‑3 Nano 30B A3B GRPO nightly tests (FSDP2, +LoRA) by RayenTian · Pull Request #1866 · NVIDIA-NeMo/RL

RayenTian · 2026-02-03T02:41:27Z

What does this PR do ?

❗ Loading the LoRA checkpoint failed:
- Root cause: the forward pass of the NemotronHMamba2Mixer module may take the cuda_kernels_forward path, in which case the out_proj LoRA weights receive no gradient. This may also explain the slight mismatch between the accuracy curves with and without LoRA.
- Solution: exclude `*out_proj*` modules from the LoRA layers.
- More tests are in progress.
❗ Use nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 as the tokenizer and nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 as the model.
- Reason: the base model does not provide a default chat template. The chat template of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is too long to write directly in the YAML, so inlining it would require an additional file to store the template.
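
The `*out_proj*` exclusion above can be pictured as a wildcard filter over module names. The snippet below is only a minimal sketch under assumptions: the module names and the `select_lora_targets` helper are hypothetical and are not taken from the NeMo RL code base, which may match names differently.

```python
from fnmatch import fnmatchcase

# Hypothetical module names, loosely modeled on a Mamba2-style mixer block.
MODULE_NAMES = [
    "backbone.layers.0.mixer.in_proj",
    "backbone.layers.0.mixer.out_proj",
    "backbone.layers.1.mixer.q_proj",
    "backbone.layers.1.mixer.o_proj",
]


def select_lora_targets(names, exclude_patterns):
    """Return the names that match none of the wildcard exclude patterns."""
    return [
        name
        for name in names
        if not any(fnmatchcase(name, pattern) for pattern in exclude_patterns)
    ]


# Exclude every module whose name contains "out_proj".
targets = select_lora_targets(MODULE_NAMES, ["*out_proj*"])
print(targets)
```

Note that a pattern such as `*out_proj*` leaves `o_proj` untouched, because the match is on the literal substring `out_proj`.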

Convergence Test

Accuracy varies across LoRA configs; dim=128, alpha=512 was chosen as the recipe config.
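
For intuition about what dim and alpha control together, here is a small sketch that assumes the conventional LoRA scaling, where the low-rank update is multiplied by alpha / dim. The PR does not state which scaling the DTensor LoRA implementation applies, so treat the formula and the `lora_scale` helper as assumptions.

```python
def lora_scale(alpha: float, dim: int) -> float:
    """Conventional LoRA scaling factor applied to the low-rank update.

    Assumes the common alpha / rank convention; other schemes exist.
    """
    if dim <= 0:
        raise ValueError("dim must be a positive integer")
    return alpha / dim


# Recipe values quoted in the PR description.
scale = lora_scale(alpha=512, dim=128)
print(scale)
```

Under this assumption, the chosen recipe scales the adapter update by 4, while alpha=256 at the same dim would scale it by 2.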

SFT

Issues

List issues that this PR closes (syntax):

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

Summary by CodeRabbit

Documentation
- Added LoRA configuration guidance for GRPO training with DTensor backend support and inference strategies.
New Features
- Added GRPO configuration examples for NVIDIA Nemotron-3 Nano-30B models with and without LoRA.
Tests
- Added test suite entries for new GRPO configurations with corresponding automated validation scripts.

coderabbitai · 2026-02-03T02:46:13Z

📝 Walkthrough

Walkthrough

This PR adds GRPO (Group Relative Policy Optimization) support for the Nanov3 model, including documentation of DTensor LoRA configuration, two YAML recipe variants (with and without LoRA), corresponding test scripts with metric validation, and test suite registration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation `docs/guides/grpo.md` | Added LoRA Configuration section describing DTensor backend setup, merge-weight approach for generation, training-inference parity considerations, and reference to example YAML recipe. |
| Configuration Recipes `examples/configs/recipes/llm/grpo-nanov3-30BA3B-2n8g-fsdp2.yaml`, `examples/configs/recipes/llm/grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml` | Two YAML configuration files for GRPO Nanov3 experiments. LoRA variant includes DTensor LoRA configuration (dim 128, alpha 256); both define model, batch sizes, sequence length, generation parameters (tensor_parallel_size 4, gpu_memory_utilization 0.7), logging, and cluster allocation (8 GPUs per node, 2 nodes). |
| Test Scripts `tests/test_suites/llm/grpo-nanov3-30BA3B-2n8g-fsdp2.sh`, `tests/test_suites/llm/grpo-nanov3-30BA3B-2n8g-fsdp2-lora.sh` | New test scripts that execute GRPO experiments, convert TensorBoard logs to JSON, and validate metrics including training loss, reward values, and timing thresholds. Scripts include conditional cleanup of checkpoint directories on success. |
| Test Suite Registration `tests/test_suites/nightly.txt` | Added two nightly test entries for Nanov3 LoRA and standard GRPO configurations under the Nano-v3 heading. |
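
The metric validation step described for the test scripts can be sketched as a threshold check over metrics exported to JSON. This is an illustration under assumptions, not the scripts' actual logic: the JSON layout, the metric names, the values, and the thresholds below are all hypothetical.

```python
import json


def check_metrics(metrics, checks):
    """Return a list of failure messages; an empty list means all checks passed.

    Each check is (metric_name, step, op, threshold), with op "<" or ">".
    """
    failures = []
    for name, step, op, threshold in checks:
        value = metrics[name][str(step)]
        ok = value < threshold if op == "<" else value > threshold
        if not ok:
            failures.append(f"{name}[{step}]={value} is not {op} {threshold}")
    return failures


# Hypothetical metrics, keyed by metric name and then by step.
metrics = json.loads(
    '{"train/loss": {"1": 0.9, "20": 0.2}, "train/reward": {"20": 0.55}}'
)

# Hypothetical thresholds for the final step of a short run.
checks = [("train/loss", 20, "<", 0.5), ("train/reward", 20, ">", 0.4)]
failures = check_metrics(metrics, checks)
print(failures)
```

Returning the full list of failures, rather than stopping at the first one, makes a failing nightly run easier to diagnose from a single log.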

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

cp: feat: Megatron SFT LoRA (1629) into r0.5.0 #1741: Adds DTensor LoRA configuration support referenced in this PR's documentation and recipes.
chore: add nanov3 lora sft recipe to doc #1860: Documents Nanov3 LoRA SFT recipe referenced alongside GRPO LoRA configuration details.

Suggested labels

documentation

Suggested reviewers

yuki-97
terrykong

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR introduces major changes (new model configs, LoRA support, nightly tests) affecting numerics/convergence but PR description lacks test results, baseline metrics, performance numbers, and regression analysis. | Update PR description with baseline metrics, test results, convergence validation, and justification of metric thresholds for the Nano v3 GRPO configuration. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The pull request title accurately summarizes the main changes: adding GRPO nightly tests for Nemotron-3 Nano 30B A3B with FSDP2 and LoRA support, which aligns with all file additions (test configs, test scripts, documentation, and nightly test registry). |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


terrykong

awesome!

can you:

1. retroactively add lora to the news section in the front page readme? some users did not know we had this supported. maybe add an item that lora sft is supported in both backends, and dtensor grpo supported and to see our nano v3 recipe (and mention mcore coming soon)
2. do you think we'd see more interesting reward curves if we did base for model_name but the reasoning model for the tokenizer_name?

RayenTian · 2026-02-04T12:05:08Z

> awesome!
>
> can you:
>
> 1. retroactively add lora to the news section in the front page readme? some users did not know we had this supported. maybe add an item that lora sft is supported in both backends, and dtensor grpo supported and to see our nano v3 recipe (and mention mcore coming soon)
> 2. do you think we'd see more interesting reward curves if we did base for model_name but the reasoning model for the tokenizer_name?

Done for both 1 and 2. Attached the new running curve to the PR doc.


…VIDIA-NeMo#1866) Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>


RayenTian requested review from a team as code owners February 3, 2026 02:41

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Feb 3, 2026

github-actions bot added the documentation Improvements or additions to documentation label Feb 3, 2026

RayenTian temporarily deployed to nemo-ci February 3, 2026 02:41 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci February 3, 2026 02:44 — with GitHub Actions Inactive

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 3, 2026

RayenTian had a problem deploying to nemo-ci February 3, 2026 09:08 — with GitHub Actions Error

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 3, 2026

RayenTian temporarily deployed to nemo-ci February 4, 2026 02:27 — with GitHub Actions Inactive

terrykong reviewed Feb 4, 2026

View reviewed changes

RayenTian temporarily deployed to nemo-ci February 4, 2026 08:54 — with GitHub Actions Inactive

RayenTian force-pushed the ruit/nanov3_grpo_recipe_dtensor branch from d726426 to 9cc9c08 Compare February 4, 2026 12:04

RayenTian requested a review from a team as a code owner February 4, 2026 12:04

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 4, 2026

RayenTian temporarily deployed to nemo-ci February 4, 2026 12:08 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci February 4, 2026 15:17 — with GitHub Actions Inactive

RayenTian had a problem deploying to nemo-ci February 5, 2026 04:17 — with GitHub Actions Error

RayenTian force-pushed the ruit/nanov3_grpo_recipe_dtensor branch from 58c7f91 to 8c5dba6 Compare February 5, 2026 08:33

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 5, 2026

RayenTian temporarily deployed to nemo-ci February 5, 2026 08:35 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci February 5, 2026 09:23 — with GitHub Actions Inactive

RayenTian requested a review from a team as a code owner February 6, 2026 12:45

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 6, 2026

RayenTian had a problem deploying to nemo-ci February 6, 2026 13:30 — with GitHub Actions Error

RayenTian added 8 commits February 6, 2026 05:45

add nanov3 recipe

2d4c9d0

Signed-off-by: ruit <ruit@nvidia.com>

update config metrics

56bce90

Signed-off-by: ruit <ruit@nvidia.com>

add grpo lora doc

9fe14b4

Signed-off-by: ruit <ruit@nvidia.com>

update total nightly test time

14c712e

Signed-off-by: ruit <ruit@nvidia.com>

add news to readme, update tokenizer config

e076b2a

Signed-off-by: ruit <ruit@nvidia.com>

update lora alpha config

1f782c4

Signed-off-by: ruit <ruit@nvidia.com>

update lora configuration to exclude output projection modules and ad…

ac7d738

…just linear matching settings Signed-off-by: ruit <ruit@nvidia.com>

update lora configuration to include detailed comments on excluded ou…

05541c5

…tput projection modules for clarity Signed-off-by: ruit <ruit@nvidia.com>

RayenTian force-pushed the ruit/nanov3_grpo_recipe_dtensor branch from aea477e to 05541c5 Compare February 6, 2026 13:48

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 6, 2026

RayenTian temporarily deployed to nemo-ci February 6, 2026 13:49 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci February 6, 2026 15:04 — with GitHub Actions Inactive

terrykong enabled auto-merge (squash) February 6, 2026 17:49

terrykong approved these changes Feb 6, 2026

View reviewed changes

RayenTian temporarily deployed to nemo-ci February 6, 2026 22:19 — with GitHub Actions Inactive

terrykong merged commit 3eedbc6 into main Feb 7, 2026
44 of 45 checks passed

terrykong deleted the ruit/nanov3_grpo_recipe_dtensor branch February 7, 2026 01:44

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

feat: Add Nemotron‑3 Nano 30B A3B GRPO nightly tests (FSDP2, +LoRA) (#…

e593b6b

…1866) Signed-off-by: ruit <ruit@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

feat: Add Nemotron‑3 Nano 30B A3B GRPO nightly tests (FSDP2, +LoRA) (#…

40a3d1b

…1866) Signed-off-by: ruit <ruit@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 9, 2026

feat: Add Nemotron‑3 Nano 30B A3B GRPO nightly tests (FSDP2, +LoRA) (#…

b0e4259

…1866) Signed-off-by: ruit <ruit@nvidia.com>

terrykong merged 8 commits into main from ruit/nanov3_grpo_recipe_dtensor
