
feat: Add Nemotron‑3 Nano 30B A3B GRPO nightly tests (FSDP2, +LoRA) #1866

Merged
terrykong merged 8 commits into main from
ruit/nanov3_grpo_recipe_dtensor
Feb 7, 2026
Conversation

@RayenTian
Contributor

@RayenTian RayenTian commented Feb 3, 2026

What does this PR do ?

  • ❗ Loading the LoRA checkpoint failed:
    • Root cause: the forward pass of the NemotronHMamba2Mixer module may take the cuda_kernels_forward path, which leaves the out_proj LoRA adapter with no gradient. This also explains the slight mismatch between the accuracy curves with and without LoRA.
    • Solution: exclude *out_proj* modules from the LoRA target layers.
    • More tests are in progress.
  • ❗ Use nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 as the tokenizer and nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 as the model.
    • The base model does not provide a default chat template, and the chat template of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is too long to inline in the YAML, so an additional file would otherwise be needed to store it.
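The tokenizer/model split might look like the following fragment (illustrative only; the exact key names may differ from the recipe's schema):

```yaml
# Illustrative fragment; key names are assumptions, not the recipe's exact schema.
policy:
  model_name: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16  # weights: base model
  tokenizer:
    name: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16           # chat template: instruct model
```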
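The out_proj exclusion can be sketched as a simple pattern filter over module names (a minimal illustration; the module names and the fnmatch-style matching below are assumptions, not the recipe's actual config schema):

```python
import fnmatch

# Hypothetical module names from a Nemotron-H style decoder layer.
module_names = [
    "layers.0.mixer.in_proj",
    "layers.0.mixer.out_proj",   # Mamba2 mixer output projection
    "layers.0.mlp.up_proj",
    "layers.0.mlp.down_proj",
]

# Adapt every linear projection except the mixer's out_proj, whose
# gradient can be bypassed when cuda_kernels_forward is used.
include = ["*_proj"]
exclude = ["*mixer.out_proj"]

def lora_targets(names, include, exclude):
    keep = []
    for name in names:
        if any(fnmatch.fnmatch(name, p) for p in exclude):
            continue  # skip modules whose gradients would be silently dropped
        if any(fnmatch.fnmatch(name, p) for p in include):
            keep.append(name)
    return keep

print(lora_targets(module_names, include, exclude))
```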

Convergence Test

Different configs yield different accuracy; dim=128, alpha=512 was chosen for the recipe config.
image

SFT

image

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
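A hypothetical invocation of the new recipes (the driver script name and flag syntax are assumptions based on common NeMo RL conventions, not taken from this PR; the commands are echoed rather than executed so the sketch is safe to run anywhere):

```shell
# Hypothetical usage; script path and flags are assumptions, not verified.
CONFIG_DIR=examples/configs/recipes/llm

# Without LoRA:
echo "uv run examples/run_grpo_math.py --config $CONFIG_DIR/grpo-nanov3-30BA3B-2n8g-fsdp2.yaml"

# With LoRA:
echo "uv run examples/run_grpo_math.py --config $CONFIG_DIR/grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml"
```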

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Documentation

    • Added LoRA configuration guidance for GRPO training with DTensor backend support and inference strategies.
  • New Features

    • Added GRPO configuration examples for NVIDIA Nemotron-3 Nano-30B models with and without LoRA.
  • Tests

    • Added test suite entries for new GRPO configurations with corresponding automated validation scripts.

@RayenTian RayenTian requested review from a team as code owners February 3, 2026 02:41
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Feb 3, 2026
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 3, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

This PR adds comprehensive GRPO (Group Relative Policy Optimization) support for the Nanov3 model, including documentation of DTensor LoRA configuration, two YAML recipe variants (with and without LoRA), corresponding test scripts with metric validation, and test suite registration.

Changes

  • Documentation: docs/guides/grpo.md
    Added a LoRA Configuration section describing DTensor backend setup, the merge-weight approach for generation, training-inference parity considerations, and a reference to the example YAML recipe.
  • Configuration Recipes: examples/configs/recipes/llm/grpo-nanov3-30BA3B-2n8g-fsdp2.yaml, examples/configs/recipes/llm/grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
    Two YAML configuration files for GRPO Nanov3 experiments. The LoRA variant includes a DTensor LoRA configuration (dim 128, alpha 256); both define model, batch sizes, sequence length, generation parameters (tensor_parallel_size 4, gpu_memory_utilization 0.7), logging, and cluster allocation (2 nodes, 8 GPUs per node).
  • Test Scripts: tests/test_suites/llm/grpo-nanov3-30BA3B-2n8g-fsdp2.sh, tests/test_suites/llm/grpo-nanov3-30BA3B-2n8g-fsdp2-lora.sh
    New test scripts that run the GRPO experiments, convert TensorBoard logs to JSON, and validate metrics including training loss, reward values, and timing thresholds; checkpoint directories are cleaned up on success.
  • Test Suite Registration: tests/test_suites/nightly.txt
    Added two nightly test entries for the Nanov3 LoRA and standard GRPO configurations under the Nano-v3 heading.
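The merge-weight approach for generation mentioned in the docs can be sketched as folding the adapter into the base weight before inference (a minimal sketch assuming the standard LoRA parameterization; the function name and tensor shapes are illustrative, not this repo's API):

```python
import torch

def merge_lora_weight(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
                      alpha: float, r: int) -> torch.Tensor:
    """Fold a LoRA adapter into the base weight for inference.

    W: (out_features, in_features) base weight
    A: (r, in_features) down-projection; B: (out_features, r) up-projection
    Standard LoRA merge: W' = W + (alpha / r) * B @ A
    """
    return W + (alpha / r) * (B @ A)

# Tiny example: with all-ones factors, each entry of B @ A sums over r.
W = torch.zeros(4, 3)
A = torch.ones(2, 3)
B = torch.ones(4, 2)
merged = merge_lora_weight(W, A, B, alpha=4.0, r=2)
# Each entry of B @ A is 2.0; scaled by alpha/r = 2.0, so merged is all 4.0.
```

Merging once before rollout avoids running the adapter on the generation path, which keeps training and inference numerics consistent.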

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

documentation

Suggested reviewers

  • yuki-97
  • terrykong
🚥 Pre-merge checks: ✅ 3 passed, ❌ 1 failed

❌ Failed checks (1 warning)

  • Test Results For Major Changes: ⚠️ Warning
    Explanation: The PR introduces major changes (new model configs, LoRA support, nightly tests) affecting numerics/convergence, but the PR description lacks test results, baseline metrics, performance numbers, and regression analysis.
    Resolution: Update the PR description with baseline metrics, test results, convergence validation, and justification of the metric thresholds for the Nano v3 GRPO configuration.

✅ Passed checks (3 passed)

  • Title check: ✅ Passed
    The pull request title accurately summarizes the main changes: adding GRPO nightly tests for Nemotron-3 Nano 30B A3B with FSDP2 and LoRA support, which aligns with all file additions (test configs, test scripts, documentation, and the nightly test registry).
  • Docstring Coverage: ✅ Passed
    No functions found in the changed files; docstring coverage check skipped.
  • Description Check: ✅ Passed
    Check skipped: CodeRabbit's high-level summary is enabled.


@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 3, 2026
Collaborator

@terrykong terrykong left a comment

awesome!

can you:

  1. retroactively add lora to the news section in the front page readme? some users did not know we had this supported. maybe add an item that lora sft is supported in both backends, and dtensor grpo supported and to see our nano v3 recipe (and mention mcore coming soon)
  2. do you think we'd see more interesting reward curves if we did base for model_name but the reasoning model for the tokenizer_name?

@RayenTian RayenTian force-pushed the ruit/nanov3_grpo_recipe_dtensor branch from d726426 to 9cc9c08 Compare February 4, 2026 12:04
@RayenTian RayenTian requested a review from a team as a code owner February 4, 2026 12:04
@RayenTian
Contributor Author

> awesome!
>
> can you:
>
>   1. retroactively add lora to the news section in the front page readme? some users did not know we had this supported. maybe add an item that lora sft is supported in both backends, and dtensor grpo supported and to see our nano v3 recipe (and mention mcore coming soon)
>   2. do you think we'd see more interesting reward curves if we did base for model_name but the reasoning model for the tokenizer_name?

Done for both 1 and 2. Attached the new training curves to the PR description.

@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 4, 2026
@RayenTian RayenTian force-pushed the ruit/nanov3_grpo_recipe_dtensor branch from 58c7f91 to 8c5dba6 Compare February 5, 2026 08:33
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 5, 2026
@RayenTian RayenTian requested a review from a team as a code owner February 6, 2026 12:45
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 6, 2026
Signed-off-by: ruit <ruit@nvidia.com>
…just linear matching settings

Signed-off-by: ruit <ruit@nvidia.com>
…tput projection modules for clarity

Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian force-pushed the ruit/nanov3_grpo_recipe_dtensor branch from aea477e to 05541c5 Compare February 6, 2026 13:48
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 6, 2026
@terrykong terrykong enabled auto-merge (squash) February 6, 2026 17:49
@terrykong terrykong merged commit 3eedbc6 into main Feb 7, 2026
44 of 45 checks passed
@terrykong terrykong deleted the ruit/nanov3_grpo_recipe_dtensor branch February 7, 2026 01:44
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
…VIDIA-NeMo#1866)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…VIDIA-NeMo#1866)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
seonjinn pushed a commit that referenced this pull request Mar 9, 2026

Labels

CI:L1 Run doctests, unit tests, and functional tests documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants