Conversation
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Prevents incorrect dp size in parallel_state during initial import. Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Megatron-Bridge change is a bump to this commit, which has some required fixes for nano-v2: NVIDIA-NeMo/Megatron-Bridge@8aa287d
📝 Walkthrough

Updates the Megatron-Bridge submodule reference, adds two new GRPO experiment configurations for the Nano-v2 12B model, implements conditional packed_seq_params handling in the Megatron model integration, adds context_parallel_size tracking for model export, includes MoE router safety checks, and introduces corresponding test scripts and nightly test entries.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant common.py
    participant Model
    participant policy_worker.py

    rect rgb(200, 220, 255)
    Note over Caller,Model: Previous Flow (packed_seq_params always passed)
    Caller->>common.py: pack_sequences(..., packed_seq_params)
    common.py->>Model: model(..., packed_seq_params=packed_seq_params)
    Note over Model: Always receives packed_seq_params<br/>(even if None)
    end

    rect rgb(220, 255, 220)
    Note over Caller,Model: New Flow (packed_seq_params conditionally passed)
    Caller->>common.py: pack_sequences(..., packed_seq_params)
    alt packed_seq_params is not None
        common.py->>common.py: additional_kwargs = {packed_seq_params}
    else packed_seq_params is None
        common.py->>common.py: additional_kwargs = {}
    end
    common.py->>Model: model(..., **additional_kwargs)
    Note over Model: Receives packed_seq_params only if provided
    end

    rect rgb(255, 240, 220)
    Note over policy_worker.py: MoE Router Safety Check
    policy_worker.py->>policy_worker.py: Check layer.mlp.router exists
    alt Router exists
        policy_worker.py->>policy_worker.py: Freeze router
    else Router missing
        policy_worker.py->>policy_worker.py: Skip safely
    end
    end
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning) | ✅ Passed checks (3 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_rl/models/policy/megatron_policy_worker.py (1)
1557-1563: Inconsistent packed_seq_params forwarding pattern.

In `get_topk_logits`, `packed_seq_params` is passed directly to the model (line 1561), which means it is always passed even when `None`. This is inconsistent with the conditional forwarding pattern introduced in `get_logprobs` (lines 1274-1283) and `forward_step_arbitrary_loss` in common.py (lines 351-360). Consider applying the same pattern here for consistency:

```diff
+ additional_kwargs = {}
+ if packed_seq_params is not None:
+     additional_kwargs["packed_seq_params"] = packed_seq_params
+
  output_tensor = model(
      input_ids=input_ids_cp_sharded,
      position_ids=position_ids,
      attention_mask=attention_mask,
-     packed_seq_params=packed_seq_params,
      **multimodal_data,
+     **additional_kwargs,
  )
```
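The conditional-forwarding idiom suggested here is plain Python and easy to check in isolation. The sketch below uses a stand-in `model` function (not the real Megatron API) to show that the callee only ever sees `packed_seq_params` when a value was actually provided:

```python
def model(input_ids, position_ids, attention_mask, **kwargs):
    # Stand-in for the real model call: report whether the optional
    # argument arrived at all (not merely whether it was None).
    return "packed_seq_params" in kwargs

def forward(input_ids, position_ids, attention_mask, packed_seq_params=None):
    # Only add the key when a value exists, so the model never receives
    # an explicit packed_seq_params=None.
    additional_kwargs = {}
    if packed_seq_params is not None:
        additional_kwargs["packed_seq_params"] = packed_seq_params
    return model(
        input_ids=input_ids,
        position_ids=position_ids,
        attention_mask=attention_mask,
        **additional_kwargs,
    )

print(forward([1, 2], [0, 1], None))  # False: key omitted entirely
print(forward([1, 2], [0, 1], None, packed_seq_params={"cu_seqlens": [0, 2]}))  # True
```

This keeps the call site compatible with model signatures that reject, or behave differently for, an explicit `packed_seq_params=None`.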
🧹 Nitpick comments (1)
nemo_rl/models/policy/megatron_policy_worker.py (1)
272-273: Consider more defensive attribute checking.

While the current check prevents an AttributeError when `mlp` or `router` don't exist, it could still fail if `layer.mlp` exists but is `None`. Consider using the more defensive pattern shown later in this file (lines 2374-2375):

```diff
- if hasattr(layer, "mlp") and hasattr(layer.mlp, "router"):
-     layer.mlp.router.weight.requires_grad = False
+ mlp = getattr(layer, "mlp", None)
+ if mlp is not None and hasattr(mlp, "router"):
+     mlp.router.weight.requires_grad = False
```
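As a quick illustration of the `getattr`-based defensive pattern, here is a toy layer hierarchy (hypothetical classes, not the actual Megatron modules):

```python
class Weight:
    def __init__(self):
        self.requires_grad = True

class Router:
    def __init__(self):
        self.weight = Weight()

class MoEMlp:
    def __init__(self):
        self.router = Router()

class Layer:
    def __init__(self, mlp):
        self.mlp = mlp

def freeze_router(layer):
    # getattr with a default handles both a missing attribute and an
    # attribute explicitly set to None before touching .router.
    mlp = getattr(layer, "mlp", None)
    if mlp is not None and hasattr(mlp, "router"):
        mlp.router.weight.requires_grad = False
        return True
    return False

moe_layer = Layer(MoEMlp())
dense_layer = Layer(None)
print(freeze_router(moe_layer))    # True: router weight frozen
print(freeze_router(dense_layer))  # False: skipped safely
```

The same `freeze_router` body works unchanged for MoE layers, dense layers with `mlp=None`, and objects that lack an `mlp` attribute entirely.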
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (1 hunks)
- examples/configs/recipes/llm/grpo-nano-v2-12b-1n8g-megatron.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-nano-v2-12b-2n8g-fsdp2tp1.yaml (1 hunks)
- nemo_rl/models/megatron/common.py (1 hunks)
- nemo_rl/models/megatron/community_import.py (4 hunks)
- nemo_rl/models/policy/megatron_policy_worker.py (2 hunks)
- tests/test_suites/llm/grpo-nano-v2-12b-1n8g-megatron.sh (1 hunks)
- tests/test_suites/llm/grpo-nano-v2-12b-2n8g-fsdp2tp1.sh (1 hunks)
- tests/test_suites/nightly.txt (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-24T18:36:06.287Z
Learnt from: terrykong
Repo: NVIDIA-NeMo/RL PR: 1024
File: examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp4.yaml:1-1
Timestamp: 2025-09-24T18:36:06.287Z
Learning: In the NVIDIA NeMo RL repository, when working with Hydra config defaults, the scalar string format (defaults: ../../dpo.yaml) is acceptable and preferred over the list format, even though Hydra typically expects defaults to be a list.
Applied to files:
- examples/configs/recipes/llm/grpo-nano-v2-12b-2n8g-fsdp2tp1.yaml
- examples/configs/recipes/llm/grpo-nano-v2-12b-1n8g-megatron.yaml
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
- tests/test_suites/llm/grpo-nano-v2-12b-2n8g-fsdp2tp1.sh
- tests/test_suites/llm/grpo-nano-v2-12b-1n8g-megatron.sh
- tests/test_suites/nightly.txt
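The `NUM_RUNS` expression quoted in that learning is integer ceiling division, `ceil(MAX_STEPS / STEPS_PER_RUN)`, written with the standard `(a + b - 1) / b` trick. A small sketch with illustrative values:

```shell
#!/bin/sh
# Ceiling division in POSIX shell arithmetic: (a + b - 1) / b
MAX_STEPS=100
STEPS_PER_RUN=30
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
echo "$NUM_RUNS"   # 100 steps at 30 per run -> 4 runs
```

The `+ STEPS_PER_RUN - 1` term bumps any nonzero remainder into an extra run, so a partial final run is still scheduled.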
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
Applied to files:
- tests/test_suites/llm/grpo-nano-v2-12b-2n8g-fsdp2tp1.sh
- tests/test_suites/llm/grpo-nano-v2-12b-1n8g-megatron.sh
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
Applied to files:
nemo_rl/models/megatron/community_import.py
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.
Applied to files:
tests/test_suites/nightly.txt
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/grpo-nano-v2-12b-2n8g-fsdp2tp1.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/grpo-nano-v2-12b-1n8g-megatron.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
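For context on the SC2068 findings: the repo deliberately keeps `$@` unquoted for consistency (per the learnings above), but the behavior Shellcheck warns about is that unquoted `$@` re-splits arguments on whitespace. A small demonstration:

```shell
#!/bin/sh
count_args() { echo "$#"; }

set -- "one arg" "two"
count_args $@     # unquoted: "one arg" splits, yielding 3 words
count_args "$@"   # quoted: the original 2 arguments are preserved
```

This only matters when arguments can contain spaces; the test launch paths here apparently never pass such arguments, which is why the project accepts the unquoted form.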
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (10)
3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (1)
1-1: Ignore this review comment; no submodule change is present in this PR.

Verification reveals that the `.gitmodules` file is unmodified and no submodule updates are staged. The commit hashes referenced in the review (both old and new) do not exist in the Megatron-Bridge repository. The review was based on incorrect assumptions about a submodule change that does not occur in this pull request.

Likely an incorrect or invalid review comment.
nemo_rl/models/megatron/common.py (1)
351-360: LGTM! Clean conditional parameter forwarding.

The pattern of conditionally adding `packed_seq_params` to `additional_kwargs` only when it's not `None` is a good practice. This prevents passing unnecessary `None` values to the model and provides a clean extension point for future optional parameters.

nemo_rl/models/policy/megatron_policy_worker.py (1)
1274-1283: LGTM! Consistent with common.py changes.

The conditional forwarding of `packed_seq_params` via `additional_kwargs` aligns well with the pattern introduced in `nemo_rl/models/megatron/common.py` (lines 351-360).

nemo_rl/models/megatron/community_import.py (2)
45-45: LGTM! Consistent context_parallel_size tracking.

The tracking and restoration of `context_parallel_size` follows the same pattern as the other parallelism settings (tensor, pipeline, expert) already present in the code. This ensures runtime parallelism settings don't persist in saved checkpoints.

Also applies to: 63-63, 87-87
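The save-and-restore pattern this comment describes can be sketched generically. The attribute names below follow the review text (tensor/pipeline/expert/context parallel sizes); the actual code in `community_import.py` may structure this differently:

```python
from contextlib import contextmanager

PARALLEL_ATTRS = (
    "tensor_model_parallel_size",
    "pipeline_model_parallel_size",
    "expert_model_parallel_size",
    "context_parallel_size",
)

@contextmanager
def scrubbed_parallelism(cfg):
    # Reset runtime parallelism settings to 1 for the duration of an
    # export so they don't persist into the saved checkpoint, then
    # restore the original values afterwards.
    saved = {a: getattr(cfg, a) for a in PARALLEL_ATTRS}
    try:
        for a in PARALLEL_ATTRS:
            setattr(cfg, a, 1)
        yield cfg
    finally:
        for a, v in saved.items():
            setattr(cfg, a, v)

class Cfg:
    def __init__(self):
        self.tensor_model_parallel_size = 4
        self.pipeline_model_parallel_size = 2
        self.expert_model_parallel_size = 1
        self.context_parallel_size = 2

cfg = Cfg()
with scrubbed_parallelism(cfg):
    print(cfg.context_parallel_size)  # 1 while exporting
print(cfg.context_parallel_size)      # 2, restored after the block
```

The `finally` clause guarantees restoration even if the export inside the `with` block raises.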
128-131: Manual seed initialization for mamba mixer.

The addition of `model_parallel_cuda_manual_seed(0)` in the CPU-distributed export context is noted in the summary as required for the mamba mixer. This appears to be a targeted fix for a specific model architecture requirement.

tests/test_suites/nightly.txt (1)
51-53: LGTM! Test coverage for nano-v2 models.

The addition of nightly test entries for the new nano-v2 12B configurations follows the existing structure and provides appropriate test coverage for both the Megatron and FSDP2/TP1 variants.
examples/configs/recipes/llm/grpo-nano-v2-12b-1n8g-megatron.yaml (1)
1-34: LGTM! Well-configured nano-v2 Megatron setup.

The configuration appropriately sets up the NVIDIA Nemotron Nano 12B v2 model with:
- Megatron backend with TP=8 for 1-node 8-GPU deployment
- Explicit feature toggles (bias_activation_fusion disabled, sequence_packing disabled)
- Consistent 512-token limits across generation and data settings
- Proper logging and checkpointing configuration
examples/configs/recipes/llm/grpo-nano-v2-12b-2n8g-fsdp2tp1.yaml (1)
1-44: LGTM! Memory-optimized FSDP2 configuration.

The 2-node FSDP2/TP1 configuration includes appropriate memory optimizations:
- CPU offload and activation checkpointing enabled
- Dynamic batching for efficient resource utilization
- Multi-stage scheduler with 13-step linear warmup
The configuration complements the Megatron variant and provides an alternative deployment strategy for the nano-v2 12B model.
tests/test_suites/llm/grpo-nano-v2-12b-1n8g-megatron.sh (1)
1-41: LGTM! Test script follows project conventions.

The test script properly:
- Configures a 30-step GRPO experiment with comprehensive logging (WandB, TensorBoard)
- Converts TensorBoard logs to JSON for automated validation
- Applies appropriate metrics thresholds for train loss, token error, reward, and timing
The script follows established patterns in the repository, including variable naming and argument passing conventions.
Based on learnings
tests/test_suites/llm/grpo-nano-v2-12b-2n8g-fsdp2tp1.sh (1)
1-41: LGTM! 2-node test variant properly configured.

The 2-node FSDP2/TP1 test script mirrors the 1-node Megatron variant with appropriate adjustments:
- NUM_NODES set to 2 for multi-node testing
- Stricter timing threshold (60s vs 80s), reflecting expected performance improvement with additional nodes
- Same validation metrics for consistency
Based on learnings
Add packed_seq_params change to get_topk_logits too Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
terrykong
left a comment
small comment. otherwise lgtm. do you mind including the convergence wandb/plots
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Added plots of the nightly tests in the top PR comment
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do?
Adds support for `nvidia/NVIDIA-Nemotron-Nano-9B-v2` and `nvidia/NVIDIA-Nemotron-Nano-12B-v2`.

Issues
List issues that this PR closes (syntax):
closes #1500
closes #1503
Usage
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"
Pre checks:
Additional Information
Sample runs:
Summary by CodeRabbit
New Features
Bug Fixes
Tests