
feat: Add Nemotron‑3 Nano 30B A3B BF16 SFT nightly tests (FSDP2, +LoRA)#1648

Merged
terrykong merged 10 commits into main from ruit/nano_v3_recipe
Dec 24, 2025
Conversation

@RayenTian (Contributor) commented Dec 17, 2025

Summary:

Introduces nightly coverage for SFT on the Nemotron‑3 Nano 30B A3B BF16 model, including both a base FSDP2 configuration and a LoRA-enabled variant. Adds runnable test scripts with metric thresholds and registers them in the nightly test suite.

Changes:

New configs:
examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml
examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml

New nightly test scripts:
tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh

Nightly registration:
Appends the two new scripts to tests/test_suites/nightly.txt under “Nemotron 3 Nano 30B A3B BF16 tests”.
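Per the nightly-registration guideline (driver-script paths are appended relative to tests/test_suites/), the appended block presumably looks like the following sketch; the section-header wording is quoted from this PR, but its exact position within nightly.txt is not shown here:

```text
# Nemotron 3 Nano 30B A3B BF16 tests
llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
```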

Results

With the checkpoint period set to 10, the sft-nanov3-30BA3B-2n8g-fsdp2 configuration uses significantly more memory, causing a notable slowdown after step 10. Speed returns to normal once checkpointing is disabled.

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16

AdamW optimizer

  • lora dim: 256
  • lora alpha: 512

[training-curve screenshot]

Adam optimizer

[training-curve screenshot]

Enable CKPT

[training-curve screenshots]

memory

[memory-usage screenshot]

Disable CKPT

[training-curve screenshots]

memory

[memory-usage screenshot]

Known Issue

#1688

@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Dec 18, 2025
@RayenTian RayenTian requested a review from joyang-nv December 18, 2025 03:06
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 22, 2025
@RayenTian (Contributor, Author)

@ZhiyuLi-Nvidia @hemildesai @samodi-nv Do you have any ideas why lora is slower than normal SFT?

@ZhiyuLi-Nvidia (Contributor)

> @ZhiyuLi-Nvidia @hemildesai @samodi-nv Do you have any ideas why lora is slower than normal SFT?

LoRA should be computationally efficient unless the training pipeline is bound by other overheads.

Could you try the following to see if LoRA is faster?

  • Use 1 node instead (to minimize cross-node communication)
  • Maximize the batch size (to better utilize compute relative to overheads)

@terrykong terrykong removed the r0.5.0 label Dec 22, 2025
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 23, 2025
@RayenTian RayenTian marked this pull request as ready for review December 23, 2025 03:25
@RayenTian RayenTian requested review from a team as code owners December 23, 2025 03:25
@RayenTian RayenTian requested a review from terrykong December 23, 2025 03:26
@RayenTian RayenTian requested a review from yuki-97 December 23, 2025 03:26
@coderabbitai (bot) commented Dec 23, 2025

📝 Walkthrough

Adds two new SFT configuration files and corresponding test scripts for the NVIDIA Nemotron Nano 30B model on 2-node, 8-GPU-per-node clusters, with and without LoRA. Updates the nightly test suite manifest to include both variants.

Changes

SFT Configuration Files (examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml, examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml):
New configurations for Nemotron Nano 30B SFT experiments. The base config includes the Adam optimizer, global batch size 16, max sequence length 2048, and logging for wandb/tensorboard/mlflow. The LoRA variant adds dtensor_cfg with LoRA enabled (use_triton: false). Both disable checkpointing and set max_num_steps: 100.

Test Scripts (tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh, tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh):
New bash test scripts that configure and execute SFT experiments (NUM_NODES=2, STEPS_PER_RUN calculation), convert TensorBoard logs to JSON, and conditionally run metrics checks enforcing train/loss < 4.20 at step 20 and step timing < 15 seconds.

Test Suite Manifest (tests/test_suites/nightly.txt):
Adds both new test scripts to the nightly test suite. Entries appear in two separate sections, resulting in duplicate insertions of the same test references.
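The STEPS_PER_RUN bookkeeping mentioned above follows the repo's standard driver-script pattern; as a quick standalone sketch of the ceiling-division formula used to derive NUM_RUNS (the STEPS_PER_RUN value below is illustrative only, not taken from these scripts):

```shell
# NUM_RUNS is the number of sequential runs needed to cover MAX_STEPS
# in chunks of STEPS_PER_RUN (integer ceiling division).
MAX_STEPS=100      # matches max_num_steps in the new configs
STEPS_PER_RUN=30   # illustrative value only
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
echo "NUM_RUNS=${NUM_RUNS}"   # 100 steps in chunks of 30 rounds up to 4
```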

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

Run CICD

Suggested reviewers

  • joyang-nv
  • yfw

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)

  • Title check: Passed. The PR title clearly and specifically describes the main change: adding Nemotron-3 Nano 30B A3B BF16 SFT nightly tests with FSDP2 and LoRA variants, which aligns directly with the changeset content.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Test Results For Major Changes: Passed. The PR adds test infrastructure and configuration files for an existing model variant, which are minor changes. Performance and memory testing results were documented with screenshots, and metric thresholds were established based on empirical testing.
  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.

@coderabbitai (bot) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56e8fcb and 8e5aaaf.

📒 Files selected for processing (5)
  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/nightly.txt
🧰 Additional context used
📓 Path-based instructions (7)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Files:

  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
  • tests/test_suites/nightly.txt
  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
tests/test_suites/nightly.txt

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Files:

  • tests/test_suites/nightly.txt
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
tests/test_suites/**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Files:

  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/nightly.txt : When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : Recipe YAML files should follow the naming pattern: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml for VLM recipes

Applied to files:

  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml
📚 Learning: 2025-09-24T18:36:06.287Z
Learnt from: terrykong
Repo: NVIDIA-NeMo/RL PR: 1024
File: examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp4.yaml:1-1
Timestamp: 2025-09-24T18:36:06.287Z
Learning: In the NVIDIA NeMo RL repository, when working with Hydra config defaults, the scalar string format (defaults: ../../dpo.yaml) is acceptable and preferred over the list format, even though Hydra typically expects defaults to be a list.

Applied to files:

  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
  • examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/nightly.txt : When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/nightly.txt
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/nightly.txt
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain

Applied to files:

  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh
  • tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Docs_Tests
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (5)
tests/test_suites/nightly.txt (1)

90-93: LGTM!

The nightly test entries are correctly added under the SFT section with an appropriate comment header. The paths follow the expected pattern relative to tests/test_suites/. Based on learnings, this correctly appends the driver script paths to nightly.txt.

tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2.sh (1)

1-39: LGTM!

The script follows the established test infrastructure patterns:

  • Standard configuration variables (NUM_NODES, STEPS_PER_RUN, etc.) are consumed by external launch tooling
  • Uses uv run per coding guidelines
  • Matches the YAML base name with .sh extension
  • Metric thresholds are defined appropriately

Based on learnings, the cd $PROJECT_ROOT without error handling and unquoted $@ are consistent with this repository's conventions.

examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml (1)

1-15: Configuration structure looks correct.

The defaults scalar format is acceptable per learnings. LoRA configuration (lora_cfg.enabled: true) is properly nested under dtensor_cfg. Cluster settings match the filename pattern (2n8g).

Also applies to: 24-26
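Putting the reviewed pieces together, the LoRA overrides presumably look something like this minimal sketch. Only lora_cfg.enabled: true nested under dtensor_cfg and use_triton: false are confirmed by the review; the top-level policy key and the dim/alpha key names are assumptions (values taken from this PR's results section):

```yaml
# Hypothetical sketch, not the actual file contents.
policy:
  dtensor_cfg:
    lora_cfg:
      enabled: true      # confirmed by the review comment
      dim: 256           # value from the PR results; key name assumed
      alpha: 512         # value from the PR results; key name assumed
      use_triton: false  # confirmed by the walkthrough
```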

examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2.yaml (1)

1-22: LGTM!

The base SFT configuration is well-structured:

  • Uses scalar defaults format (acceptable per learnings)
  • Logger names correctly match the filename
  • Cluster settings (2n8g) align with the filename pattern
  • No LoRA settings as expected for the base variant
tests/test_suites/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.sh (1)

1-39: LGTM!

The LoRA test script follows the same established patterns as the base script. The higher loss threshold (4.20 vs 3.20) appropriately accounts for potentially different convergence characteristics with LoRA.

Note: The comment on line 7 states step_time ~ 8sec while the base config says ~15sec. Given the PR discussion about LoRA being slower than expected, you may want to verify this comment reflects actual observed timing after your testing.
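As a hedged illustration of the kind of post-run check these scripts perform (the real scripts parse the converted TensorBoard JSON; the loss value and variable names below are stand-ins, not values from an actual run):

```shell
# Assert train/loss at step 20 is under the 4.20 threshold used by the
# LoRA variant. "loss" is a stand-in for a value parsed from the
# TensorBoard-to-JSON conversion step.
loss=3.85
threshold=4.20
if awk -v l="$loss" -v t="$threshold" 'BEGIN { exit !(l < t) }'; then
  echo "train/loss check passed (${loss} < ${threshold})"
else
  echo "train/loss check FAILED (${loss} >= ${threshold})"
  exit 1
fi
```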

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 24, 2025
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 24, 2025
@terrykong terrykong enabled auto-merge (squash) December 24, 2025 05:41
@terrykong terrykong merged commit 433eaa1 into main Dec 24, 2025
41 of 42 checks passed
@terrykong terrykong deleted the ruit/nano_v3_recipe branch December 24, 2025 06:55
chtruong814 pushed a commit that referenced this pull request Dec 24, 2025
…A) (#1648)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
parthmannan pushed a commit to parthmannan/RL that referenced this pull request Jan 15, 2026
…A) (NVIDIA-NeMo#1648)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: Parth Mannan <pmannan@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
…A) (NVIDIA-NeMo#1648)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…A) (NVIDIA-NeMo#1648)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
seonjinn pushed a commit that referenced this pull request Mar 9, 2026

Labels

CI:L1 Run doctests, unit tests, and functional tests r0.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants