
Update Nemotron 3 Nano perf configs#2510

Merged
ko3n1g merged 5 commits into main from malay/mock_arg_remove on Feb 26, 2026

Conversation

@malay-nagda
Contributor

@malay-nagda malay-nagda commented Feb 24, 2026

What does this PR do ?

Updates Nemotron 3 Nano perf configs.

Changelog

BASE_NEMOTRON_3_NANO_CONFIG = WorkloadBaseConfig(
    num_gpus=8,
    global_batch_size=512,
    micro_batch_size=2,
    tensor_model_parallel_size=1,
    expert_tensor_parallel_size=1,
    expert_model_parallel_size=8,
    moe_flex_dispatcher_backend="hybridep",
)
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 = replace(
    BASE_NEMOTRON_3_NANO_CONFIG,
    num_gpus=16,
    global_batch_size=1024,
    micro_batch_size=1,
    recompute_modules=["moe", "layernorm"],
)
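The variant configs above are derived with `dataclasses.replace`, which returns a copy of a dataclass with selected fields overridden while leaving the base untouched. A minimal self-contained sketch of that pattern (the field names match the snippet, but this trimmed-down `WorkloadBaseConfig` is an illustrative assumption, not the project's actual class):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class WorkloadBaseConfig:
    # Illustrative subset of fields; the real class has more.
    num_gpus: int = 8
    global_batch_size: int = 512
    micro_batch_size: int = 2

base = WorkloadBaseConfig()
h100 = replace(base, num_gpus=16, global_batch_size=1024, micro_batch_size=1)

print(h100.num_gpus)          # 16 (overridden in the copy)
print(h100.micro_batch_size)  # 1 (overridden in the copy)
print(base.num_gpus)          # 8 (base config is unchanged)
```

Because the dataclass is frozen, each derived variant is an independent immutable snapshot of the base config plus its overrides.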

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Chores
    • Updated Nemotron 3 Nano model configurations across GB300, GB200, B300, B200, and H100 variants with optimized batch sizes and parallelization settings.
    • Added support for MOE (Mixture of Experts) flex dispatcher backend with "hybridep" mode.
    • Streamlined configuration inheritance to reduce redundant variant definitions.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
@malay-nagda malay-nagda added performance r0.3.0 Cherry-pick label for r0.3.0 release branch labels Feb 25, 2026
@malay-nagda malay-nagda self-assigned this Feb 25, 2026
@malay-nagda malay-nagda marked this pull request as ready for review February 25, 2026 07:05
@malay-nagda malay-nagda changed the title remove mock arg Update Nemotron 3 Nano perf configs Feb 25, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

Walkthrough

The changes modify the Nemotron 3 Nano configuration setup in two ways. First, the pretrain config factory functions gain a mock parameter and conditionally propagate moe_flex_dispatcher_backend from the base config. Second, the workload base configs are refactored so that most pretrain variants reference the base config directly, and a specialized H100 configuration variant is introduced.

Changes

Cohort / File(s) Summary
Pretrain Config Factory Functions
scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py
Added mock: bool = True parameter to five factory functions (nemotron_3_nano_pretrain_config_gb300, nemotron_3_nano_pretrain_config_gb200, nemotron_3_nano_pretrain_config_b300, nemotron_3_nano_pretrain_config_b200, nemotron_3_nano_pretrain_config_h100). Each function now conditionally propagates moe_flex_dispatcher_backend from base config to model config when the field is not None.
Workload Base Configuration
scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py
Updated global parameters (global_batch_size: 3072→512, tensor_model_parallel_size: 4→1, added moe_flex_dispatcher_backend="hybridep"). Restructured multiple pretrain config variants (GB300, GB200, B300, B200) to reference BASE_NEMOTRON_3_NANO_CONFIG directly instead of using replace(). Introduced internal H100 helper config with specialized parameters (num_gpus=16, global_batch_size=1024, micro_batch_size=1, recompute_modules=["moe","layernorm"]) with two public aliases.
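The conditional propagation described in the summary above can be sketched as follows. The attribute names (`moe_flex_dispatcher_backend`, a model config and a base config object) follow the walkthrough, but these trimmed-down classes are assumptions for illustration, not the project's actual `ConfigContainer`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    moe_flex_dispatcher_backend: Optional[str] = None

@dataclass
class BaseConfig:
    moe_flex_dispatcher_backend: Optional[str] = "hybridep"

def propagate_backend(model_cfg: ModelConfig, base_cfg: BaseConfig) -> None:
    # Only override the model setting when the base config specifies one;
    # a None in the base config leaves the model's current value alone.
    if base_cfg.moe_flex_dispatcher_backend is not None:
        model_cfg.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend

model = ModelConfig()
propagate_backend(model, BaseConfig())
print(model.moe_flex_dispatcher_backend)  # hybridep
```

The None guard is what makes the base-config field optional: variants that don't set a dispatcher backend leave the model default untouched.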

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • nemotron3 nano recipes #2301: Introduces moe_flex_dispatcher_backend field in nemotron_3_nano workload base configs that is subsequently propagated by this PR's factory function changes.

Suggested reviewers

  • erhoo82
  • ko3n1g
  • tomlifu
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes: ⚠️ Warning
    Explanation: PR contains significant Nemotron 3 Nano configuration changes affecting performance and training convergence without documented test results or validation metrics.
    Resolution: Add comprehensive testing documentation including before-and-after performance comparisons, convergence validation results, testing environment details, and benchmark results justifying the configuration changes.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title "Update Nemotron 3 Nano perf configs" clearly summarizes the main change: updating performance configuration files for Nemotron 3 Nano models across multiple config variants.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch malay/mock_arg_remove

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py (1)

54-62: Use a tuple (or immutable sequence) for recompute_modules to guard against accidental mutation.

The ["moe", "layernorm"] list literal is stored inside the dataclass and shared by both NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_BF16_V1 and NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_FP8_CS_V1. Even if the dataclass is frozen (preventing field reassignment), the list object itself remains mutable; an in-place .append() / .remove() call elsewhere would corrupt both aliases at once.

♻️ Proposed fix
 _NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 = replace(
     BASE_NEMOTRON_3_NANO_CONFIG,
     num_gpus=16,
     global_batch_size=1024,
     micro_batch_size=1,
-    recompute_modules=["moe", "layernorm"],
+    recompute_modules=("moe", "layernorm"),
 )
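The hazard behind this suggestion can be reproduced in isolation: `dataclasses.replace` copies field references, not the objects they point to, so a list value ends up shared between every alias of the config. A small demo with a hypothetical `Cfg` class (the real `WorkloadBaseConfig` and alias names are stand-ins here):

```python
from dataclasses import dataclass, replace
from typing import Sequence

@dataclass(frozen=True)
class Cfg:
    recompute_modules: Sequence[str] = ()

shared = replace(Cfg(), recompute_modules=["moe", "layernorm"])
bf16 = shared  # two public aliases of the same config object,
fp8 = shared   # mirroring the BF16 and FP8 H100 variants

# In-place mutation through one alias corrupts the other,
# even though the dataclass itself is frozen.
fp8.recompute_modules.append("attention")
print(bf16.recompute_modules)  # ['moe', 'layernorm', 'attention']

# A tuple makes the stored sequence immutable; .append() would
# raise AttributeError instead of silently changing both aliases.
safe = replace(Cfg(), recompute_modules=("moe", "layernorm"))
```

Freezing a dataclass only prevents field reassignment; it does not make the field values themselves immutable, which is why the tuple form is the safer default for shared configs.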
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`
around lines 54 - 62, The recompute_modules argument in the
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 replace call is using a mutable list
literal ["moe", "layernorm"] that will be shared across the derived configs;
change it to an immutable tuple ("moe", "layernorm") (or another immutable
sequence) so the value stored in the dataclass cannot be mutated in-place;
update the recompute_modules parameter in the replace call for
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 accordingly so both
NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_BF16_V1 and
NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_FP8_CS_V1 inherit the immutable sequence.
scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py (1)

52-53: Extract the repeated moe_flex_dispatcher_backend propagation into a shared helper.

The identical two-line conditional is copy-pasted verbatim into all five functions. Folding it into set_nemotron_3_nano_common_configs (by also accepting base_cfg) or a separate helper eliminates the duplication and ensures any future changes are applied consistently.

♻️ Proposed refactor
-def set_nemotron_3_nano_common_configs(cfg: ConfigContainer) -> None:
-    """Set common performance configurations for all Nemotron 3 Nano configs."""
+def set_nemotron_3_nano_common_configs(cfg: ConfigContainer, base_cfg=None) -> None:
+    """Set common performance configurations for all Nemotron 3 Nano configs."""
     cfg.mixed_precision.grad_reduce_in_fp32 = False
     cfg.ddp.grad_reduce_in_fp32 = False
+    if base_cfg is not None and base_cfg.moe_flex_dispatcher_backend is not None:
+        cfg.model.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend

Then, in each function replace:

     set_nemotron_3_nano_common_configs(cfg)
     set_workload_base_configs(cfg, base_cfg)
-    if base_cfg.moe_flex_dispatcher_backend is not None:
-        cfg.model.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend
+    set_nemotron_3_nano_common_configs(cfg, base_cfg)

Also applies to: 76-77, 100-101, 124-125, 148-149

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py` around
lines 52 - 53, The two-line conditional that propagates
base_cfg.moe_flex_dispatcher_backend into cfg.model.moe_flex_dispatcher_backend
is duplicated across five functions; extract it into a single helper (e.g., add
an argument base_cfg to set_nemotron_3_nano_common_configs or create a small
function like propagate_moe_flex_dispatcher_backend(base_cfg, cfg)) and call
that helper from each of the five places instead of repeating the conditional;
ensure the helper checks base_cfg.moe_flex_dispatcher_backend is not None and
assigns to cfg.model.moe_flex_dispatcher_backend, and update the five callers to
invoke the helper (or to pass base_cfg into set_nemotron_3_nano_common_configs)
so future changes are centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py`:
- Around line 34-36: Remove the unused mock parameter from the five pretrain
config factory signatures: e.g., remove "mock: bool = True" from
nemotron_3_nano_pretrain_config_gb300 and the four other
nemotron_3_nano_*_pretrain_config_* functions in this file; update each function
signature to drop the mock parameter and ensure no internal references remain
(ruff ARG001 will be resolved). Also update any local references or tests that
call these factories with the mock positional/kwarg so callers pass only the
remaining parameters (precision and config_variant).

---

Nitpick comments:
In `@scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py`:
- Around line 52-53: The two-line conditional that propagates
base_cfg.moe_flex_dispatcher_backend into cfg.model.moe_flex_dispatcher_backend
is duplicated across five functions; extract it into a single helper (e.g., add
an argument base_cfg to set_nemotron_3_nano_common_configs or create a small
function like propagate_moe_flex_dispatcher_backend(base_cfg, cfg)) and call
that helper from each of the five places instead of repeating the conditional;
ensure the helper checks base_cfg.moe_flex_dispatcher_backend is not None and
assigns to cfg.model.moe_flex_dispatcher_backend, and update the five callers to
invoke the helper (or to pass base_cfg into set_nemotron_3_nano_common_configs)
so future changes are centralized.

In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`:
- Around line 54-62: The recompute_modules argument in the
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 replace call is using a mutable list
literal ["moe", "layernorm"] that will be shared across the derived configs;
change it to an immutable tuple ("moe", "layernorm") (or another immutable
sequence) so the value stored in the dataclass cannot be mutated in-place;
update the recompute_modules parameter in the replace call for
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 accordingly so both
NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_BF16_V1 and
NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_FP8_CS_V1 inherit the immutable sequence.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8896cd and 4bdf493.

📒 Files selected for processing (2)
  • scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py
  • scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py

Comment on lines +34 to +36
def nemotron_3_nano_pretrain_config_gb300(
precision: str = "bf16", mock: bool = True, config_variant: str = "v1"
) -> ConfigContainer:
Contributor


⚠️ Potential issue | 🟠 Major

Remove the unused mock parameter from all five function signatures.

The mock: bool = True argument is added to every pretrain config factory but is never read inside any of them. The PR title "remove mock arg" directly contradicts leaving it in, and ruff flags it as ARG001 in all five locations. This will fail linting/CI.

🔧 Proposed fix (shown for one function; apply the same diff to all five)
-def nemotron_3_nano_pretrain_config_gb300(
-    precision: str = "bf16", mock: bool = True, config_variant: str = "v1"
-) -> ConfigContainer:
+def nemotron_3_nano_pretrain_config_gb300(
+    precision: str = "bf16", config_variant: str = "v1"
+) -> ConfigContainer:

Also applies to: 58-60, 82-84, 106-108, 130-132

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 35-35: Unused function argument: mock

(ARG001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py` around
lines 34 - 36, Remove the unused mock parameter from the five pretrain config
factory signatures: e.g., remove "mock: bool = True" from
nemotron_3_nano_pretrain_config_gb300 and the four other
nemotron_3_nano_*_pretrain_config_* functions in this file; update each function
signature to drop the mock parameter and ensure no internal references remain
(ruff ARG001 will be resolved). Also update any local references or tests that
call these factories with the mock positional/kwarg so callers pass only the
remaining parameters (precision and config_variant).

@ko3n1g ko3n1g merged commit 47dacd4 into main on Feb 26, 2026
90 of 92 checks passed
@ko3n1g ko3n1g deleted the malay/mock_arg_remove branch February 26, 2026 09:31
malay-nagda added a commit that referenced this pull request Feb 26, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
ko3n1g pushed a commit that referenced this pull request Feb 26, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
@ko3n1g ko3n1g mentioned this pull request Feb 26, 2026
5 tasks
pengdurice pushed a commit to pengdurice/Megatron-Bridge that referenced this pull request Feb 26, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
@coderabbitai coderabbitai bot mentioned this pull request Mar 3, 2026
5 tasks
copy-pr-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>

Labels

performance r0.3.0 Cherry-pick label for r0.3.0 release branch
