
Update Nemotron 3 Nano perf configs#2510

Merged
ko3n1g merged 5 commits into main from malay/mock_arg_remove on Feb 26, 2026

Conversation

@malay-nagda
Contributor

@malay-nagda malay-nagda commented Feb 24, 2026

What does this PR do ?

Updates Nemotron 3 Nano perf configs.

Changelog

BASE_NEMOTRON_3_NANO_CONFIG = WorkloadBaseConfig(
    num_gpus=8,
    global_batch_size=512,
    micro_batch_size=2,
    tensor_model_parallel_size=1,
    expert_tensor_parallel_size=1,
    expert_model_parallel_size=8,
    moe_flex_dispatcher_backend="hybridep",
)
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 = replace(
    BASE_NEMOTRON_3_NANO_CONFIG,
    num_gpus=16,
    global_batch_size=1024,
    micro_batch_size=1,
    recompute_modules=["moe", "layernorm"],
)
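The variant configs above are derived with `dataclasses.replace`, which returns a copy of a dataclass with selected fields overridden while leaving the base untouched. A minimal self-contained sketch of that pattern (the field names match the snippet, but this trimmed-down `WorkloadBaseConfig` is an illustrative assumption, not the project's actual class):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class WorkloadBaseConfig:
    # Illustrative subset of fields; the real class has more.
    num_gpus: int = 8
    global_batch_size: int = 512
    micro_batch_size: int = 2

base = WorkloadBaseConfig()
h100 = replace(base, num_gpus=16, global_batch_size=1024, micro_batch_size=1)

print(h100.num_gpus)          # 16 (overridden in the copy)
print(h100.micro_batch_size)  # 1 (overridden in the copy)
print(base.num_gpus)          # 8 (base config is unchanged)
```

Because the dataclass is frozen, each derived variant is an independent immutable snapshot of the base config plus its overrides.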

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Chores
    • Updated Nemotron 3 Nano model configurations across GB300, GB200, B300, B200, and H100 variants with optimized batch sizes and parallelization settings.
    • Added support for MOE (Mixture of Experts) flex dispatcher backend with "hybridep" mode.
    • Streamlined configuration inheritance to reduce redundant variant definitions.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
@malay-nagda malay-nagda added performance r0.3.0 Cherry-pick label for r0.3.0 release branch labels Feb 25, 2026
@malay-nagda malay-nagda self-assigned this Feb 25, 2026
@malay-nagda malay-nagda marked this pull request as ready for review February 25, 2026 07:05
@malay-nagda malay-nagda changed the title remove mock arg Update Nemotron 3 Nano perf configs Feb 25, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

Walkthrough

The changes modify the Nemotron 3 Nano configuration setup in two ways. First, the pretrain config factory functions gain a mock parameter and conditionally propagate moe_flex_dispatcher_backend from the base config. Second, the workload base configs are refactored so that most pretrain variants reference the base config directly, and a specialized H100 configuration variant is introduced.

Changes

Cohort / File(s) Summary
Pretrain Config Factory Functions
scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py
Added mock: bool = True parameter to five factory functions (nemotron_3_nano_pretrain_config_gb300, nemotron_3_nano_pretrain_config_gb200, nemotron_3_nano_pretrain_config_b300, nemotron_3_nano_pretrain_config_b200, nemotron_3_nano_pretrain_config_h100). Each function now conditionally propagates moe_flex_dispatcher_backend from base config to model config when the field is not None.
Workload Base Configuration
scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py
Updated global parameters (global_batch_size: 3072→512, tensor_model_parallel_size: 4→1, added moe_flex_dispatcher_backend="hybridep"). Restructured multiple pretrain config variants (GB300, GB200, B300, B200) to reference BASE_NEMOTRON_3_NANO_CONFIG directly instead of using replace(). Introduced internal H100 helper config with specialized parameters (num_gpus=16, global_batch_size=1024, micro_batch_size=1, recompute_modules=["moe","layernorm"]) with two public aliases.
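The conditional propagation described in the summary above can be sketched as follows. The attribute names (`moe_flex_dispatcher_backend`, a model config and a base config object) follow the walkthrough, but these trimmed-down classes are assumptions for illustration, not the project's actual `ConfigContainer`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    moe_flex_dispatcher_backend: Optional[str] = None

@dataclass
class BaseConfig:
    moe_flex_dispatcher_backend: Optional[str] = "hybridep"

def propagate_backend(model_cfg: ModelConfig, base_cfg: BaseConfig) -> None:
    # Only override the model setting when the base config specifies one;
    # a None in the base config leaves the model's current value alone.
    if base_cfg.moe_flex_dispatcher_backend is not None:
        model_cfg.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend

model = ModelConfig()
propagate_backend(model, BaseConfig())
print(model.moe_flex_dispatcher_backend)  # hybridep
```

The None guard is what makes the base-config field optional: variants that don't set a dispatcher backend leave the model default untouched.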

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • nemotron3 nano recipes #2301: Introduces moe_flex_dispatcher_backend field in nemotron_3_nano workload base configs that is subsequently propagated by this PR's factory function changes.

Suggested reviewers

  • erhoo82
  • ko3n1g
  • tomlifu
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes: ⚠️ Warning
    Explanation: PR contains significant Nemotron 3 Nano configuration changes affecting performance and training convergence without documented test results or validation metrics.
    Resolution: Add comprehensive testing documentation including before-and-after performance comparisons, convergence validation results, testing environment details, and benchmark results justifying the configuration changes.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title "Update Nemotron 3 Nano perf configs" clearly summarizes the main change: updating performance configuration files for Nemotron 3 Nano models across multiple config variants.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch malay/mock_arg_remove

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py (1)

54-62: Use a tuple (or immutable sequence) for recompute_modules to guard against accidental mutation.

The ["moe", "layernorm"] list literal is stored inside the dataclass and shared by both NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_BF16_V1 and NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_FP8_CS_V1. Even if the dataclass is frozen (preventing field reassignment), the list object itself remains mutable; an in-place .append() / .remove() call elsewhere would corrupt both aliases at once.

♻️ Proposed fix
 _NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 = replace(
     BASE_NEMOTRON_3_NANO_CONFIG,
     num_gpus=16,
     global_batch_size=1024,
     micro_batch_size=1,
-    recompute_modules=["moe", "layernorm"],
+    recompute_modules=("moe", "layernorm"),
 )
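The hazard behind this suggestion can be reproduced in isolation: `dataclasses.replace` copies field references, not the objects they point to, so a list value ends up shared between every alias of the config. A small demo with a hypothetical `Cfg` class (the real `WorkloadBaseConfig` and alias names are stand-ins here):

```python
from dataclasses import dataclass, replace
from typing import Sequence

@dataclass(frozen=True)
class Cfg:
    recompute_modules: Sequence[str] = ()

shared = replace(Cfg(), recompute_modules=["moe", "layernorm"])
bf16 = shared  # two public aliases of the same config object,
fp8 = shared   # mirroring the BF16 and FP8 H100 variants

# In-place mutation through one alias corrupts the other,
# even though the dataclass itself is frozen.
fp8.recompute_modules.append("attention")
print(bf16.recompute_modules)  # ['moe', 'layernorm', 'attention']

# A tuple makes the stored sequence immutable; .append() would
# raise AttributeError instead of silently changing both aliases.
safe = replace(Cfg(), recompute_modules=("moe", "layernorm"))
```

Freezing a dataclass only prevents field reassignment; it does not make the field values themselves immutable, which is why the tuple form is the safer default for shared configs.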
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`
around lines 54 - 62, The recompute_modules argument in the
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 replace call is using a mutable list
literal ["moe", "layernorm"] that will be shared across the derived configs;
change it to an immutable tuple ("moe", "layernorm") (or another immutable
sequence) so the value stored in the dataclass cannot be mutated in-place;
update the recompute_modules parameter in the replace call for
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 accordingly so both
NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_BF16_V1 and
NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_FP8_CS_V1 inherit the immutable sequence.
scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py (1)

52-53: Extract the repeated moe_flex_dispatcher_backend propagation into a shared helper.

The identical two-line conditional is copy-pasted verbatim into all five functions. Folding it into set_nemotron_3_nano_common_configs (by also accepting base_cfg) or a separate helper eliminates the duplication and ensures any future changes are applied consistently.

♻️ Proposed refactor
-def set_nemotron_3_nano_common_configs(cfg: ConfigContainer) -> None:
-    """Set common performance configurations for all Nemotron 3 Nano configs."""
+def set_nemotron_3_nano_common_configs(cfg: ConfigContainer, base_cfg=None) -> None:
+    """Set common performance configurations for all Nemotron 3 Nano configs."""
     cfg.mixed_precision.grad_reduce_in_fp32 = False
     cfg.ddp.grad_reduce_in_fp32 = False
+    if base_cfg is not None and base_cfg.moe_flex_dispatcher_backend is not None:
+        cfg.model.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend

Then, in each function replace:

     set_nemotron_3_nano_common_configs(cfg)
     set_workload_base_configs(cfg, base_cfg)
-    if base_cfg.moe_flex_dispatcher_backend is not None:
-        cfg.model.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend
+    set_nemotron_3_nano_common_configs(cfg, base_cfg)

Also applies to: 76-77, 100-101, 124-125, 148-149

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py` around
lines 52 - 53, The two-line conditional that propagates
base_cfg.moe_flex_dispatcher_backend into cfg.model.moe_flex_dispatcher_backend
is duplicated across five functions; extract it into a single helper (e.g., add
an argument base_cfg to set_nemotron_3_nano_common_configs or create a small
function like propagate_moe_flex_dispatcher_backend(base_cfg, cfg)) and call
that helper from each of the five places instead of repeating the conditional;
ensure the helper checks base_cfg.moe_flex_dispatcher_backend is not None and
assigns to cfg.model.moe_flex_dispatcher_backend, and update the five callers to
invoke the helper (or to pass base_cfg into set_nemotron_3_nano_common_configs)
so future changes are centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py`:
- Around line 34-36: Remove the unused mock parameter from the five pretrain
config factory signatures: e.g., remove "mock: bool = True" from
nemotron_3_nano_pretrain_config_gb300 and the four other
nemotron_3_nano_*_pretrain_config_* functions in this file; update each function
signature to drop the mock parameter and ensure no internal references remain
(ruff ARG001 will be resolved). Also update any local references or tests that
call these factories with the mock positional/kwarg so callers pass only the
remaining parameters (precision and config_variant).

---

Nitpick comments:
In `@scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py`:
- Around line 52-53: The two-line conditional that propagates
base_cfg.moe_flex_dispatcher_backend into cfg.model.moe_flex_dispatcher_backend
is duplicated across five functions; extract it into a single helper (e.g., add
an argument base_cfg to set_nemotron_3_nano_common_configs or create a small
function like propagate_moe_flex_dispatcher_backend(base_cfg, cfg)) and call
that helper from each of the five places instead of repeating the conditional;
ensure the helper checks base_cfg.moe_flex_dispatcher_backend is not None and
assigns to cfg.model.moe_flex_dispatcher_backend, and update the five callers to
invoke the helper (or to pass base_cfg into set_nemotron_3_nano_common_configs)
so future changes are centralized.

In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`:
- Around line 54-62: The recompute_modules argument in the
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 replace call is using a mutable list
literal ["moe", "layernorm"] that will be shared across the derived configs;
change it to an immutable tuple ("moe", "layernorm") (or another immutable
sequence) so the value stored in the dataclass cannot be mutated in-place;
update the recompute_modules parameter in the replace call for
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 accordingly so both
NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_BF16_V1 and
NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_FP8_CS_V1 inherit the immutable sequence.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8896cd and 4bdf493.

📒 Files selected for processing (2)
  • scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py
  • scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py

Comment on lines +34 to +36
def nemotron_3_nano_pretrain_config_gb300(
precision: str = "bf16", mock: bool = True, config_variant: str = "v1"
) -> ConfigContainer:
Contributor


⚠️ Potential issue | 🟠 Major

Remove the unused mock parameter from all five function signatures.

The mock: bool = True argument is added to every pretrain config factory but is never read inside any of them. The PR title "remove mock arg" directly contradicts leaving it in, and ruff flags it as ARG001 in all five locations. This will fail linting/CI.

🔧 Proposed fix (shown for one function; apply the same diff to all five)
-def nemotron_3_nano_pretrain_config_gb300(
-    precision: str = "bf16", mock: bool = True, config_variant: str = "v1"
-) -> ConfigContainer:
+def nemotron_3_nano_pretrain_config_gb300(
+    precision: str = "bf16", config_variant: str = "v1"
+) -> ConfigContainer:

Also applies to: 58-60, 82-84, 106-108, 130-132

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 35-35: Unused function argument: mock

(ARG001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py` around
lines 34 - 36, Remove the unused mock parameter from the five pretrain config
factory signatures: e.g., remove "mock: bool = True" from
nemotron_3_nano_pretrain_config_gb300 and the four other
nemotron_3_nano_*_pretrain_config_* functions in this file; update each function
signature to drop the mock parameter and ensure no internal references remain
(ruff ARG001 will be resolved). Also update any local references or tests that
call these factories with the mock positional/kwarg so callers pass only the
remaining parameters (precision and config_variant).

@ko3n1g ko3n1g merged commit 47dacd4 into main on Feb 26, 2026
90 of 92 checks passed
@ko3n1g ko3n1g deleted the malay/mock_arg_remove branch February 26, 2026 09:31
malay-nagda added a commit that referenced this pull request Feb 26, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
ko3n1g pushed a commit that referenced this pull request Feb 26, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
@ko3n1g ko3n1g mentioned this pull request Feb 26, 2026
5 tasks
pengdurice pushed a commit to pengdurice/Megatron-Bridge that referenced this pull request Feb 26, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
@coderabbitai coderabbitai bot mentioned this pull request Mar 3, 2026
5 tasks
copy-pr-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>

Labels

performance r0.3.0 Cherry-pick label for r0.3.0 release branch
