Signed-off-by: Malay Nagda <malayn@nvidia.com>
📝 Walkthrough

Configuration update for Nemotron 3 Nano model training that adds MoE router load balancing, creates new pretrain configuration variants for multiple hardware platforms with adjusted CUDA graph settings, and implements model-specific environment variable handling to preserve cuDNN LayerNorm support.

Changes
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py (1)
89-100: ⚠️ Potential issue | 🟡 Minor
`__all__` is missing the NVFP4 variants.

The NVFP4 config variants are defined (lines 47, 54, 61, 68) but not exported in `__all__`. This creates an inconsistency with the FP8_MX variants, which are exported.

Proposed fix
```diff
 __all__ = [
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB300_BF16_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB300_FP8_MX_V1",
+    "NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB300_NVFP4_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB200_BF16_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB200_FP8_MX_V1",
+    "NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB200_NVFP4_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_B300_BF16_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_B300_FP8_MX_V1",
+    "NEMOTRON_3_NANO_PRETRAIN_CONFIG_B300_NVFP4_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_B200_BF16_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_B200_FP8_MX_V1",
+    "NEMOTRON_3_NANO_PRETRAIN_CONFIG_B200_NVFP4_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_BF16_V1",
     "NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100_FP8_CS_V1",
 ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py` around lines 89 - 100, The __all__ list omits the NVFP4 config symbols; add the four NVFP4 exports (the constants named with the pattern NEMOTRON_3_NANO_PRETRAIN_CONFIG_*_NVFP4_V1) to the __all__ array so the NVFP4 variants defined earlier (the ones at lines where NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB300_NVFP4_V1, NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB200_NVFP4_V1, NEMOTRON_3_NANO_PRETRAIN_CONFIG_B300_NVFP4_V1, and NEMOTRON_3_NANO_PRETRAIN_CONFIG_B200_NVFP4_V1 are declared) are exported alongside the FP8_MX entries.
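The effect of the missing exports can be demonstrated with a minimal sketch. The module and constant names below are hypothetical stand-ins for the real config file; the point is only that a name absent from `__all__` is invisible to `from module import *`:

```python
import sys
import types

# Build a throwaway module mimicking the review finding: two config
# constants defined, but only the FP8_MX one listed in __all__.
mod = types.ModuleType("configs")
mod.CONFIG_FP8_MX_V1 = "fp8_mx"
mod.CONFIG_NVFP4_V1 = "nvfp4"
mod.__all__ = ["CONFIG_FP8_MX_V1"]  # NVFP4 variant omitted
sys.modules["configs"] = mod

# Star-import honors __all__, so the NVFP4 name does not come through.
ns = {}
exec("from configs import *", ns)
print("CONFIG_FP8_MX_V1" in ns)  # True
print("CONFIG_NVFP4_V1" in ns)   # False: defined in the module, but not exported
```

Note that `import configs; configs.CONFIG_NVFP4_V1` would still work; the omission only breaks star-imports and signals the wrong public API surface.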
🧹 Nitpick comments (1)
scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py (1)
70-76: Redundant `cuda_graph_impl` assignment.

Line 75 sets `cuda_graph_impl="transformer_engine"`, but `BASE_NEMOTRON_3_NANO_CONFIG` (line 38) already sets this. The `replace()` call inherits the value automatically.

Suggested cleanup
```diff
 _NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 = replace(
     BASE_NEMOTRON_3_NANO_CONFIG,
     num_gpus=16,
     global_batch_size=1024,
     micro_batch_size=1,
-    cuda_graph_impl="transformer_engine",
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py` around lines 70 - 76, The replace call creating _NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 redundantly reassigns cuda_graph_impl even though BASE_NEMOTRON_3_NANO_CONFIG already defines it; remove the cuda_graph_impl="transformer_engine" argument from the replace(...) call so the new config inherits the value from BASE_NEMOTRON_3_NANO_CONFIG, keeping only the overridden fields (num_gpus, global_batch_size, micro_batch_size) in the _NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 definition.
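The inheritance behavior of `dataclasses.replace` that this nitpick relies on can be sketched with a slimmed-down config. The dataclass below is hypothetical (the real config in the repo has more fields), but the mechanics are standard library behavior:

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for the workload base config; field names follow
# the review comment, defaults are illustrative.
@dataclass(frozen=True)
class PretrainConfig:
    num_gpus: int
    global_batch_size: int
    micro_batch_size: int
    cuda_graph_impl: str = "transformer_engine"


BASE = PretrainConfig(num_gpus=8, global_batch_size=512, micro_batch_size=1)

# replace() copies every field that is not explicitly overridden, so
# re-passing cuda_graph_impl with the same value is redundant.
h100 = replace(BASE, num_gpus=16, global_batch_size=1024)
print(h100.cuda_graph_impl)  # transformer_engine (inherited from BASE)
print(h100.num_gpus)         # 16 (overridden)
```

Dropping the redundant keyword also means a future change to the base config's `cuda_graph_impl` propagates to the H100 variant automatically, which is usually the intent of deriving configs via `replace()`.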
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`:
- Around line 89-100: The __all__ list omits the NVFP4 config symbols; add the
four NVFP4 exports (the constants named with the pattern
NEMOTRON_3_NANO_PRETRAIN_CONFIG_*_NVFP4_V1) to the __all__ array so the NVFP4
variants defined earlier (the ones at lines where
NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB300_NVFP4_V1,
NEMOTRON_3_NANO_PRETRAIN_CONFIG_GB200_NVFP4_V1,
NEMOTRON_3_NANO_PRETRAIN_CONFIG_B300_NVFP4_V1, and
NEMOTRON_3_NANO_PRETRAIN_CONFIG_B200_NVFP4_V1 are declared) are exported
alongside the FP8_MX entries.
---
Nitpick comments:
In
`@scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py`:
- Around line 70-76: The replace call creating
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 redundantly reassigns cuda_graph_impl even
though BASE_NEMOTRON_3_NANO_CONFIG already defines it; remove the
cuda_graph_impl="transformer_engine" argument from the replace(...) call so the
new config inherits the value from BASE_NEMOTRON_3_NANO_CONFIG, keeping only the
overridden fields (num_gpus, global_batch_size, micro_batch_size) in the
_NEMOTRON_3_NANO_PRETRAIN_CONFIG_H100 definition.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e8053d25-c528-449b-95f2-2e8e01439ee8
📒 Files selected for processing (3)
- scripts/performance/configs/nemotronh/nemotron_3_nano_llm_pretrain.py
- scripts/performance/configs/nemotronh/nemotron_3_nano_workload_base_configs.py
- scripts/performance/perf_plugins.py
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit