
Onboard NVFP4 and MXFP8 recipes #2600

Merged
ko3n1g merged 3 commits into NVIDIA-NeMo:main from dingqingy-nv:onboard_recipes_patch on Mar 14, 2026

Conversation


@dingqingy-nv dingqingy-nv commented Feb 28, 2026

What does this PR do?

  • Onboard NVFP4 recipes for DeepSeek-V3 (GB200, B300, B200) and Qwen3 235B A22B (GB300, GB200, B300, B200)
  • Onboard MXFP8 recipes for GPT-OSS 120B across all GPUs (GB300, GB200, B300, B200, H100)
  • Add a flex dispatcher interface to the GPT-OSS pretrain configs via apply_flex_dispatcher_backend, enabling a future switch from alltoall to the flex dispatcher.
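The dispatcher hook in the last bullet can be sketched as follows. This is an illustrative mock, not the actual NeMo code: apply_flex_dispatcher_backend and moe_flex_dispatcher_backend are names taken from this PR, while MoEPretrainConfig, moe_token_dispatcher_type, and the "deepep" backend value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoEPretrainConfig:
    # Field names modeled on the PR description; values are placeholders.
    moe_token_dispatcher_type: str = "alltoall"
    moe_flex_dispatcher_backend: Optional[str] = None


def apply_flex_dispatcher_backend(cfg: MoEPretrainConfig, backend: str) -> MoEPretrainConfig:
    """Switch the config from the alltoall dispatcher to the flex dispatcher."""
    cfg.moe_token_dispatcher_type = "flex"
    cfg.moe_flex_dispatcher_backend = backend
    return cfg


def build_gpt_oss_pretrain_config(base: MoEPretrainConfig) -> MoEPretrainConfig:
    # Only apply the flex dispatcher when the base config requests a backend;
    # otherwise the default alltoall path is kept unchanged.
    if base.moe_flex_dispatcher_backend is not None:
        return apply_flex_dispatcher_backend(base, base.moe_flex_dispatcher_backend)
    return base
```

With this shape, onboarding a flex-dispatcher recipe later only requires setting moe_flex_dispatcher_backend in the base config, with no change to the pretrain builder itself.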

Summary by CodeRabbit

Release Notes

  • New Features
    • Added NVFP4 pretraining configuration variants (V1 and V2) for DeepSeek and Qwen models across B200, B300, GB200, and GB300 GPU types.
    • Added FP8_MX pretraining configuration variants (V1 and V2) for GPT-OSS models.
    • Enabled dynamic flex dispatcher backend support for GPT-OSS training pipelines.

Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
…p is still not performant on gpt-oss

Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
@dingqingy-nv dingqingy-nv added the performance, performance/release (Performance items related with NeMo release), and r0.3.0 (Cherry-pick label for r0.3.0 release branch) labels on Feb 28, 2026
@dingqingy-nv dingqingy-nv enabled auto-merge (squash) February 28, 2026 20:00

coderabbitai bot commented Feb 28, 2026

📝 Walkthrough

This PR adds new configuration aliases for quantized and optimized variants across the DeepSeek (NVFP4), GPT-OSS (FP8_MX), and Qwen (NVFP4) models, extending support across multiple GPU types (B200, B300, GB200, GB300, and H100 for GPT-OSS). Additionally, GPT-OSS training now applies the flex dispatcher backend when one is configured.

Changes

Cohort / File(s) / Summary

  • DeepSeek NVFP4 Configuration Aliases (scripts/performance/configs/deepseek/deepseek_workload_base_configs.py):
    Added 8 new NVFP4_V1 and NVFP4_V2 alias constants mapping to existing BF16 variants (B200, B300, GB200, GB300) and updated the public exports in __all__.
  • DeepSeek Configuration Exports (scripts/performance/configs/deepseek/__init__.py):
    Imported and exposed 8 NVFP4 variant constants in the public API for V1 and V2 configurations.
  • GPT-OSS FP8_MX Configuration Aliases (scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py):
    Added 10 new FP8_MX_V1 and FP8_MX_V2 alias constants mapping to existing BF16 variants (B200, B300, GB200, GB300, H100) and updated the __all__ exports.
  • GPT-OSS Configuration Exports (scripts/performance/configs/gpt_oss/__init__.py):
    Imported and exposed 10 FP8_MX variant constants in the public API for V1 and V2 configurations.
  • GPT-OSS Training Flex Dispatcher Support (scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py):
    Added conditional flex dispatcher backend application for each GPU configuration variant when moe_flex_dispatcher_backend is specified in the base config.
  • Qwen NVFP4 Configuration Aliases (scripts/performance/configs/qwen/qwen3_workload_base_configs.py):
    Added 8 new NVFP4_V1 and NVFP4_V2 alias constants mapping to existing FP8_CS variants (B200, B300, GB200, GB300) and updated the __all__ exports.
  • Qwen Configuration Exports (scripts/performance/configs/qwen/__init__.py):
    Imported and exposed 8 NVFP4 variant constants in the public API for V1 and V2 configurations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • erhoo82
  • ko3n1g
  • malay-nagda
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes: ⚠️ Warning.
    Explanation: The PR introduces major quantization scheme changes (NVFP4, MXFP8) and a flex dispatcher interface affecting numerics, convergence, and performance, but the PR description lacks test results, performance benchmarks, and convergence validation.
    Resolution: Update the PR description with performance benchmarks for the new recipes, convergence validation evidence, and the testing methodology, to demonstrate no regression.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Onboard NVFP4 and MXFP8 recipes' accurately summarizes the main changes: adding NVFP4 variants for DeepSeek and Qwen models, and MXFP8 variants for GPT-OSS configurations.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/performance/configs/gpt_oss/__init__.py (1)

1-6: 🛠️ Refactor suggestion | 🟠 Major

Missing NVIDIA copyright header.

This file is missing the standard NVIDIA copyright header that is present in other files in this PR. As per coding guidelines, add the NVIDIA copyright header to all Python files.

📝 Add copyright header at the top of the file
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 try:
     import megatron.bridge  # noqa: F401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/gpt_oss/__init__.py` around lines 1 - 6, This
file is missing the standard NVIDIA copyright header; add the same NVIDIA
copyright/header block used across the PR at the very top of
scripts/performance/configs/gpt_oss/__init__.py before any imports, then leave
the existing logic (the import megatron.bridge and the HAVE_MEGATRON_BRIDGE
try/except and variable) unchanged; ensure the header text exactly matches the
project's standard header used in other Python files.
🧹 Nitpick comments (4)
scripts/performance/configs/deepseek/__init__.py (1)

1-6: Consider adding NVIDIA copyright header.

This file lacks the required copyright header. While pre-existing, this is a good opportunity to add it since the file is being modified. As per coding guidelines: "Add NVIDIA copyright header to all Python files and shell scripts at the top of the file".

📄 Proposed fix to add copyright header
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 try:
     import megatron.bridge  # noqa: F401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/deepseek/__init__.py` around lines 1 - 6, Add the
standard NVIDIA copyright header at the top of this Python module before any
imports; modify the module containing the import of megatron.bridge and the
HAVE_MEGATRON_BRIDGE assignment (the try/except block and constants) by
inserting the required header comment block as the very first lines of the file
so the header precedes the existing try: import megatron.bridge  # noqa: F401
and the HAVE_MEGATRON_BRIDGE variable definitions.
scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py (2)

85-90: Consider adding a comment explaining the alias.

The FP8_MX configs are direct aliases to BF16 configs. Based on the PR description, this is intentional because "hybridep is still not performant on gpt-oss." Adding a brief comment would help future maintainers understand this is a temporary aliasing and not a mistake.

📝 Suggested comment
+# FP8_MX aliases to BF16 configs (temporary: hybridep not yet performant on GPT-OSS)
 GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_GB300_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_GB200_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_B300_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_B200_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_H100_BF16_V1
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py` around
lines 85 - 90, Add a brief explanatory comment above the FP8 alias lines
(GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V1,
GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V1,
GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V1,
GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V1,
GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V1) stating these FP8_MX constants are
intentionally aliased to the BF16 variants (e.g.,
GPT_OSS_120B_PRETRAIN_CONFIG_*_BF16_V1) as a temporary measure because hybridep
is not yet performant on gpt-oss, and note that this should be revisited when
hybridep performance improves.

125-130: Same comment suggestion applies to V2 aliases.

For consistency, consider adding a similar comment above the V2 FP8_MX aliases.

📝 Suggested comment
+# FP8_MX aliases to BF16 configs (temporary: hybridep not yet performant on GPT-OSS)
 GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_GB300_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_GB200_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_B300_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_B200_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_H100_BF16_V2
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py` around
lines 125 - 130, Add the same explanatory comment above the V2 FP8_MX alias
block to match the earlier note: place a brief comment before the lines defining
GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V2,
GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V2,
GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V2,
GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V2, and
GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V2 that mirrors the existing comment
used for the V1 aliases (explaining these are aliases to the BF16 V2 configs and
why), so readers understand these are intentional aliases rather than distinct
configs.
scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py (1)

38-39: The mock parameter is declared but unused in all 5 config functions and should be addressed.

The parameter is part of the public interface—callers explicitly pass it (as shown in tests)—so it cannot be simply removed without breaking compatibility. Either implement its intended use in the function bodies or remove it consistently across all callers and function signatures. If the parameter is intentional for future use or API compatibility, add a docstring documenting its purpose.

This pattern is consistent across all pretrain config functions in the codebase (gpt_oss, qwen_vl, nemotronh, kimi), suggesting a systemic design decision that should be addressed consistently.
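The "document the no-op flag" option from the review comment could look like the sketch below. The function name and `mock` parameter mirror the review comment; the body and return value are placeholders, not the real NeMo config builder.

```python
def gpt_oss_120b_pretrain_config_gb300(mock: bool = False) -> dict:
    """Build the GB300 pretrain config for GPT-OSS 120B.

    Args:
        mock: Reserved, no-op flag kept for backward compatibility with
            callers and tests that request a lightweight/mock configuration.
            It currently has no effect on the returned config.
    """
    del mock  # explicitly unused; see the docstring above
    # Placeholder body standing in for the real config construction.
    return {"model": "gpt_oss_120b", "gpu": "gb300"}
```

The `del mock` line makes the no-op status explicit to linters and readers alike, while the signature stays unchanged for existing callers.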

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py` around lines 38
- 39, The functions like gpt_oss_120b_pretrain_config_gb300 declare a `mock`
parameter but never use it; preserve the parameter for API compatibility and
document its intent: add a short docstring to each pretrain config function (all
five in this module) that explains `mock: bool` is a reserved, no-op flag used
by tests to select a lightweight/mock configuration and should not affect
runtime behaviour, or alternatively implement the intended mock behavior if
there is a clear lightweight path; update the docstring of
gpt_oss_120b_pretrain_config_gb300 and the other four config functions to
mention `mock` is intentionally kept for backward compatibility/testing and is
currently a no-op (or describe the implemented mock behavior) so callers
understand its purpose.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 383b610 and a13dad2.

📒 Files selected for processing (7)
  • scripts/performance/configs/deepseek/__init__.py
  • scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
  • scripts/performance/configs/gpt_oss/__init__.py
  • scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py
  • scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py
  • scripts/performance/configs/qwen/__init__.py
  • scripts/performance/configs/qwen/qwen3_workload_base_configs.py

@ko3n1g ko3n1g disabled auto-merge March 14, 2026 18:51
@ko3n1g ko3n1g merged commit acc52e2 into NVIDIA-NeMo:main Mar 14, 2026
65 of 66 checks passed
svcnvidia-nemo-ci pushed a commit that referenced this pull request Mar 14, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
copy-pr-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>

Labels

  • performance
  • performance/release: Performance items related with NeMo release
  • r0.3.0: Cherry-pick label for r0.3.0 release branch

Projects

None yet


2 participants