
Onboard NVFP4 and MXFP8 recipes #2600

Merged
ko3n1g merged 3 commits into NVIDIA-NeMo:main from dingqingy-nv:onboard_recipes_patch on Mar 14, 2026

Conversation


@dingqingy-nv dingqingy-nv commented Feb 28, 2026

What does this PR do?

  • Onboard NVFP4 recipes for DeepSeek-V3 (GB200, B300, B200) and Qwen3 235B A22B (GB300, GB200, B300, B200)
  • Onboard MXFP8 recipes for GPT-OSS 120B across all GPUs (GB300, GB200, B300, B200, H100)
  • Add a flex dispatcher interface to the GPT-OSS pretrain configs via apply_flex_dispatcher_backend, enabling a future switch from alltoall to the flex dispatcher.
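The dispatcher hook in the last bullet can be sketched as follows. This is an illustrative mock, not the actual NeMo code: apply_flex_dispatcher_backend and moe_flex_dispatcher_backend are names taken from this PR, while MoEPretrainConfig, moe_token_dispatcher_type, and the "deepep" backend value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoEPretrainConfig:
    # Field names modeled on the PR description; values are placeholders.
    moe_token_dispatcher_type: str = "alltoall"
    moe_flex_dispatcher_backend: Optional[str] = None


def apply_flex_dispatcher_backend(cfg: MoEPretrainConfig, backend: str) -> MoEPretrainConfig:
    """Switch the config from the alltoall dispatcher to the flex dispatcher."""
    cfg.moe_token_dispatcher_type = "flex"
    cfg.moe_flex_dispatcher_backend = backend
    return cfg


def build_gpt_oss_pretrain_config(base: MoEPretrainConfig) -> MoEPretrainConfig:
    # Only apply the flex dispatcher when the base config requests a backend;
    # otherwise the default alltoall path is kept unchanged.
    if base.moe_flex_dispatcher_backend is not None:
        return apply_flex_dispatcher_backend(base, base.moe_flex_dispatcher_backend)
    return base
```

With this shape, onboarding a flex-dispatcher recipe later only requires setting moe_flex_dispatcher_backend in the base config, with no change to the pretrain builder itself.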

Summary by CodeRabbit

Release Notes

  • New Features
    • Added NVFP4 pretraining configuration variants (V1 and V2) for DeepSeek and Qwen models across B200, B300, GB200, and GB300 GPU types.
    • Added FP8_MX pretraining configuration variants (V1 and V2) for GPT-OSS models.
    • Enabled dynamic flex dispatcher backend support for GPT-OSS training pipelines.

Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
…p is still not performant on gpt-oss

Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
@dingqingy-nv dingqingy-nv added the performance, performance/release (Performance items related with NeMo release), and r0.3.0 (Cherry-pick label for r0.3.0 release branch) labels on Feb 28, 2026
@dingqingy-nv dingqingy-nv enabled auto-merge (squash) February 28, 2026 20:00

coderabbitai bot commented Feb 28, 2026

📝 Walkthrough

This PR adds new configuration aliases for quantized and optimized variants across the DeepSeek (NVFP4), GPT-OSS (FP8_MX), and Qwen (NVFP4) models, extending support across multiple GPU types (B200, B300, GB200, GB300, and H100 for GPT-OSS). Additionally, GPT-OSS training now applies the flex dispatcher backend when one is configured.

Changes

Cohort / File(s) / Summary

  • DeepSeek NVFP4 Configuration Aliases (scripts/performance/configs/deepseek/deepseek_workload_base_configs.py):
    Added 8 new NVFP4_V1 and NVFP4_V2 alias constants mapping to existing BF16 variants (B200, B300, GB200, GB300) and updated the public exports in __all__.
  • DeepSeek Configuration Exports (scripts/performance/configs/deepseek/__init__.py):
    Imported and exposed 8 NVFP4 variant constants in the public API for V1 and V2 configurations.
  • GPT-OSS FP8_MX Configuration Aliases (scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py):
    Added 10 new FP8_MX_V1 and FP8_MX_V2 alias constants mapping to existing BF16 variants (B200, B300, GB200, GB300, H100) and updated the __all__ exports.
  • GPT-OSS Configuration Exports (scripts/performance/configs/gpt_oss/__init__.py):
    Imported and exposed 10 FP8_MX variant constants in the public API for V1 and V2 configurations.
  • GPT-OSS Training Flex Dispatcher Support (scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py):
    Added conditional flex dispatcher backend application for each GPU configuration variant when moe_flex_dispatcher_backend is specified in the base config.
  • Qwen NVFP4 Configuration Aliases (scripts/performance/configs/qwen/qwen3_workload_base_configs.py):
    Added 8 new NVFP4_V1 and NVFP4_V2 alias constants mapping to existing FP8_CS variants (B200, B300, GB200, GB300) and updated the __all__ exports.
  • Qwen Configuration Exports (scripts/performance/configs/qwen/__init__.py):
    Imported and exposed 8 NVFP4 variant constants in the public API for V1 and V2 configurations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • erhoo82
  • ko3n1g
  • malay-nagda
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes: ⚠️ Warning.
    Explanation: The PR introduces major quantization scheme changes (NVFP4, MXFP8) and a flex dispatcher interface affecting numerics, convergence, and performance, but the PR description lacks test results, performance benchmarks, and convergence validation.
    Resolution: Update the PR description with performance benchmarks for the new recipes, convergence validation evidence, and the testing methodology, to demonstrate no regression.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Onboard NVFP4 and MXFP8 recipes' accurately summarizes the main changes: adding NVFP4 variants for DeepSeek and Qwen models, and MXFP8 variants for GPT-OSS configurations.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/performance/configs/gpt_oss/__init__.py (1)

1-6: 🛠️ Refactor suggestion | 🟠 Major

Missing NVIDIA copyright header.

This file is missing the standard NVIDIA copyright header that is present in other files in this PR. As per coding guidelines, add the NVIDIA copyright header to all Python files.

📝 Add copyright header at the top of the file
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 try:
     import megatron.bridge  # noqa: F401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/gpt_oss/__init__.py` around lines 1 - 6, This
file is missing the standard NVIDIA copyright header; add the same NVIDIA
copyright/header block used across the PR at the very top of
scripts/performance/configs/gpt_oss/__init__.py before any imports, then leave
the existing logic (the import megatron.bridge and the HAVE_MEGATRON_BRIDGE
try/except and variable) unchanged; ensure the header text exactly matches the
project's standard header used in other Python files.
🧹 Nitpick comments (4)
scripts/performance/configs/deepseek/__init__.py (1)

1-6: Consider adding NVIDIA copyright header.

This file lacks the required copyright header. While pre-existing, this is a good opportunity to add it since the file is being modified. As per coding guidelines: "Add NVIDIA copyright header to all Python files and shell scripts at the top of the file".

📄 Proposed fix to add copyright header
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 try:
     import megatron.bridge  # noqa: F401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/deepseek/__init__.py` around lines 1 - 6, Add the
standard NVIDIA copyright header at the top of this Python module before any
imports; modify the module containing the import of megatron.bridge and the
HAVE_MEGATRON_BRIDGE assignment (the try/except block and constants) by
inserting the required header comment block as the very first lines of the file
so the header precedes the existing try: import megatron.bridge  # noqa: F401
and the HAVE_MEGATRON_BRIDGE variable definitions.
scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py (2)

85-90: Consider adding a comment explaining the alias.

The FP8_MX configs are direct aliases to BF16 configs. Based on the PR description, this is intentional because "hybridep is still not performant on gpt-oss." Adding a brief comment would help future maintainers understand this is a temporary aliasing and not a mistake.

📝 Suggested comment
+# FP8_MX aliases to BF16 configs (temporary: hybridep not yet performant on GPT-OSS)
 GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_GB300_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_GB200_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_B300_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_B200_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_H100_BF16_V1
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py` around
lines 85 - 90, Add a brief explanatory comment above the FP8 alias lines
(GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V1,
GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V1,
GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V1,
GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V1,
GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V1) stating these FP8_MX constants are
intentionally aliased to the BF16 variants (e.g.,
GPT_OSS_120B_PRETRAIN_CONFIG_*_BF16_V1) as a temporary measure because hybridep
is not yet performant on gpt-oss, and note that this should be revisited when
hybridep performance improves.

125-130: Same comment suggestion applies to V2 aliases.

For consistency, consider adding a similar comment above the V2 FP8_MX aliases.

📝 Suggested comment
+# FP8_MX aliases to BF16 configs (temporary: hybridep not yet performant on GPT-OSS)
 GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_GB300_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_GB200_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_B300_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_B200_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_H100_BF16_V2
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py` around
lines 125 - 130, Add the same explanatory comment above the V2 FP8_MX alias
block to match the earlier note: place a brief comment before the lines defining
GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V2,
GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V2,
GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V2,
GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V2, and
GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V2 that mirrors the existing comment
used for the V1 aliases (explaining these are aliases to the BF16 V2 configs and
why), so readers understand these are intentional aliases rather than distinct
configs.
scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py (1)

38-39: The mock parameter is declared but unused in all 5 config functions and should be addressed.

The parameter is part of the public interface—callers explicitly pass it (as shown in tests)—so it cannot be simply removed without breaking compatibility. Either implement its intended use in the function bodies or remove it consistently across all callers and function signatures. If the parameter is intentional for future use or API compatibility, add a docstring documenting its purpose.

This pattern is consistent across all pretrain config functions in the codebase (gpt_oss, qwen_vl, nemotronh, kimi), suggesting a systemic design decision that should be addressed consistently.
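The "document the no-op flag" option from the review comment could look like the sketch below. The function name and `mock` parameter mirror the review comment; the body and return value are placeholders, not the real NeMo config builder.

```python
def gpt_oss_120b_pretrain_config_gb300(mock: bool = False) -> dict:
    """Build the GB300 pretrain config for GPT-OSS 120B.

    Args:
        mock: Reserved, no-op flag kept for backward compatibility with
            callers and tests that request a lightweight/mock configuration.
            It currently has no effect on the returned config.
    """
    del mock  # explicitly unused; see the docstring above
    # Placeholder body standing in for the real config construction.
    return {"model": "gpt_oss_120b", "gpu": "gb300"}
```

The `del mock` line makes the no-op status explicit to linters and readers alike, while the signature stays unchanged for existing callers.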

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py` around lines 38
- 39, The functions like gpt_oss_120b_pretrain_config_gb300 declare a `mock`
parameter but never use it; preserve the parameter for API compatibility and
document its intent: add a short docstring to each pretrain config function (all
five in this module) that explains `mock: bool` is a reserved, no-op flag used
by tests to select a lightweight/mock configuration and should not affect
runtime behaviour, or alternatively implement the intended mock behavior if
there is a clear lightweight path; update the docstring of
gpt_oss_120b_pretrain_config_gb300 and the other four config functions to
mention `mock` is intentionally kept for backward compatibility/testing and is
currently a no-op (or describe the implemented mock behavior) so callers
understand its purpose.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 383b610 and a13dad2.

📒 Files selected for processing (7)
  • scripts/performance/configs/deepseek/__init__.py
  • scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
  • scripts/performance/configs/gpt_oss/__init__.py
  • scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py
  • scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py
  • scripts/performance/configs/qwen/__init__.py
  • scripts/performance/configs/qwen/qwen3_workload_base_configs.py

@ko3n1g ko3n1g disabled auto-merge March 14, 2026 18:51
@ko3n1g ko3n1g merged commit acc52e2 into NVIDIA-NeMo:main Mar 14, 2026
65 of 66 checks passed
svcnvidia-nemo-ci pushed a commit that referenced this pull request Mar 14, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
copy-pr-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>

Labels

  • performance
  • performance/release: Performance items related with NeMo release
  • r0.3.0: Cherry-pick label for r0.3.0 release branch

Projects

None yet


2 participants