Onboard NVFP4 and MXFP8 recipes (#2600)
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
…p is still not performant on gpt-oss Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
📝 Walkthrough

This PR adds new configuration aliases for quantized and optimized variants across DeepSeek (NVFP4), GPT-OSS (FP8_MX), and Qwen (NVFP4) models, extending support across multiple GPU types (B200, B300, GB200, GB300, and H100 for GPT-OSS). Additionally, GPT-OSS training applies the flex dispatcher backend when configured.
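The walkthrough's note about applying a flex dispatcher backend "when configured" can be pictured with a minimal sketch. The config field and values below are illustrative assumptions, not the PR's actual implementation of `apply_flex_dispatcher_backend`:

```python
from dataclasses import dataclass

# Hypothetical config shape; the field name and values are assumptions
# for illustration, not the PR's real training config.
@dataclass
class MoEConfig:
    token_dispatcher_type: str = "alltoall"

def apply_flex_dispatcher_backend(cfg: MoEConfig, use_flex: bool) -> MoEConfig:
    # Only switch the dispatcher when the flex backend is requested,
    # leaving the alltoall default untouched otherwise.
    if use_flex:
        cfg.token_dispatcher_type = "flex"
    return cfg

flex_cfg = apply_flex_dispatcher_backend(MoEConfig(), use_flex=True)
default_cfg = apply_flex_dispatcher_backend(MoEConfig(), use_flex=False)
```

The conditional keeps alltoall as the default, which matches the PR's framing of flex dispatch as an opt-in future switch.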
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/performance/configs/gpt_oss/__init__.py (1)
1-6: 🛠️ Refactor suggestion | 🟠 Major: Missing NVIDIA copyright header.
This file is missing the standard NVIDIA copyright header that is present in other files in this PR. As per coding guidelines, add the NVIDIA copyright header to all Python files.
📝 Add copyright header at the top of the file
```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 try:
     import megatron.bridge  # noqa: F401
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/gpt_oss/__init__.py` around lines 1 - 6, This file is missing the standard NVIDIA copyright header; add the same NVIDIA copyright/header block used across the PR at the very top of scripts/performance/configs/gpt_oss/__init__.py before any imports, then leave the existing logic (the import megatron.bridge and the HAVE_MEGATRON_BRIDGE try/except and variable) unchanged; ensure the header text exactly matches the project's standard header used in other Python files.
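For context, the guarded-import pattern these comments refer to (`import megatron.bridge` plus a `HAVE_MEGATRON_BRIDGE` flag) generally looks like the sketch below; the exact contents of the file may differ:

```python
# Probe for an optional dependency and record its availability in a
# module-level flag, so callers can branch without re-trying the import.
try:
    import megatron.bridge  # noqa: F401

    HAVE_MEGATRON_BRIDGE = True
except ImportError:
    HAVE_MEGATRON_BRIDGE = False
```

The copyright header the review asks for would be inserted above this block, leaving the import logic unchanged.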
🧹 Nitpick comments (4)
scripts/performance/configs/deepseek/__init__.py (1)
1-6: Consider adding NVIDIA copyright header. This file lacks the required copyright header. While pre-existing, this is a good opportunity to add it since the file is being modified. As per coding guidelines: "Add NVIDIA copyright header to all Python files and shell scripts at the top of the file".
📄 Proposed fix to add copyright header
```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 try:
     import megatron.bridge  # noqa: F401
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/deepseek/__init__.py` around lines 1 - 6, Add the standard NVIDIA copyright header at the top of this Python module before any imports; modify the module containing the import of megatron.bridge and the HAVE_MEGATRON_BRIDGE assignment (the try/except block and constants) by inserting the required header comment block as the very first lines of the file so the header precedes the existing try: import megatron.bridge # noqa: F401 and the HAVE_MEGATRON_BRIDGE variable definitions.

scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py (2)
85-90: Consider adding a comment explaining the alias. The FP8_MX configs are direct aliases to BF16 configs. Based on the PR description, this is intentional because "hybridep is still not performant on gpt-oss." Adding a brief comment would help future maintainers understand this is a temporary aliasing and not a mistake.
📝 Suggested comment
```diff
+# FP8_MX aliases to BF16 configs (temporary: hybridep not yet performant on GPT-OSS)
 GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_GB300_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_GB200_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_B300_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_B200_BF16_V1
 GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V1 = GPT_OSS_120B_PRETRAIN_CONFIG_H100_BF16_V1
-
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py` around lines 85 - 90, Add a brief explanatory comment above the FP8 alias lines (GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V1, GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V1, GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V1, GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V1, GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V1) stating these FP8_MX constants are intentionally aliased to the BF16 variants (e.g., GPT_OSS_120B_PRETRAIN_CONFIG_*_BF16_V1) as a temporary measure because hybridep is not yet performant on gpt-oss, and note that this should be revisited when hybridep performance improves.
125-130: Same comment suggestion applies to V2 aliases. For consistency, consider adding a similar comment above the V2 FP8_MX aliases.
📝 Suggested comment
```diff
+# FP8_MX aliases to BF16 configs (temporary: hybridep not yet performant on GPT-OSS)
 GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_GB300_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_GB200_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_B300_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_B200_BF16_V2
 GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V2 = GPT_OSS_120B_PRETRAIN_CONFIG_H100_BF16_V2
-
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py` around lines 125 - 130, Add the same explanatory comment above the V2 FP8_MX alias block to match the earlier note: place a brief comment before the lines defining GPT_OSS_120B_PRETRAIN_CONFIG_GB300_FP8_MX_V2, GPT_OSS_120B_PRETRAIN_CONFIG_GB200_FP8_MX_V2, GPT_OSS_120B_PRETRAIN_CONFIG_B300_FP8_MX_V2, GPT_OSS_120B_PRETRAIN_CONFIG_B200_FP8_MX_V2, and GPT_OSS_120B_PRETRAIN_CONFIG_H100_FP8_MX_V2 that mirrors the existing comment used for the V1 aliases (explaining these are aliases to the BF16 V2 configs and why), so readers understand these are intentional aliases rather than distinct configs.

scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py (1)
38-39: The `mock` parameter is declared but unused in all 5 config functions and should be addressed. The parameter is part of the public interface (callers explicitly pass it, as shown in tests), so it cannot simply be removed without breaking compatibility. Either implement its intended use in the function bodies or remove it consistently across all callers and function signatures. If the parameter is intentional for future use or API compatibility, add a docstring documenting its purpose.
This pattern is consistent across all pretrain config functions in the codebase (gpt_oss, qwen_vl, nemotronh, kimi), suggesting a systemic design decision that should be addressed consistently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py` around lines 38 - 39, The functions like gpt_oss_120b_pretrain_config_gb300 declare a `mock` parameter but never use it; preserve the parameter for API compatibility and document its intent: add a short docstring to each pretrain config function (all five in this module) that explains `mock: bool` is a reserved, no-op flag used by tests to select a lightweight/mock configuration and should not affect runtime behaviour, or alternatively implement the intended mock behavior if there is a clear lightweight path; update the docstring of gpt_oss_120b_pretrain_config_gb300 and the other four config functions to mention `mock` is intentionally kept for backward compatibility/testing and is currently a no-op (or describe the implemented mock behavior) so callers understand its purpose.
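One way to act on this suggestion is sketched below. The function body and return value are placeholders (the real config functions build full workload configs); only the docstring treatment of the no-op `mock` flag is the point:

```python
def gpt_oss_120b_pretrain_config_gb300(mock: bool = False) -> dict:
    """Build the GPT-OSS 120B pretrain config for GB300.

    Args:
        mock: Reserved no-op flag kept for backward compatibility with
            test callers; it currently has no effect on the returned config.
    """
    # Placeholder body for illustration; the real function assembles a
    # full workload configuration.
    return {"model": "gpt_oss_120b", "gpu": "gb300"}
```

Documenting the flag this way keeps the call sites stable while making explicit that passing `mock=True` changes nothing today.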
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- scripts/performance/configs/deepseek/__init__.py
- scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
- scripts/performance/configs/gpt_oss/__init__.py
- scripts/performance/configs/gpt_oss/gpt_oss_llm_pretrain.py
- scripts/performance/configs/gpt_oss/gpt_oss_workload_base_configs.py
- scripts/performance/configs/qwen/__init__.py
- scripts/performance/configs/qwen/qwen3_workload_base_configs.py
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
What does this PR do?
apply_flex_dispatcher_backend, enabling a future switch from alltoall to the flex dispatcher.

Summary by CodeRabbit
Release Notes