
Dsv3 Recipe Update #2152

Merged
ko3n1g merged 1 commit into NVIDIA-NeMo:main from dingqingy-nv:dsv3-recipe-update
Feb 3, 2026

Conversation

@dingqingy-nv
Contributor

@dingqingy-nv dingqingy-nv commented Jan 30, 2026

What does this PR do?

  • Use cuDNN LayerNorm for better performance.
  • Update the DSV3 GB300 MXFP8 recipe to use the MBS=2, no-CUDA-graph config, now that the Hybrid EP workaround (WAR) has been added to the NeMo container.

Summary by CodeRabbit

  • Chores
    • Updated DeepSeek v3 pretraining configurations for GB200 and GB300 hardware variants.
    • Streamlined configuration definitions and introduced new predefined aliases for GB300 variants.
    • Optimized cuDNN LayerNorm handling for Mixture-of-Experts models with specific compute precision types.


@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

These changes simplify and refactor DeepSeek V3 pretraining configurations for GB300 hardware profiles, consolidating redundant configurations into aliases and inheriting from base settings. Additionally, MoE-specific logic is introduced to preserve cuDNN LayerNorm for MXFP8 and BF16 compute types.

Changes

  • Configuration Refactoring (scripts/performance/configs/deepseek/deepseek_workload_base_configs.py): Simplified the GB300_V1 configuration by removing hardware-specific fields (pipeline parallelism, expert parallelism, CUDA graph options) and introduced four new public aliases (BF16, FP8_CS, FP8_MX, NVFP4_V1), all pointing to the streamlined GB300_V1. Replaced the GB300_NVFP4_V2 derived config with a direct alias to GB300_V2.
  • Layout Parameter Update (scripts/performance/configs/deepseek/deepseek_llm_pretrain.py): Updated the layout parameter from an explicit None to base_cfg.pp_layout in pretrain config assembly for GB200 DeepSeek v3.
  • MoE Compute Dtype Logic (scripts/performance/perf_plugins.py): Added conditional preservation of cuDNN LayerNorm for MoE models when using MXFP8 or BF16 compute types, checking model family and recipe names so that cuDNN LayerNorm is not removed under these specific conditions.
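The MoE-specific dtype check described above can be sketched as follows. This is an illustrative stand-in, not the actual perf_plugins.py code: the function names, the MoE family set, and the `use_cudnn_layernorm` key are all assumptions for demonstration.

```python
def should_keep_cudnn_layernorm(model_family: str, recipe_name: str) -> bool:
    """Keep cuDNN LayerNorm for MoE models on MXFP8 or BF16 compute types.

    Hypothetical sketch: family names and recipe-name substrings are assumed.
    """
    is_moe = model_family in {"deepseek", "mixtral"}  # assumed MoE families
    name = recipe_name.lower()
    uses_target_dtype = "mxfp8" in name or "bf16" in name
    return is_moe and uses_target_dtype


def apply_layernorm_overrides(config: dict, model_family: str, recipe_name: str) -> dict:
    """Drop the cuDNN LayerNorm flag unless the MoE + MXFP8/BF16 condition holds."""
    if not should_keep_cudnn_layernorm(model_family, recipe_name):
        # Outside the protected combination, remove the setting as before.
        config.pop("use_cudnn_layernorm", None)
    return config
```

The point of the guard is that the plugin's existing "remove cuDNN LayerNorm" path is skipped only for the named model family and compute types, leaving all other recipes untouched.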

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • DSV3 NVFP4 recipe on GB300 #2076 — Updates the same layout parameter in deepseek_llm_pretrain.py from None to base_cfg.pp_layout, indicating coordinated configuration changes.

Suggested reviewers

  • erhoo82
  • sanandaraj5597
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
  • Test Results For Major Changes (⚠️ Warning): The PR claims performance improvements but lacks before-and-after benchmarks or test results documentation. Resolution: add performance benchmark data including before-and-after metrics, hardware configuration, batch size, and convergence test results.
  • Title check (❓ Inconclusive): The title 'Dsv3 Recipe Update' is vague and generic, and does not convey the specific changes being made. Consider a more specific title such as 'Enable cuDNN LayerNorm for MoE models and update DSV3 mxfp8 recipe'.

✅ Passed checks (2 passed)
  • Description Check: Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: No functions found in the changed files to evaluate docstring coverage; skipping the check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@scripts/performance/configs/deepseek/deepseek_workload_base_configs.py`:
- Around line 55-58: The four variant constants
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1,
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_CS_V1,
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_MX_V1 and
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_NVFP4_V1 should be created as copies of
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1 using dataclasses.replace() rather than
direct assignment to avoid aliasing the same WorkloadBaseConfig instance; update
these assignments (and any similar variant assignments elsewhere, e.g., the ones
you noted around line 137) to call replace(DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1,
...) so each variant is its own dataclass instance (or alternatively make
WorkloadBaseConfig frozen=True if you prefer enforced immutability).

Comment on lines +55 to +58
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_CS_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_MX_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_NVFP4_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect WorkloadBaseConfig definition and cloning behavior
rg -n "class WorkloadBaseConfig" -C 3
rg -n "def get_workload_base_config" -C 3
rg -n "pp_layout|WorkloadBaseConfig\(" -C 2

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 9797


🏁 Script executed:

#!/bin/bash
# Get full WorkloadBaseConfig class definition
sed -n '29,78p' scripts/performance/utils/utils.py

# Check line 137 in deepseek file
sed -n '130,145p' scripts/performance/configs/deepseek/deepseek_workload_base_configs.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2661


🏁 Script executed:

#!/bin/bash
# Check imports in deepseek_workload_base_configs.py
head -20 scripts/performance/configs/deepseek/deepseek_workload_base_configs.py

# Search for any mutations/assignments to these config variables
rg "DEEPSEEK_V3_PRETRAIN_CONFIG.*=" scripts/performance/configs/deepseek/deepseek_workload_base_configs.py -n

# Check if any code mutates config objects after assignment
rg "base_cfg\." scripts/performance/configs/deepseek/ -n | head -20

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 6886


🏁 Script executed:

#!/bin/bash
# Check imports for 'replace'
head -30 scripts/performance/configs/deepseek/deepseek_workload_base_configs.py | grep -i "import\|from"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 145


Use replace() for all variant configs to avoid aliasing mutable dataclass instances.

These aliases (lines 55–58, and throughout the file including line 137) reference the same WorkloadBaseConfig instance. While the current usage is read-only, this pattern creates fragile coupling. The inconsistency—where V2 base configs use replace() but variants use direct assignment—should be unified. Consider applying replace() consistently for all variants, or mark WorkloadBaseConfig with frozen=True to enforce immutability.
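The aliasing hazard the reviewer describes is easy to reproduce with a toy dataclass. `WorkloadConfig` below is a stand-in with made-up fields; the real `WorkloadBaseConfig` differs, but the copy-vs-alias behavior of `dataclasses.replace()` is the same.

```python
from dataclasses import dataclass, replace


@dataclass
class WorkloadConfig:
    micro_batch_size: int = 1
    use_cuda_graphs: bool = False


BASE = WorkloadConfig()

# Direct assignment: both names refer to the SAME mutable instance.
ALIAS = BASE
ALIAS.micro_batch_size = 2
assert BASE.micro_batch_size == 2  # BASE was mutated through the alias

# dataclasses.replace(): an independent copy, safe to customize per variant.
COPY = replace(BASE, micro_batch_size=4)
COPY.use_cuda_graphs = True
assert BASE.micro_batch_size == 2 and not BASE.use_cuda_graphs
```

Declaring the dataclass with `frozen=True` is the alternative the review mentions: aliasing then remains, but any accidental mutation raises `FrozenInstanceError` instead of silently changing every variant.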


ko3n1g previously approved these changes Jan 30, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
@ko3n1g ko3n1g merged commit cc31be2 into NVIDIA-NeMo:main Feb 3, 2026
49 checks passed
ko3n1g pushed a commit that referenced this pull request Feb 3, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
yaoyu-33 pushed a commit that referenced this pull request Feb 3, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
ko3n1g added a commit that referenced this pull request Feb 7, 2026

Labels

r0.3.0 Cherry-pick label for r0.3.0 release branch
