
Dsv3 Recipe Update #2152

Merged
ko3n1g merged 1 commit into NVIDIA-NeMo:main from dingqingy-nv:dsv3-recipe-update
Feb 3, 2026

Conversation

@dingqingy-nv
Contributor

@dingqingy-nv dingqingy-nv commented Jan 30, 2026

What does this PR do?

  • Use cuDNN LayerNorm for better performance.
  • Update the DSV3 GB300 MXFP8 recipe to use the MBS=2, no-CUDA-graph config, now that the Hybrid EP workaround (WAR) has been added to the NeMo container.

Summary by CodeRabbit

  • Chores
    • Updated DeepSeek v3 pretraining configurations for GB200 and GB300 hardware variants.
    • Streamlined configuration definitions and introduced new predefined aliases for GB300 variants.
    • Optimized cuDNN LayerNorm handling for Mixture-of-Experts models with specific compute precision types.


@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

These changes simplify and refactor DeepSeek V3 pretraining configurations for GB300 hardware profiles, consolidating redundant configurations into aliases and inheriting from base settings. Additionally, MoE-specific logic is introduced to preserve cuDNN LayerNorm for MXFP8 and BF16 compute types.

Changes

  • Configuration Refactoring (scripts/performance/configs/deepseek/deepseek_workload_base_configs.py): Simplified the GB300_V1 configuration by removing hardware-specific fields (pipeline parallelism, expert parallelism, CUDA graph options) and introduced four new public aliases (BF16, FP8_CS, FP8_MX, NVFP4_V1), all pointing to the streamlined GB300_V1. Replaced the GB300_NVFP4_V2 derived config with a direct alias to GB300_V2.
  • Layout Parameter Update (scripts/performance/configs/deepseek/deepseek_llm_pretrain.py): Updated the layout parameter from an explicit None to base_cfg.pp_layout in pretrain config assembly for GB200 DeepSeek v3.
  • MoE Compute Dtype Logic (scripts/performance/perf_plugins.py): Added conditional preservation of cuDNN LayerNorm for MoE models when using MXFP8 or BF16 compute types, checking model family and recipe names so that cuDNN LayerNorm is not removed under these specific conditions.
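The MoE-specific dtype check described above can be sketched as follows. This is an illustrative stand-in, not the actual perf_plugins.py code: the function names, the MoE family set, and the `use_cudnn_layernorm` key are all assumptions for demonstration.

```python
def should_keep_cudnn_layernorm(model_family: str, recipe_name: str) -> bool:
    """Keep cuDNN LayerNorm for MoE models on MXFP8 or BF16 compute types.

    Hypothetical sketch: family names and recipe-name substrings are assumed.
    """
    is_moe = model_family in {"deepseek", "mixtral"}  # assumed MoE families
    name = recipe_name.lower()
    uses_target_dtype = "mxfp8" in name or "bf16" in name
    return is_moe and uses_target_dtype


def apply_layernorm_overrides(config: dict, model_family: str, recipe_name: str) -> dict:
    """Drop the cuDNN LayerNorm flag unless the MoE + MXFP8/BF16 condition holds."""
    if not should_keep_cudnn_layernorm(model_family, recipe_name):
        # Outside the protected combination, remove the setting as before.
        config.pop("use_cudnn_layernorm", None)
    return config
```

The point of the guard is that the plugin's existing "remove cuDNN LayerNorm" path is skipped only for the named model family and compute types, leaving all other recipes untouched.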

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • DSV3 NVFP4 recipe on GB300 #2076 — Updates the same layout parameter in deepseek_llm_pretrain.py from None to base_cfg.pp_layout, indicating coordinated configuration changes.

Suggested reviewers

  • erhoo82
  • sanandaraj5597
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
  • Test Results For Major Changes (⚠️ Warning): The PR claims performance improvements but lacks before-and-after benchmarks or test results documentation. Resolution: add performance benchmark data including before-and-after metrics, hardware configuration, batch size, and convergence test results.
  • Title check (❓ Inconclusive): The title 'Dsv3 Recipe Update' is vague and generic, and does not convey the specific changes being made. Consider a more specific title such as 'Enable cuDNN LayerNorm for MoE models and update DSV3 mxfp8 recipe'.

✅ Passed checks (2 passed)
  • Description Check: Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: No functions found in the changed files to evaluate docstring coverage; skipping the check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@scripts/performance/configs/deepseek/deepseek_workload_base_configs.py`:
- Around line 55-58: The four variant constants
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1,
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_CS_V1,
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_MX_V1 and
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_NVFP4_V1 should be created as copies of
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1 using dataclasses.replace() rather than
direct assignment to avoid aliasing the same WorkloadBaseConfig instance; update
these assignments (and any similar variant assignments elsewhere, e.g., the ones
you noted around line 137) to call replace(DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1,
...) so each variant is its own dataclass instance (or alternatively make
WorkloadBaseConfig frozen=True if you prefer enforced immutability).

Comment on lines +55 to +58
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_CS_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_MX_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_NVFP4_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect WorkloadBaseConfig definition and cloning behavior
rg -n "class WorkloadBaseConfig" -C 3
rg -n "def get_workload_base_config" -C 3
rg -n "pp_layout|WorkloadBaseConfig\(" -C 2

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 9797


🏁 Script executed:

#!/bin/bash
# Get full WorkloadBaseConfig class definition
sed -n '29,78p' scripts/performance/utils/utils.py

# Check line 137 in deepseek file
sed -n '130,145p' scripts/performance/configs/deepseek/deepseek_workload_base_configs.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2661


🏁 Script executed:

#!/bin/bash
# Check imports in deepseek_workload_base_configs.py
head -20 scripts/performance/configs/deepseek/deepseek_workload_base_configs.py

# Search for any mutations/assignments to these config variables
rg "DEEPSEEK_V3_PRETRAIN_CONFIG.*=" scripts/performance/configs/deepseek/deepseek_workload_base_configs.py -n

# Check if any code mutates config objects after assignment
rg "base_cfg\." scripts/performance/configs/deepseek/ -n | head -20

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 6886


🏁 Script executed:

#!/bin/bash
# Check imports for 'replace'
head -30 scripts/performance/configs/deepseek/deepseek_workload_base_configs.py | grep -i "import\|from"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 145


Use replace() for all variant configs to avoid aliasing mutable dataclass instances.

These aliases (lines 55–58, and throughout the file including line 137) reference the same WorkloadBaseConfig instance. While the current usage is read-only, this pattern creates fragile coupling. The inconsistency—where V2 base configs use replace() but variants use direct assignment—should be unified. Consider applying replace() consistently for all variants, or mark WorkloadBaseConfig with frozen=True to enforce immutability.
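The aliasing hazard the reviewer describes is easy to reproduce with a toy dataclass. `WorkloadConfig` below is a stand-in with made-up fields; the real `WorkloadBaseConfig` differs, but the copy-vs-alias behavior of `dataclasses.replace()` is the same.

```python
from dataclasses import dataclass, replace


@dataclass
class WorkloadConfig:
    micro_batch_size: int = 1
    use_cuda_graphs: bool = False


BASE = WorkloadConfig()

# Direct assignment: both names refer to the SAME mutable instance.
ALIAS = BASE
ALIAS.micro_batch_size = 2
assert BASE.micro_batch_size == 2  # BASE was mutated through the alias

# dataclasses.replace(): an independent copy, safe to customize per variant.
COPY = replace(BASE, micro_batch_size=4)
COPY.use_cuda_graphs = True
assert BASE.micro_batch_size == 2 and not BASE.use_cuda_graphs
```

Declaring the dataclass with `frozen=True` is the alternative the review mentions: aliasing then remains, but any accidental mutation raises `FrozenInstanceError` instead of silently changing every variant.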


ko3n1g previously approved these changes Jan 30, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
@ko3n1g ko3n1g merged commit cc31be2 into NVIDIA-NeMo:main Feb 3, 2026
49 checks passed
ko3n1g pushed a commit that referenced this pull request Feb 3, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
yaoyu-33 pushed a commit that referenced this pull request Feb 3, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
ko3n1g added a commit that referenced this pull request Feb 7, 2026

Labels

r0.3.0 Cherry-pick label for r0.3.0 release branch
