
dsv3_gb300_revert- BF16 & FP8-MX scale #2277

Merged
ko3n1g merged 1 commit into main from malay/dsv3_gb300_revert_2602
Feb 9, 2026

Conversation

@malay-nagda
Contributor

@malay-nagda malay-nagda commented Feb 9, 2026

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Chores
    • Updated DeepSeek V3 pretraining configurations to optimize performance parameters across different hardware variants, including batch size adjustments and computational efficiency settings for large-scale model training workloads.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@malay-nagda malay-nagda added the r0.3.0 label (Cherry-pick label for r0.3.0 release branch) Feb 9, 2026
@malay-nagda malay-nagda marked this pull request as ready for review February 9, 2026 13:56
@coderabbitai
Contributor

coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

Updates DeepSeek V3 workload base configuration variants by replacing simple aliases with explicit replace() calls that customize micro-batch sizing, pipeline parallelism, expert distribution, MOE dispatcher backend, CUDA graph parameters, and recompute modules for GB300 GPU cluster configurations.

Changes

Cohort / File(s): DeepSeek Configuration Variants
scripts/performance/configs/deepseek/deepseek_workload_base_configs.py

Summary: Restructured DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 from an alias into an explicit config with micro_batch_size, pipeline/virtual pipeline sizes, expert parallelism, MOE dispatcher backend, CUDA graph settings, and recompute modules. Updated BF16_V2 to derive from BF16_V1 with a global_batch_size override. Changed the FP8_MX_LARGE_SCALE base from GB300_FP8_MX_V1 to GB300_BF16_V1.
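The walkthrough above describes replacing simple aliases with explicit replace() calls so that each variant overrides only the fields that differ from its base. A minimal, self-contained sketch of that pattern follows; the WorkloadConfig dataclass and all field values here are illustrative stand-ins, not the repository's actual definitions:

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for the real workload config dataclass; field
# names mirror those mentioned in the walkthrough, values are made up.
@dataclass(frozen=True)
class WorkloadConfig:
    micro_batch_size: int = 2
    global_batch_size: int = 512
    pipeline_model_parallel_size: int = 2
    virtual_pipeline_model_parallel_size: int = 8

# Base GB300 config.
GB300_V1 = WorkloadConfig()

# Explicit replace() instead of a simple alias: override only the
# fields that differ, inherit everything else from the base.
GB300_BF16_V1 = replace(
    GB300_V1,
    micro_batch_size=1,
    pipeline_model_parallel_size=4,
    virtual_pipeline_model_parallel_size=4,
)

# A V2 variant derives from V1 with only a global_batch_size override.
GB300_BF16_V2 = replace(GB300_BF16_V1, global_batch_size=8192)
```

Because replace() copies every field not named in the override, the derived variant stays in sync with its base for all other settings, which is the point of deriving configs rather than duplicating them. The flip side, flagged in the review below, is that any inherited field that silently assumes the base's other values must be overridden too.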

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

performance

Suggested reviewers

  • ko3n1g
  • dingqingy-nv
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check ⚠️ Warning: The title mentions 'BF16 & FP8-MX scale' but lacks clarity about the actual change, which involves replacing configuration aliases with explicit replace() calls to customize multiple parameters. The term 'revert' is misleading, as this is not reverting changes but making forward modifications. Resolution: use a clearer, more specific title such as 'Update DeepSeek V3 GB300 configurations to customize batch size, parallelism, and optimization settings', removing the ambiguous 'revert' terminology.
  • Test Results For Major Changes ⚠️ Warning: The PR contains major DeepSeek V3 parallelism and configuration changes but lacks test results, performance metrics, convergence verification, or before/after comparisons in its description. Resolution: add test results, before/after performance metrics, and convergence verification, and address the missing pp_layout override justification.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@scripts/performance/configs/deepseek/deepseek_workload_base_configs.py`:
- Around line 55-66: DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 currently
inherits a pp_layout from DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1 that assumes
(pipeline_model_parallel_size=2, virtual_pipeline_model_parallel_size=8); update
the replace call for DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 to either set
pp_layout explicitly for the new (pipeline_model_parallel_size=4,
virtual_pipeline_model_parallel_size=4) configuration or remove pp_layout from
the replace override so the framework can auto-compute the correct layout based
on the updated pipeline_model_parallel_size and
virtual_pipeline_model_parallel_size values.

Comment on lines +55 to +66
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 = replace(
    DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1,
    micro_batch_size=1,
    pipeline_model_parallel_size=4,
    virtual_pipeline_model_parallel_size=4,
    expert_model_parallel_size=64,
    moe_flex_dispatcher_backend="hybridep",
    moe_a2a_overlap=False,
    cuda_graph_impl="transformer_engine",
    cuda_graph_scope=["attn", "moe_router", "moe_preprocess"],
    recompute_modules=["moe_act"],
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, read the target file to understand the configuration structure
cd scripts/performance/configs/deepseek || exit 1
cat -n deepseek_workload_base_configs.py | head -80

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3435


🏁 Script executed:

# Search for pp_layout usage across the codebase to understand how it's validated/used
rg -n -C 3 'pp_layout' --type=py | head -100

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 4348


🏁 Script executed:

# Look for validation logic that checks compatibility between pp_layout and parallelism params
rg -n -C 5 'pipeline_model_parallel_size|virtual_pipeline_model_parallel_size' --type=py | grep -C 3 'pp_layout' | head -50

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3548


🏁 Script executed:

# Read more of the file to see other related configurations and understand the pattern
sed -n '55,160p' scripts/performance/configs/deepseek/deepseek_workload_base_configs.py | cat -n

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 5120


🏁 Script executed:

# Check the set_deepseek_v3_pipeline_model_parallel_layout function to understand what it does
rg -n -A 20 'def set_deepseek_v3_pipeline_model_parallel_layout' --type=py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2579


🏁 Script executed:

# Check if there are any tests or validation that would catch this mismatch
rg -n 'pp_layout.*Et\*4' --type=py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 179


🏁 Script executed:

# Verify that (4, 4) is indeed the correct size for BF16_V1
grep -n 'pipeline_model_parallel_size=4' scripts/performance/configs/deepseek/deepseek_workload_base_configs.py | head -20

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 290


Override pp_layout in DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 for the new parallelism configuration.

BF16_V1 inherits pp_layout="Et*4|(t*4|)*14tmL" from GB300_V1, which is designed for pipeline_model_parallel_size=2 and virtual_pipeline_model_parallel_size=8. However, BF16_V1 changes both to 4 without overriding pp_layout. The configuration framework will use the inherited layout string instead of auto-computing the correct one for (4, 4), leading to incorrect layer distribution across pipeline stages at runtime.

Either explicitly set pp_layout to match the (4, 4) configuration or remove it entirely to trigger auto-computation based on the updated parallelism parameters.
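The second option from the review (clearing the inherited layout so it is recomputed) can be sketched as follows. This is a hypothetical illustration: the minimal dataclass stands in for the real workload config, and it assumes, as the review describes, that an unset pp_layout triggers the framework's auto-computation from the parallelism sizes:

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical minimal config; pp_layout and the parallelism fields
# mirror the names in the review comment, defaults match GB300_V1 as
# described there.
@dataclass(frozen=True)
class WorkloadConfig:
    pipeline_model_parallel_size: int = 2
    virtual_pipeline_model_parallel_size: int = 8
    pp_layout: Optional[str] = "Et*4|(t*4|)*14tmL"  # valid only for (pp=2, vpp=8)

GB300_V1 = WorkloadConfig()

# Clear pp_layout alongside the new (pp=4, vpp=4) split so the stale
# (pp=2, vpp=8) layout string is not inherited; the framework is then
# assumed to auto-compute a layout consistent with the new sizes.
GB300_BF16_V1 = replace(
    GB300_V1,
    pipeline_model_parallel_size=4,
    virtual_pipeline_model_parallel_size=4,
    pp_layout=None,
)
```

The alternative is to pass an explicit pp_layout string written for the (4, 4) configuration; either way, the invariant is that pp_layout must never be overridden independently of the pipeline sizes it encodes.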


@ko3n1g ko3n1g merged commit 941c0b2 into main Feb 9, 2026
53 of 54 checks passed
@ko3n1g ko3n1g deleted the malay/dsv3_gb300_revert_2602 branch February 9, 2026 19:49
ko3n1g pushed a commit that referenced this pull request Feb 9, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
ko3n1g added a commit that referenced this pull request Feb 9, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malayn@nvidia.com>
sowmen pushed a commit to sowmen/Megatron-Bridge that referenced this pull request Feb 11, 2026
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: sowmen <sowmendipta@gmail.com>
@ko3n1g ko3n1g mentioned this pull request Feb 24, 2026
5 tasks

Labels

r0.3.0 Cherry-pick label for r0.3.0 release branch

3 participants