
cp: DSv3 EP=8 for B200, PP8-VP2 for B300 BF16, Lm3.1 405B TP4-CP1 GB300 FP8-CS (2175) into r0.3.0#2198

Merged

ko3n1g merged 1 commit into r0.3.0 from cherry-pick-2175-r0.3.0 on Feb 3, 2026
Conversation

@ko3n1g (Contributor) commented Feb 3, 2026

beep boop [🤖]: Hi @malay-nagda 👋,

we've cherry picked #2175 into r0.3.0 for you! 🚀

Please review and approve this cherry pick at your convenience!

Summary by CodeRabbit

  • Chores
    • Optimized DeepSeek V3 pretraining configuration for GB300 hardware by adjusting expert and pipeline parallelism parameters.
    • Updated LLaMA 3.1 pretraining configuration to refine tensor and context parallelism settings for improved training efficiency.
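One property of the Llama 3.1 change worth noting: trading context parallelism for tensor parallelism this way leaves the per-replica model-parallel footprint unchanged. A minimal sanity check, using only the sizes stated in the summary above (the GPU-count arithmetic is an observation, not something stated in the PR):

```python
# Sizes taken from the change summary: TP goes 2 -> 4, CP goes 2 -> 1.
# Since TP * CP is 4 both before and after, the number of GPUs each
# model replica occupies for tensor + context parallelism is unchanged;
# only how the work is split across those GPUs differs.
old_tp, old_cp = 2, 2
new_tp, new_cp = 4, 1

assert old_tp * old_cp == new_tp * new_cp == 4
print("model-parallel GPUs per replica (TP*CP):", new_tp * new_cp)  # -> 4
```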

…P8-CS (#2175)

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@ko3n1g (Contributor, Author) commented Feb 3, 2026

/ok to test 4937436

@copy-pr-bot (bot) commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai (bot, Contributor) commented Feb 3, 2026

📝 Walkthrough

Updates configuration parameters for DeepSeek and Llama model pretraining setups. Reduces expert-model-parallelism for DeepSeek GB300 V1, restructures DeepSeek B300 BF16 V2 definition, and adjusts tensor and context parallelism for Llama31 405B FP8 GB300 configuration.

Changes

DeepSeek V3 Pretraining Configs (scripts/performance/configs/deepseek/deepseek_workload_base_configs.py)
  • Reduces expert_model_parallel_size from 16 to 8 in DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1.
  • Restructures DEEPSEEK_V3_PRETRAIN_CONFIG_B300_BF16_V2 from an alias into an explicit replace() with pipeline_model_parallel_size=8 and virtual_pipeline_model_parallel_size=2.

Llama31 Pretraining Configs (scripts/performance/configs/llama/llama31_workload_base_configs.py)
  • Updates LLAMA31_405B_PRETRAIN_CONFIG_GB300_FP8_CS_V2 to increase tensor_model_parallel_size from 2 to 4, reduce context_parallel_size from 2 to 1, and explicitly set use_megatron_fsdp=False and cpu_offloading_num_layers=None.
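The changes above can be sketched as config overrides in the style the walkthrough describes, i.e. deriving one config from another with replace(). The real base configs live under scripts/performance/configs/ in the repository; the PretrainConfig dataclass below is a simplified stand-in that only mirrors the fields the summary mentions, and the baseline values other than the listed deltas are illustrative assumptions:

```python
# Hypothetical sketch of the edits described in the change summary.
# PretrainConfig is a stand-in; only the field values called out in the
# summary (EP=8, PP=8, VP=2, TP=4, CP=1, FSDP/offload flags) come from
# the PR. Everything else is a placeholder default.
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class PretrainConfig:
    tensor_model_parallel_size: int = 1
    pipeline_model_parallel_size: int = 1
    virtual_pipeline_model_parallel_size: Optional[int] = None
    context_parallel_size: int = 1
    expert_model_parallel_size: int = 1
    use_megatron_fsdp: bool = False
    cpu_offloading_num_layers: Optional[int] = None

# expert_model_parallel_size reduced from 16 to 8.
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1 = PretrainConfig(
    expert_model_parallel_size=8,
)

# Previously a plain alias of the GB300 config; now an explicit replace()
# carrying its own pipeline / virtual-pipeline settings.
DEEPSEEK_V3_PRETRAIN_CONFIG_B300_BF16_V2 = replace(
    DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1,
    pipeline_model_parallel_size=8,
    virtual_pipeline_model_parallel_size=2,
)

LLAMA31_405B_PRETRAIN_CONFIG_GB300_FP8_CS_V2 = PretrainConfig(
    tensor_model_parallel_size=4,    # was 2
    context_parallel_size=1,         # was 2
    use_megatron_fsdp=False,         # now set explicitly
    cpu_offloading_num_layers=None,  # now set explicitly
)
```

Using replace() on a frozen dataclass keeps derived configs immutable while making the delta from the base config explicit, which is exactly the restructuring the walkthrough describes for the B300 BF16 variant.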

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • erhoo82
  • sanandaraj5597
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)
  • Test Results For Major Changes: ⚠️ Warning. The PR description lacks test results, performance benchmarks, convergence validation, or references to the original PR #2175 testing for critical parallelism parameter changes affecting training performance. Resolution: update the PR description with a reference to the PR #2175 performance testing, before-and-after metrics for the parallelism changes on B200/B300/GB300 hardware, and convergence validation confirmation.

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title specifically describes the three main changes, all aligned with the changeset: the DeepSeek V3 expert-model-parallelism reduction to 8, the DeepSeek V3 B300 BF16 pipeline/virtual-pipeline configuration, and the Llama 3.1 405B tensor/context parallelism changes.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files, so the docstring coverage check was skipped.


ko3n1g merged commit 48a27fa into r0.3.0 on Feb 3, 2026. 45 of 47 checks passed.
ko3n1g deleted the cherry-pick-2175-r0.3.0 branch on February 3, 2026 at 20:48.