
DSv3 EP=8 for B200, PP8-VP2 for B300 BF16, Lm3.1 405B TP4-CP1 GB300 FP8-CS #2175

Merged
ko3n1g merged 3 commits into main from malay/b200_dsv3_ep8
Feb 3, 2026
Conversation


@malay-nagda malay-nagda commented Feb 2, 2026

What does this PR do ?

  • Change EP=16 to EP=8 for B200 (both BF16 and FP8)
  • Change PP16-VP1 to PP8-VP2 for B300 BF16
  • Change TP2-CP2 to TP4-CP1 for Llama3.1 405B GB300 FP8-CS
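Note that each change above reshuffles the parallelism layout without changing the overall product: TP2-CP2 and TP4-CP1 both occupy 4 GPUs per TP×CP group, and PP16-VP1 and PP8-VP2 both yield 16 virtual pipeline stages. A quick arithmetic sketch (the invariants follow from the numbers in this PR; the variable names are illustrative):

```python
# Llama3.1 405B GB300 FP8-CS: TP2-CP2 -> TP4-CP1
old_tp, old_cp = 2, 2
new_tp, new_cp = 4, 1
# Same number of GPUs per TPxCP group (4); only the split changes.
assert old_tp * old_cp == new_tp * new_cp == 4

# DSv3 B300 BF16: PP16-VP1 -> PP8-VP2
old_pp, old_vp = 16, 1
new_pp, new_vp = 8, 2
# Same total virtual pipeline stages (16); fewer physical stages, more
# virtual stages per pipeline rank.
assert old_pp * old_vp == new_pp * new_vp == 16
```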

Changelog

BASE_DEEPSEEK_V3_CONFIG,
    ...
    - expert_model_parallel_size=16,
    + expert_model_parallel_size=8,

DEEPSEEK_V3_PRETRAIN_CONFIG_B300_BF16_V2 = replace(
    DEEPSEEK_V3_PRETRAIN_CONFIG_B300_V2,
    pipeline_model_parallel_size=8,
    virtual_pipeline_model_parallel_size=2,
)

LLAMA31_405B_PRETRAIN_CONFIG_GB300_FP8_CS_V2 = replace(
    LLAMA31_405B_PRETRAIN_CONFIG_GB300_FP8_CS_V1,
    tensor_model_parallel_size=4,
    pipeline_model_parallel_size=8,
    context_parallel_size=1,
)
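The V2 configs above are derived with `dataclasses.replace`, which returns a copy of a dataclass with selected fields overridden, leaving the base config untouched. A minimal sketch with a hypothetical config class (the field names follow the snippet above, but the class itself is illustrative, not the repo's actual definition):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PretrainConfig:
    # Illustrative subset of the parallelism fields used in this PR.
    tensor_model_parallel_size: int = 1
    pipeline_model_parallel_size: int = 1
    context_parallel_size: int = 1
    expert_model_parallel_size: int = 1

# Hypothetical base config with the EP=8 value this PR sets for B200.
base = PretrainConfig(expert_model_parallel_size=8)

# Derive a variant; `replace` copies the base rather than mutating it.
v2 = replace(base, pipeline_model_parallel_size=8)

assert base.pipeline_model_parallel_size == 1  # base unchanged
assert v2.pipeline_model_parallel_size == 8    # override applied
assert v2.expert_model_parallel_size == 8      # inherited from base
```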

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Chores
    • Adjusted performance workload configuration parameters for optimization.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 2, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai bot commented Feb 2, 2026

📝 Walkthrough


The expert_model_parallel_size parameter in the DEEPSEEK_V3_PRETRAIN_CONFIG_B200_V1 configuration is reduced from 16 to 8. No other configuration values, control flow, or logic are affected by this change.

Changes

Cohort / File(s): Configuration Parameter Update — scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
Summary: Reduced expert_model_parallel_size from 16 to 8 in DEEPSEEK_V3_PRETRAIN_CONFIG_B200_V1.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (1 warning, 1 inconclusive)

❌ Failed checks:
  • Test Results For Major Changes — ⚠️ Warning: The PR changes DeepSeek V3 expert_model_parallel_size from 16 to 8 on B200 without the performance benchmarks, convergence data, or testing confirmation required by the CONTRIBUTING.md guidelines. Resolution: add performance metrics (throughput, GPU utilization, memory efficiency), convergence validation, an explanation of the EP=8 optimization for B200, and testing confirmation.
  • Title check — ❓ Inconclusive: The title is highly technical and bundles multiple unrelated configuration changes, making it difficult to identify the primary change. While it mentions 'DSv3 EP=8 for B200', which relates to the code change, it also includes items (PP8-VP2 for B300, Lm3.1 405B, etc.) that are not reflected in the actual changeset. Resolution: simplify the title to focus on the primary change, e.g. 'Change DeepSeek V3 expert parallelism to 8 for B200', and remove configuration items not present in this PR's changeset.

✅ Passed checks:
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; skipping the check.
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.


Signed-off-by: Malay Nagda <malayn@nvidia.com>

Labels

r0.3.0 Cherry-pick label for r0.3.0 release branch


3 participants