
Fix DeepSeek-V3 H100 large scale config#2401

Merged
scsudhakaran merged 1 commit into main from scsudhakaran/dsv3
Feb 23, 2026

Conversation

@scsudhakaran
Contributor

@scsudhakaran scsudhakaran commented Feb 17, 2026

Summary by CodeRabbit

  • Chores
    • Updated backend configuration parameters to optimize performance handling for large-scale model operations.

@copy-pr-bot

copy-pr-bot bot commented Feb 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@scsudhakaran
Contributor Author

/ok to test 1a8e1d0

@scsudhakaran scsudhakaran added this to the 26.02 milestone Feb 17, 2026
@scsudhakaran scsudhakaran added the r0.3.0 label (Cherry-pick label for r0.3.0 release branch) Feb 17, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

This PR modifies the DeepSeek V3 H100 FP8 SC Large Scale pretrain configuration by adding two parameters: virtual_pipeline_model_parallel_size set to 2 and pp_layout set to None. These parameters configure the virtual pipeline model parallel degree and pipeline layout strategy for the large-scale H100 FP8 SC variant.

Changes

Cohort / File(s) Summary
DeepSeek Performance Configuration
scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
Added virtual_pipeline_model_parallel_size=2 and pp_layout=None parameters to the DeepSeek V3 H100 FP8 SC Large Scale pretrain configuration.
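As a rough sketch, the described change might look like the following in a dataclass-style config. Only the two added parameters (`virtual_pipeline_model_parallel_size=2` and `pp_layout=None`) come from this PR; the surrounding dataclass, the other field names, and all default values are assumptions for illustration, not the actual contents of `deepseek_workload_base_configs.py`.

```python
# Illustrative sketch only: the surrounding structure is assumed; the two
# parameters marked "added by this PR" are the ones the PR describes.
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PretrainConfig:
    # Standard Megatron-style parallelism knobs (assumed names/defaults).
    tensor_model_parallel_size: int = 1
    pipeline_model_parallel_size: int = 1
    # A virtual pipeline degree > 1 enables interleaved pipeline
    # scheduling, which reduces pipeline bubble time at large scale.
    virtual_pipeline_model_parallel_size: Optional[int] = None
    # Explicit per-stage layer layout; None lets the framework split
    # layers evenly across pipeline stages.
    pp_layout: Optional[Any] = None


# DeepSeek V3 H100 FP8 SC Large Scale variant, with the two parameters
# this PR adds.
deepseek_v3_h100_fp8_sc_large_scale = PretrainConfig(
    virtual_pipeline_model_parallel_size=2,  # added by this PR
    pp_layout=None,                          # added by this PR
)
```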

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • DeepSeek-V3 recipes for H100 #2312 — Modifies the same DeepSeek H100 pretrain base configs file by adding/setting identical virtual pipeline parallelism and pipeline layout parameters.

Suggested labels

performance

Suggested reviewers

  • ko3n1g
  • erhoo82
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning
    Explanation: The PR claims to fix the DeepSeek-V3 H100 timeout issue, but the known-issues.md created in the same commit still lists it as an active problem, with no evidence provided that the fix resolves the timeout.
    Resolution: Add test results proving H100 training succeeds without timeout, update known-issues.md to reflect the fix status, document why the parameter changes resolve the timeout, and provide convergence/performance validation.
✅ Passed checks (4 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.
  • Merge Conflict Detection — ✅ Passed: No merge conflicts detected when merging into main.
  • Title Check — ✅ Passed: The title 'Fix DeepSeek-V3 H100 large scale config' accurately summarizes the main change: adding the virtual_pipeline_model_parallel_size and pp_layout parameters to the large-scale H100 FP8 SC config.

