
DeepSeek-V3 recipes for H100 #2197

Merged
scsudhakaran merged 1 commit into main from scsudhakaran/deepseekv3-h100
Feb 10, 2026

Conversation

@scsudhakaran
Contributor

@scsudhakaran scsudhakaran commented Feb 3, 2026

This PR updates the DeepSeek-V3 H100 recipes with a configuration that provides better performance numbers.

Summary by CodeRabbit

Release Notes

  • Configuration Updates

    • Updated DeepSeek V3 pretraining configuration parameters for H100 GPU environments with adjusted model parallelization strategy
    • Added pipeline layout parameters to optimize resource distribution across training nodes
    • Refined secondary pretraining configuration variant for improved performance characteristics
  • Performance Enhancements

    • Enabled expandable memory segment allocation for DeepSeek V3 pretraining on H100 hardware to optimize GPU memory utilization
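As a rough illustration of the last point (not the repository's actual plugin code; the helper name and gating conditions here are placeholders), enabling expandable segments amounts to exporting PyTorch's allocator setting before the first CUDA allocation:

```python
import os

def maybe_enable_expandable_segments(model_name: str, gpu: str) -> bool:
    """Hypothetical sketch of the gating added in perf_plugins.py.

    PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA
    allocation for the expandable-segments allocator to take effect.
    Returns True when the setting was applied.
    """
    if "deepseek_v3" in model_name and gpu.lower() == "h100":
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
        return True
    return False

# Applies only to the DeepSeek-V3 pretraining on H100 combination.
maybe_enable_expandable_segments("deepseek_v3_pretrain", "h100")
```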

@copy-pr-bot

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@scsudhakaran scsudhakaran added the r0.3.0 label (Cherry-pick label for r0.3.0 release branch) Feb 4, 2026
malay-nagda
malay-nagda previously approved these changes Feb 4, 2026
@malay-nagda malay-nagda self-requested a review February 4, 2026 11:52
@malay-nagda
Contributor

@scsudhakaran can you remove the draft tag?
Was this tested with the v2 configs (GBS=16384)?

@scsudhakaran scsudhakaran force-pushed the scsudhakaran/deepseekv3-h100 branch 2 times, most recently from e7667ec to 3bdd232 Compare February 10, 2026 11:57
@scsudhakaran scsudhakaran marked this pull request as ready for review February 10, 2026 13:09
malay-nagda
malay-nagda previously approved these changes Feb 10, 2026
@malay-nagda malay-nagda added the performance, performance/release (Performance items related with NeMo release), and performance/optimize (Performance optimization tracking) labels Feb 10, 2026
@malay-nagda malay-nagda added this to the 26.02 milestone Feb 10, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Walkthrough

The changes modify DeepSeek V3 pretraining configurations for H100 GPUs by adjusting tensor and pipeline model parallelism settings, propagating pipeline layout configurations conditionally, and adding targeted environment variable configuration for specific hardware and model combinations.

Changes

  • Pipeline Layout Configuration: scripts/performance/configs/deepseek/deepseek_llm_pretrain.py
    Modified the pipeline_model_parallel_layout assignment to use base_cfg.pp_layout when provided; otherwise recompute it via set_deepseek_v3_pipeline_model_parallel_layout().
  • Workload Base Configurations: scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
    Updated DEEPSEEK_V3_PRETRAIN_CONFIG_H100_V1: tensor_model_parallel_size reduced from 4 to 2 and added pp_layout="Et
  • Environment Configuration: scripts/performance/perf_plugins.py
    Added a conditional branch to set PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" for DeepSeek V3 pretraining on H100 GPUs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • #2186: Sets gb200 to use base_cfg.pp_layout in deepseek_llm_pretrain.py, parallel to base_cfg.pp_layout propagation in this PR.
  • #2076: Directly modifies deepseek LLM pretrain config to propagate base_cfg.pp_layout and introduces pp_layout field in workload configs.
  • #2060: Modifies model-specific environment variable gating in perf_plugins.py for different model cases.

Suggested labels

Run CICD

Suggested reviewers

  • malay-nagda
  • dingqingy-nv
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Test Results For Major Changes: ⚠️ Warning. The PR aims to improve DeepSeek-V3 H100 performance through configuration changes but provides no performance test results, benchmarks, or before-and-after metrics. Resolution: add documented performance test results comparing the old (TP=4) and new (TP=2) configurations, including throughput numbers, testing-environment details, and convergence metrics.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'DeepSeek-V3 recipes for H100' directly and clearly describes the main change: updating DeepSeek-V3 configuration recipes optimized for H100 hardware.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required 80.00% threshold.

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
scripts/performance/perf_plugins.py (1)

270-276: Minor style nit: missing space after elif.

Line 270 uses elif( while the preceding if ( on line 262 has a space before the parenthesis. This is inconsistent and may be flagged by ruff.

Proposed fix
-        elif(
+        elif (


@scsudhakaran
Contributor Author

/ok to test 3bdd232

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
@scsudhakaran
Contributor Author

/ok to test 724e50c

@scsudhakaran scsudhakaran merged commit f36e5de into main Feb 10, 2026
87 of 90 checks passed
@scsudhakaran scsudhakaran deleted the scsudhakaran/deepseekv3-h100 branch February 10, 2026 15:38
sowmen pushed a commit to sowmen/Megatron-Bridge that referenced this pull request Feb 11, 2026
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: sowmen <sowmendipta@gmail.com>

Labels

  • performance
  • performance/optimize: Performance optimization tracking
  • performance/release: Performance items related with NeMo release
  • r0.3.0: Cherry-pick label for r0.3.0 release branch
