docs(DeepSeek-V4): add h200|big verified recipes + tune H200 Pro parameters by yushengsu-thu · Pull Request #23742 · sgl-project/sglang

yushengsu-thu · 2026-04-26T03:26:25Z

Summary

Mark h200|big|low-latency, h200|big|balanced, h200|big|max-throughput as verified in the interactive command generator.
Tune H200 Pro (big) parameters based on testing:
- SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: 256 → 128 (balanced & max-throughput)
- --cuda-graph-max-bs: 32 → 8, --max-running-requests: 64 → 32 (low-latency)
- --mem-fraction-static: 0.82 → 0.88 (low-latency / balanced / max-throughput)
- Balanced recipe: add dedicated --cuda-graph-max-bs 8 and --max-running-requests 32 for H200 Pro
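
The tuned values above can be summarized as a lookup table of before/after pairs. This is only a sketch for reading the change list at a glance: `H200_PRO_TUNING` and `describeChanges` are hypothetical names, not identifiers from the command generator snippet, and `null` marks the flags the balanced recipe gained without a stated dedicated prior value.

```javascript
// Hypothetical summary of the H200 Pro (big) changes listed in this PR.
// Each entry is [before, after]; null means no dedicated prior value was stated.
const H200_PRO_TUNING = {
  "low-latency": {
    "--cuda-graph-max-bs": [32, 8],
    "--max-running-requests": [64, 32],
    "--mem-fraction-static": [0.82, 0.88],
  },
  "balanced": {
    "SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK": [256, 128],
    "--cuda-graph-max-bs": [null, 8],
    "--max-running-requests": [null, 32],
    "--mem-fraction-static": [0.82, 0.88],
  },
  "max-throughput": {
    "SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK": [256, 128],
    "--mem-fraction-static": [0.82, 0.88],
  },
};

// Render one line per changed parameter for the given strategy.
// Unknown strategies yield an empty list.
function describeChanges(strategy) {
  const entries = Object.entries(H200_PRO_TUNING[strategy] ?? {});
  return entries.map(([name, [before, after]]) =>
    before === null ? `${name}: (new) ${after}` : `${name}: ${before} → ${after}`
  );
}

console.log(describeChanges("low-latency").join("\n"));
```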

Test plan

Run mint dev locally and verify the interactive command generator produces correct commands for all H200 Pro combinations
Verify no regressions for existing verified combos (H200 small, Blackwell, etc.)

Made with Cursor

Copilot

Pull request overview

Updates the DeepSeek-V4 deployment interactive command generator snippet to treat H200 Pro (big) recipes as verified and to tune several H200 Pro-specific launch parameters based on recent testing.

Changes:

Mark h200|big|low-latency, h200|big|balanced, and h200|big|max-throughput as verified (runnable) recipes.
Adjust H200 Pro (big) DeepEP dispatch-token cap for balanced/max-throughput and reduce low-latency cuda-graph/max-running-requests.
Increase H200 Pro (big) --mem-fraction-static to 0.88 in multiple recipes and add explicit balanced H200 Pro cuda-graph/max-running-requests flags.

Copilot · 2026-04-26T03:28:43Z

+      if (hardware === "h200" && isBig) {
+        flags.push("  --cuda-graph-max-bs 8");
+        flags.push("  --max-running-requests 32");
+      } else if (hardware === "h200") {


The balanced recipe comment above says H200 big: cg=128 max-run=128 (same) but for hardware === "h200" && isBig the code now emits --cuda-graph-max-bs 8 and --max-running-requests 32. Update the comment to reflect the new H200 Pro tuned values to avoid confusing users (and to preserve the "mirror of allinone" intent documented earlier in the file).

Copilot · 2026-04-26T03:28:44Z

        flags.push("  --disable-flashinfer-autotune");
      }
-      if (isBig) flags.push("  --mem-fraction-static 0.82");
+      if (isBig) flags.push("  --mem-fraction-static 0.88");


--mem-fraction-static was changed to 0.88 for all isBig low-latency runs. The block comment above says Blackwell big should use mem-frac 0.82, and the PR description suggests this tuning is specific to H200 Pro. Consider gating 0.88 to hardware === "h200" && isBig and keeping the prior value for other big variants to avoid altering existing verified B200/GB300 commands.
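
The gating suggested in this comment could look roughly like the sketch below. `memFractionFlag` is a hypothetical helper, not a function in the snippet, and `"b200"` is an assumed hardware identifier used only for illustration; the diff itself only confirms the `"h200"` string and the `isBig` check.

```javascript
// Hypothetical helper: pick --mem-fraction-static for "big" low-latency recipes.
// Only H200 Pro (h200 + big) gets the tuned 0.88; other big variants keep 0.82.
function memFractionFlag(hardware, isBig) {
  if (!isBig) return null; // the diff only emits this flag when isBig is true
  const value = hardware === "h200" ? "0.88" : "0.82";
  return `  --mem-fraction-static ${value}`;
}

// Usage sketch: push the flag only when one applies.
const flags = [];
const flag = memFractionFlag("h200", true);
if (flag !== null) flags.push(flag);
console.log(flags);
```

Keeping the decision in one place means the existing verified big commands for other hardware stay unchanged while the H200 Pro value is tuned.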

Copilot · 2026-04-26T03:28:44Z

      if (hardware === "h200" && isBig) {
-        flags.push("  --cuda-graph-max-bs 32");
-        flags.push("  --max-running-requests 64");
+        flags.push("  --cuda-graph-max-bs 8");
+        flags.push("  --max-running-requests 32");
      }


The surrounding allinone summary comment for H200 big low-latency still mentions cg=32 max-run=64 (and mem-frac 0.82), but the actual flags now emit --cuda-graph-max-bs 8 and --max-running-requests 32 (and mem-frac 0.88). Please update the comment to match the new tuned values so the snippet remains self-consistent.
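
One way to keep summary comments from drifting away from the emitted flags is to derive both from a single record. The sketch below assumes hypothetical names (`H200_BIG_LOW_LATENCY`, `toFlags`, `toSummary`); the snippet in this PR keeps its summary as a hand-written comment, so this illustrates the idea rather than describing the actual file.

```javascript
// Hypothetical single source of truth for the H200 Pro (big) low-latency values.
const H200_BIG_LOW_LATENCY = {
  cudaGraphMaxBs: 8,
  maxRunningRequests: 32,
  memFractionStatic: 0.88,
};

// Build the launch flags in the two-space-indented form used by the diffs.
function toFlags(t) {
  return [
    `  --cuda-graph-max-bs ${t.cudaGraphMaxBs}`,
    `  --max-running-requests ${t.maxRunningRequests}`,
    `  --mem-fraction-static ${t.memFractionStatic}`,
  ];
}

// Build a short summary in the "cg=… max-run=…" style quoted in the review.
function toSummary(t) {
  return `cg=${t.cudaGraphMaxBs} max-run=${t.maxRunningRequests} mem-frac=${t.memFractionStatic}`;
}

console.log(toSummary(H200_BIG_LOW_LATENCY));
console.log(toFlags(H200_BIG_LOW_LATENCY).join("\n"));
```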

Copilot AI review requested due to automatic review settings April 26, 2026 03:26

yushengsu-thu requested a review from wisclmy0611 as a code owner April 26, 2026 03:26

github-actions Bot added the deepseek label Apr 26, 2026

Copilot started reviewing on behalf of yushengsu-thu April 26, 2026 03:26

Copilot AI reviewed Apr 26, 2026

yhyang201 mentioned this pull request Apr 26, 2026

docs(DeepSeek-V4): verify H200 Pro max-throughput recipe #23726

Closed

3 tasks

merge

38a8c73

yushengsu-thu force-pushed the cookbook branch from a3a08f4 to 38a8c73 Compare April 26, 2026 04:39

wisclmy0611 approved these changes Apr 26, 2026

wisclmy0611 merged commit 3cfd156 into sgl-project:main Apr 26, 2026
42 checks passed

vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026

docs(DeepSeek-V4): add h200|big verified recipes + tune H200 Pro para…

1bf61a5

…meters (sgl-project#23742)

wisclmy0611 merged 1 commit into sgl-project:main from yushengsu-thu:cookbook
