
docs(DeepSeek-V4): verify H200 Pro max-throughput recipe #23726

Closed
yhyang201 wants to merge 2 commits into sgl-project:main from yhyang201:dpskv4-h200-big-mt


@yhyang201
Collaborator

Summary

  • Update H200 Pro (1.6T) max-throughput recipe parameters to match the verified 2-node (16-GPU) deployment
  • DISPATCH_TOKENS: 256 → 128
  • --max-running-requests: 256 → 64
  • --mem-fraction-static: 0.82 → 0.875
  • Remove --cuda-graph-max-bs 128 (not needed for H200 big)
  • Mark h200|big|max-throughput as verified
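The parameter changes listed above can be restated as a before/after diff. This is a minimal sketch, assuming a plain key/value view of the recipe; `before`, `after`, and `diffParams` are hypothetical names, and `DISPATCH_TOKENS` is the summary's shorthand rather than a confirmed environment variable name.

```javascript
// Hypothetical sketch: h200|big|max-throughput parameters before and after
// this PR, taken from the summary. "DISPATCH_TOKENS" is the summary's
// shorthand; the real environment variable name is not shown in the PR.
const before = {
  DISPATCH_TOKENS: 256,
  "--max-running-requests": 256,
  "--mem-fraction-static": 0.82,
  "--cuda-graph-max-bs": 128,
};

const after = {
  DISPATCH_TOKENS: 128,
  "--max-running-requests": 64,
  "--mem-fraction-static": 0.875,
  // --cuda-graph-max-bs is dropped for H200 big
};

// Return one "key: old → new" line per key whose value changed,
// was added, or was removed.
function diffParams(oldParams, newParams) {
  const keys = new Set([...Object.keys(oldParams), ...Object.keys(newParams)]);
  const lines = [];
  for (const key of keys) {
    const oldVal = key in oldParams ? String(oldParams[key]) : "(unset)";
    const newVal = key in newParams ? String(newParams[key]) : "(removed)";
    if (oldVal !== newVal) lines.push(`${key}: ${oldVal} → ${newVal}`);
  }
  return lines;
}

for (const line of diffParams(before, after)) console.log(line);
```

Running it prints the four changes from the summary, one per line, ending with `--cuda-graph-max-bs: 128 → (removed)`.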

Test plan

  • Verified on a 2-node H200 cluster (Ion-5 + Ion-6, 16 GPUs total)
  • Existing H200 small recipes unaffected (separate code path)
  • Other platform recipes unaffected

🤖 Generated with Claude Code

Update H200 big (Pro 1.6T) max-throughput parameters to match the verified
2-node deployment:

- DISPATCH_TOKENS: 256 → 128
- --max-running-requests: 256 → 64
- --mem-fraction-static: 0.82 → 0.875
- Remove --cuda-graph-max-bs 128 (not needed)

Mark h200|big|max-throughput as verified.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

gemini-code-assist (bot) left a comment


Code Review

This pull request adds the 'h200|big|max-throughput' configuration to the DeepSeek-V4 deployment documentation, updating environment variables and CLI flags for H200 hardware. Specifically, it adjusts memory fraction and request limits for the 'big' variant. A review comment suggests consolidating the conditional logic for H200-specific overrides to enhance code readability and maintainability.

Comment on lines +365 to 374
      if (isBig && hardware === "h200") {
        flags.push(" --mem-fraction-static 0.875");
      } else if (isBig) {
        flags.push(" --mem-fraction-static 0.82");
      }
      if (hardware === "h200" && isBig) {
        flags.push(" --max-running-requests 64");
      } else if (hardware === "h200") {
        flags.push(" --cuda-graph-max-bs 128");
        flags.push(" --max-running-requests 256");
Contributor


medium

The logic for adding flags in the max-throughput recipe is slightly fragmented across multiple if blocks. While functional, consolidating the hardware === "h200" && isBig check would improve readability and maintainability, especially as more hardware-specific overrides are added.

      if (hardware === "h200" && isBig) {
        flags.push("  --mem-fraction-static 0.875");
        flags.push("  --max-running-requests 64");
      } else {
        if (isBig) flags.push("  --mem-fraction-static 0.82");
        if (hardware === "h200") {
          flags.push("  --cuda-graph-max-bs 128");
          flags.push("  --max-running-requests 256");
        } else if (isBig && hardware === "b200") {
          flags.push("  --cuda-graph-max-bs 64");
          flags.push("  --max-running-requests 256");
        } else if (isBig && hardware === "gb300") {
          flags.push("  --cuda-graph-max-bs 128");
          flags.push("  --max-running-requests 256");
        }
      }
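One way to check the reviewer's "while functional" claim is to wrap both versions in functions and compare their output on every hardware/size combination. This is a sketch under an assumption: the excerpt above stops at line 374, so the `b200` and `gb300` branches of `originalFlags` are inferred from the suggestion rather than read from the recipe source. Both function names are hypothetical.

```javascript
// Hypothetical harness: neither function exists in the recipe source.
// The b200/gb300 branches of originalFlags are inferred from the reviewer's
// suggestion, because the diff excerpt is cut off at line 374.
function originalFlags(hardware, isBig) {
  const flags = [];
  if (isBig && hardware === "h200") {
    flags.push("  --mem-fraction-static 0.875");
  } else if (isBig) {
    flags.push("  --mem-fraction-static 0.82");
  }
  if (hardware === "h200" && isBig) {
    flags.push("  --max-running-requests 64");
  } else if (hardware === "h200") {
    flags.push("  --cuda-graph-max-bs 128");
    flags.push("  --max-running-requests 256");
  } else if (isBig && hardware === "b200") {
    flags.push("  --cuda-graph-max-bs 64");
    flags.push("  --max-running-requests 256");
  } else if (isBig && hardware === "gb300") {
    flags.push("  --cuda-graph-max-bs 128");
    flags.push("  --max-running-requests 256");
  }
  return flags;
}

// The reviewer's consolidated version, wrapped in a function.
function suggestedFlags(hardware, isBig) {
  const flags = [];
  if (hardware === "h200" && isBig) {
    flags.push("  --mem-fraction-static 0.875");
    flags.push("  --max-running-requests 64");
  } else {
    if (isBig) flags.push("  --mem-fraction-static 0.82");
    if (hardware === "h200") {
      flags.push("  --cuda-graph-max-bs 128");
      flags.push("  --max-running-requests 256");
    } else if (isBig && hardware === "b200") {
      flags.push("  --cuda-graph-max-bs 64");
      flags.push("  --max-running-requests 256");
    } else if (isBig && hardware === "gb300") {
      flags.push("  --cuda-graph-max-bs 128");
      flags.push("  --max-running-requests 256");
    }
  }
  return flags;
}

// Compare both versions, flag order included, on every combination.
for (const hardware of ["h200", "b200", "gb300", "other"]) {
  for (const isBig of [true, false]) {
    const a = JSON.stringify(originalFlags(hardware, isBig));
    const b = JSON.stringify(suggestedFlags(hardware, isBig));
    console.log(hardware, isBig ? "big" : "small", a === b ? "same" : "DIFFERENT");
  }
}
```

Under that assumption all eight combinations print `same`, so the refactor is behavior-preserving and the benefit is purely readability: each H200 big override lives in one branch instead of two.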

Add commented-out hints for machine-specific env vars (NVSHMEM, GLOO,
NCCL) on H200 big (2-node) deployments, matching the GB200 pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
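The second commit's idea, emitting machine-specific environment variables only as commented-out hints, can be sketched as follows. This is an illustration, not the recipe's code: `envHintLines` and `MACHINE_SPECIFIC_HINTS` are hypothetical names, and the PR does not list the exact variables, so `NCCL_SOCKET_IFNAME`, `GLOO_SOCKET_IFNAME`, and `NVSHMEM_HCA_LIST` are assumptions chosen to represent the NCCL, GLOO, and NVSHMEM families it mentions.

```javascript
// Hypothetical sketch: commented-out, machine-specific env var hints for
// H200 big (2-node) deployments. Variable names are assumptions; the
// <...> values are placeholders to fill in per cluster.
const MACHINE_SPECIFIC_HINTS = [
  "# export NCCL_SOCKET_IFNAME=<network-interface>",
  "# export GLOO_SOCKET_IFNAME=<network-interface>",
  "# export NVSHMEM_HCA_LIST=<ib-devices>",
];

function envHintLines(hardware, isBig) {
  // Hints stay commented out so a copy-pasted recipe never sets a value
  // that is wrong for the reader's machines.
  if (hardware === "h200" && isBig) {
    return ["# Machine-specific; uncomment and adjust for your cluster:", ...MACHINE_SPECIFIC_HINTS];
  }
  return [];
}

console.log(envHintLines("h200", true).join("\n"));
```

Keeping the hints as comments documents what a multi-node run usually needs without baking one cluster's interface names into the published command.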
@yhyang201
Collaborator Author

yhyang201 commented Apr 26, 2026

Moved to #23742.

yhyang201 closed this Apr 26, 2026