Skip to content

[NVIDIA] chore: B200 single node DeepSeek v4 SGLang MTP#1145

Closed
cquil11 wants to merge 18 commits into
mainfrom
chore/dsv4-sgl-mtp-b200
Closed

[NVIDIA] chore: B200 single node DeepSeek v4 SGLang MTP#1145
cquil11 wants to merge 18 commits into
mainfrom
chore/dsv4-sgl-mtp-b200

Add dsv4_fp4_b200_mtp.sh for spec-decoding benchmarks

1dd4db6
Select commit
Loading
Failed to load commit list.
Claude / Claude Code Review completed Apr 24, 2026 in 10m 42s

Code review found 3 important issues

Found 6 candidates, confirmed 5. See review comments for details.

Details

Severity Count
🔴 Important 3
🟡 Nit 2
🟣 Pre-existing 0
Severity File:Line Issue
🔴 Important perf-changelog.yaml:1-10 perf-changelog description contradicts the shipped yaml and scripts
🔴 Important benchmarks/single_node/dsv4_fp4_b200_mtp.sh:53-66 EVAL_CONTEXT_ARGS computed but never passed to sglang serve
🔴 Important benchmarks/single_node/dsv4_fp4_b200.sh:58-61 Non-MTP dsv4_fp4_b200.sh enables EAGLE spec decoding
🟡 Nit benchmarks/single_node/dsv4_fp4_b200_mtp.sh:56 Hard-coded --tp 8 ignores $TP env var
🟡 Nit benchmarks/single_node/dsv4_fp4_b200_mtp.sh:58 --speculative-algo flag breaks convention (should be --speculative-algorithm)

Annotations

Check failure on line 10 in perf-changelog.yaml

See this annotation in the file changed.

@claude claude / Claude Code Review

perf-changelog description contradicts the shipped yaml and scripts

The new perf-changelog.yaml entry for dsv4-fp4-b200-sglang misrepresents what is actually shipped in this PR: it claims "TP8, EP8, dp-attention" and "Prefix caching and speculative decoding disabled", but nvidia-master.yaml has ep:1 (not 8), neither dsv4_fp4_b200.sh nor dsv4_fp4_b200_mtp.sh passes --enable-dp-attention/--dp-size, both scripts enable EAGLE speculative decoding (--speculative-num-steps 3, --speculative-eagle-topk 1, --speculative-num-draft-tokens 4), and neither script disables pr

Check failure on line 66 in benchmarks/single_node/dsv4_fp4_b200_mtp.sh

See this annotation in the file changed.

@claude claude / Claude Code Review

EVAL_CONTEXT_ARGS computed but never passed to sglang serve

EVAL_CONTEXT_ARGS is populated to '--context-length $EVAL_MAX_MODEL_LEN' inside the EVAL_ONLY branch (lines 40-44) but never expanded into the `sglang serve` invocation (lines 49-66), so the server boots with the default model context. When this benchmark runs in EVAL_ONLY mode (e.g. via the multi-node lm-eval flow added in #1000/#1094/#1120), long-context evals will be silently truncated or fail. Append `$EVAL_CONTEXT_ARGS` to the sglang serve command (right before `> $SERVER_LOG`); the same fi

Check failure on line 61 in benchmarks/single_node/dsv4_fp4_b200.sh

See this annotation in the file changed.

@claude claude / Claude Code Review

Non-MTP dsv4_fp4_b200.sh enables EAGLE spec decoding

The non-MTP `benchmarks/single_node/dsv4_fp4_b200.sh` enables EAGLE speculative decoding (`--speculative-algo EAGLE`, `--speculative-num-steps 3`, `--speculative-eagle-topk 1`, `--speculative-num-draft-tokens 4`), contradicting both its name and the convention used by every other non-MTP variant in this directory. Drop the four `--speculative-*` flags from the non-MTP script so the runner picks up genuinely-disabled spec decoding when a non-MTP YAML entry is added later (matches dsr1_fp4_b200.sh

Check warning on line 56 in benchmarks/single_node/dsv4_fp4_b200_mtp.sh

See this annotation in the file changed.

@claude claude / Claude Code Review

Hard-coded --tp 8 ignores $TP env var

Hard-coded `--tp 8` on line 56 ignores the `$TP` env var that `check_env_vars` enforces and that the script echoes for logging on line 38; same issue at `dsv4_fp4_b200.sh:56`. Latent today (yaml only has tp:8 entries) but inconsistent with every sibling `*_b200*.sh` script in this directory which all parameterize via `--tensor-parallel-size=$TP` or `--tp "$TP"` — worth fixing now while the script is being added. Trivial fix: change `--tp 8` to `--tp "$TP"`.

Check warning on line 58 in benchmarks/single_node/dsv4_fp4_b200_mtp.sh

See this annotation in the file changed.

@claude claude / Claude Code Review

--speculative-algo flag breaks convention (should be --speculative-algorithm)

Both new dsv4 scripts pass `--speculative-algo EAGLE` (dsv4_fp4_b200.sh:58 and dsv4_fp4_b200_mtp.sh:58), while all 17 sibling MTP scripts in benchmarks/single_node/ (dsr1_*_mtp.sh, glm5_*_mtp.sh, qwen3.5_*_mtp.sh) use the canonical `--speculative-algorithm`. The abbreviated form likely works today via argparse prefix-matching, but it's a clear convention break and is fragile (would silently break if sglang ever adds another `--speculative-algo*` flag, making the prefix ambiguous, or switches `sg