(radixark sgl maintainer submission): Add DSV4 FP4 GB300 dynamo-sglang MTP disagg benchmarks by ch-wan · Pull Request #1297 · SemiAnalysisAI/InferenceX

ch-wan · 2026-05-07T18:02:03Z

Summary

Adds 6 disagg MTP recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/mtp/ (low-latency 1p1d-tp4 / 1p6d-dep4-tp4 + mid-curve dep4-dep8/dep16 with 1p, 2p, 4p prefill)
Wires them into dsv4-fp4-gb300-dynamo-sglang-mtp in .github/configs/nvidia-master.yaml, each entry carrying spec-decoding: "mtp" and the corresponding topology
Recipes adapted from elvischenv/srt-slurm@dsv4-gb300-disagg-8k1k-mtp, repointed at the public lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev container and the deepseek-v4-pro model alias
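The topology names above use a compact convention. As a hedged illustration, suppose "NpMd" means N prefill and M decode workers. Suppose also that "tpK"/"depK" gives each worker's parallel degree in GPUs, prefill first and decode second, as in the recipe file name disagg-low-latency-1p1d-tp4-tp4.yaml. Under those assumptions, a hypothetical decoder could look like the sketch below. It expects the full form, so the summary's shorthand "1p1d-tp4" would be written "1p1d-tp4-tp4". `parse_topology` and `Topology` are illustrative names, not part of InferenceX or srt-slurm.

```python
import re
from dataclasses import dataclass

# Hypothetical decoder for names like "1p6d-dep4-tp4".
# Assumed convention: "<P>p<D>d-<prefill parallelism>-<decode parallelism>",
# where "tpN" / "depN" is taken to mean the worker spans N GPUs.
TOPOLOGY_RE = re.compile(r"^(\d+)p(\d+)d-(tp|dep)(\d+)-(tp|dep)(\d+)$")


@dataclass
class Topology:
    prefill_workers: int
    decode_workers: int
    prefill_mode: str   # "tp" or "dep"
    prefill_size: int   # GPUs per prefill worker (assumed)
    decode_mode: str    # "tp" or "dep"
    decode_size: int    # GPUs per decode worker (assumed)

    def total_gpus(self) -> int:
        # Assumes each worker occupies exactly its parallel degree in GPUs.
        return (self.prefill_workers * self.prefill_size
                + self.decode_workers * self.decode_size)


def parse_topology(name: str) -> Topology:
    m = TOPOLOGY_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized topology name: {name!r}")
    p, d, pmode, psize, dmode, dsize = m.groups()
    return Topology(int(p), int(d), pmode, int(psize), dmode, int(dsize))


if __name__ == "__main__":
    for name in ("1p1d-tp4-tp4", "1p6d-dep4-tp4"):
        print(name, parse_topology(name).total_gpus())  # 8, then 28
```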

Test plan

/sweep on this PR — verify the matrix dispatches the 6 new MTP entries
Confirm the dsv4-fp4-gb300-dynamo-sglang-mtp rows appear in the sweep matrix listing
Eval-only entry (max-conc) produces lm-eval scores
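A minimal sketch of the first two test-plan checks follows. It assumes the dsv4-fp4-gb300-dynamo-sglang-mtp entry has already been loaded from .github/configs/nvidia-master.yaml into a list of row dicts. The row layout and helper names are hypothetical. Only the spec-decoding: "mtp" key and the expected count of 6 come from this PR.

```python
from typing import Any, Iterable, Mapping

EXPECTED_MTP_ROWS = 6  # number of new MTP recipes added by this PR


def count_mtp_rows(rows: Iterable[Mapping[str, Any]]) -> int:
    """Count matrix rows tagged spec-decoding: "mtp"."""
    return sum(1 for row in rows if row.get("spec-decoding") == "mtp")


def check_mtp_rows(rows: Iterable[Mapping[str, Any]],
                   expected: int = EXPECTED_MTP_ROWS) -> None:
    """Raise if the number of MTP rows differs from the expected count."""
    found = count_mtp_rows(rows)
    if found != expected:
        raise AssertionError(f"expected {expected} MTP rows, found {found}")


if __name__ == "__main__":
    # Hypothetical rows; the real layout of nvidia-master.yaml may differ.
    rows = [{"spec-decoding": "mtp", "topology": f"hypothetical-{i}"} for i in range(6)]
    check_mtp_rows(rows)
    print("ok")
```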

🤖 Generated with Claude Code

github-actions · 2026-05-07T18:02:14Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR there first; we can merge your PR into the master branch after that. Let's make sure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review from, and get approval by, the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-07T18:04:56Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25513269866
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25513269866

github-actions · 2026-05-07T18:07:12Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25513378863
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25513378863

claude · 2026-05-07T18:13:30Z

+model:
+  path: "deepseek-v4-pro"
+  container: "lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev"
+  precision: "mxfp4"


🟡 All 6 new MTP recipes set model.precision: "mxfp4", but every existing sibling dsv4 SGLang recipe in benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/ uses precision: "fp4" — even though they share the same moe-runner-backend: flashinfer_mxfp4 — and the matrix entry dsv4-fp4-gb300-dynamo-sglang-mtp itself has precision: fp4. Nit: align all 6 MTP recipes to precision: "fp4" to match the established convention; this is metadata-only (InferenceX aggregation keys off the matrix-level precision, not the recipe yaml), so runtime impact is minimal.

Extended reasoning...

What the inconsistency is

Each of the 6 new files at benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/mtp/*.yaml has:

model:
  path: "deepseek-v4-pro"
  container: "lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev"
  precision: "mxfp4"

Whereas all 6 pre-existing sibling recipes at benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-*.yaml use precision: "fp4" (line 37 of each), despite carrying the same moe-runner-backend: "flashinfer_mxfp4" setting in their sglang_config. The matrix entry added in .github/configs/nvidia-master.yaml for these MTP recipes also uses precision: fp4, and AGENTS.md lists only fp4 and fp8 as recognized precisions in the project.

Step-by-step proof of the divergence

Open benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/mtp/disagg-low-latency-1p1d-tp4-tp4.yaml line 15: precision: "mxfp4".

Open benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-1p1d-dep4-dep8-3-c256.yaml (or any of the 6 sibling recipes added in #1295, "Update DeepSeek V4 Pro FP4 GB300 disaggregated SGLang benchmarks") around line 37: precision: "fp4".

Both files set moe-runner-backend: "flashinfer_mxfp4" in their sglang_config.decode blocks.

Open .github/configs/nvidia-master.yaml at the new dsv4-fp4-gb300-dynamo-sglang-mtp: block: precision: fp4.

So within the same PR, the matrix says fp4 and the recipe yamls say mxfp4, while the equivalent non-MTP sibling recipes that share the same MoE backend say fp4 at the recipe level too. That is a copy-paste inconsistency with the established convention.
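This kind of divergence can be caught mechanically. The sketch below is a hypothetical stdlib-only audit, not an existing InferenceX utility. It scans recipe YAML text for precision: values outside the fp4/fp8 set that the review attributes to AGENTS.md. It uses a line-anchored regex rather than a YAML parser to stay dependency-free, so it assumes each precision: key sits on its own line.

```python
import re
from pathlib import Path
from typing import Dict, List, Set, Tuple

# Recognized precisions, per the review's reading of AGENTS.md.
ALLOWED_PRECISIONS: Set[str] = {"fp4", "fp8"}

# Matches a standalone `precision: <value>` line, quoted or unquoted.
PRECISION_RE = re.compile(
    r'^[ \t]*precision:[ \t]*"?([A-Za-z0-9_]+)"?[ \t]*$', re.MULTILINE
)


def load_recipes(root: str) -> Dict[str, str]:
    """Read every *.yaml under root into {path: text}."""
    return {str(p): p.read_text() for p in Path(root).rglob("*.yaml")}


def audit_precisions(recipes: Dict[str, str],
                     allowed: Set[str] = ALLOWED_PRECISIONS) -> List[Tuple[str, str]]:
    """Return (path, value) pairs whose precision is not in `allowed`."""
    bad = []
    for path, text in sorted(recipes.items()):
        for value in PRECISION_RE.findall(text):
            if value not in allowed:
                bad.append((path, value))
    return bad
```

If the review is accurate, running audit_precisions(load_recipes("benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k")) should flag only the 6 files under mtp/.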

Addressing the refutation: what the runtime impact actually is

The refutation correctly notes that InferenceX's own aggregation pipelines (utils/summarize.py, utils/collect_eval_results.py, utils/matrix_logic/generate_sweep_configs.py, launch_gb300-cw.sh) key off the matrix-level precision field from nvidia-master.yaml, not the recipe yaml's model.precision. Since the matrix entry is correctly fp4, in-repo aggregation/labeling is unaffected — the original framing of "confusing labels in eval/result aggregation pipelines" overstates the impact. The recipe-level field is consumed externally by srt-slurm/srtctl, and the upstream source (elvischenv/srt-slurm@dsv4-gb300-disagg-8k1k-mtp) presumably accepts mxfp4. So this is not a runtime breakage.

Why it's still worth fixing

It is purely a cross-recipe metadata uniformity nit: every sibling dsv4 SGLang recipe in the same directory tree, even ones using the identical flashinfer_mxfp4 MoE backend, declares precision: "fp4" at the recipe level. The mxfp4 label here will trip up future grep-based audits and contradicts the project-wide enum in AGENTS.md. The fix is to replace precision: "mxfp4" with precision: "fp4" on line 15 of all 6 new MTP recipes — no other change required.
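The suggested fix is a one-line substitution per file. Here is a minimal sketch, assuming the recipe-level field is written as a standalone precision: line, as quoted in this review. normalize_precision and fix_recipes are hypothetical helpers. The line-anchored regex deliberately leaves moe-runner-backend: "flashinfer_mxfp4" untouched.

```python
import re
from pathlib import Path
from typing import List

# Anchored to a standalone `precision:` line so that values such as
# moe-runner-backend: "flashinfer_mxfp4" are never rewritten.
MXFP4_LINE = re.compile(r'^([ \t]*precision:[ \t]*)"?mxfp4"?([ \t]*)$', re.MULTILINE)


def normalize_precision(text: str) -> str:
    """Rewrite a recipe-level precision "mxfp4" to "fp4"."""
    return MXFP4_LINE.sub(r'\1"fp4"\2', text)


def fix_recipes(recipe_dir: str) -> List[str]:
    """Apply normalize_precision to each *.yaml in recipe_dir; return changed file names."""
    changed = []
    for path in sorted(Path(recipe_dir).glob("*.yaml")):
        old = path.read_text()
        new = normalize_precision(old)
        if new != old:
            path.write_text(new)
            changed.append(path.name)
    return changed
```

For example, fix_recipes("benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/mtp") would rewrite the MTP recipes in place and return the names of the files it changed.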

github-actions · 2026-05-07T21:59:09Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25513378863
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25513378863

github-actions · 2026-05-08T05:44:40Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25513378863
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25513378863

github-actions · 2026-05-08T06:05:52Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25539877826
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25539877826

github-actions · 2026-05-08T06:06:44Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25539890483
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25539890483

github-actions · 2026-05-08T06:21:38Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25539917178
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25539917178

github-actions · 2026-05-08T06:26:27Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25540254945
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25540254945

ch-wan · 2026-05-08T06:29:57Z

/sweep

github-actions · 2026-05-08T06:30:13Z

@ch-wan Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25540780423
Command: ``
Pinned ref: dba5e0d
Approval: not required (trusted collaborator).

github-actions · 2026-05-08T06:37:52Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25540254945
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25540254945

ch-wan · 2026-05-08T06:54:36Z

/sweep

github-actions · 2026-05-08T06:54:51Z

@ch-wan Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25541720592
Command: ``
Pinned ref: 5e30f2c
Approval: not required (trusted collaborator).

github-actions · 2026-05-08T06:56:24Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25541004960
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25541004960

ch-wan · 2026-05-08T07:02:33Z

/sweep

github-actions · 2026-05-08T07:02:49Z

@ch-wan Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25542023314
Command: ``
Pinned ref: ff0df99
Approval: not required (trusted collaborator).

github-actions · 2026-05-08T07:04:23Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25541718378
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25541718378

github-actions · 2026-05-08T07:10:38Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25542020807
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25542020807

github-actions · 2026-05-08T07:16:14Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25542257974
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25542257974

github-actions · 2026-05-09T04:57:56Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25588895365
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25588895365

github-actions · 2026-05-09T09:59:39Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25597755290
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25597755290

github-actions · 2026-05-09T20:39:39Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25611270471
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25611270471

The mooncake backend has a KV-transfer bug that produces wrong gsm8k answers when prompts end on the `<think>` token (id 128821). Empirically, the same input gives the correct answer on monolithic sglang, a wrong answer on mooncake-disagg, and the correct answer on nixl-disagg. Bug filed upstream; using nixl as a workaround. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
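The A/B comparison described in this commit can be sketched as a small harness. The sketch assumes three things: the prompt is available as token ids, each backend's completion is captured as a string, and answers follow the gsm8k "#### <number>" convention (as in the later "#### 18" result). The function names and backend labels are illustrative only.

```python
import re
from typing import Dict, List, Optional, Sequence

THINK_TOKEN_ID = 128821  # `<think>` token id reported in the commit message

# gsm8k-style final answer marker, e.g. "#### 18".
ANSWER_RE = re.compile(r"####\s*(-?[\d,]+(?:\.\d+)?)")


def ends_on_think(token_ids: Sequence[int]) -> bool:
    """True if the prompt's last token is `<think>`, the trigger described above."""
    return len(token_ids) > 0 and token_ids[-1] == THINK_TOKEN_ID


def extract_answer(completion: str) -> Optional[str]:
    """Return the last '#### N' answer with thousands separators removed, or None."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].replace(",", "") if matches else None


def divergent_backends(completions: Dict[str, str], reference: str = "mono") -> List[str]:
    """List backends whose final answer differs from the reference backend's."""
    ref = extract_answer(completions[reference])
    return sorted(
        backend for backend, text in completions.items()
        if backend != reference and extract_answer(text) != ref
    )
```

For instance, if the mono and nixl-disagg completions both end in "#### 18" while the mooncake-disagg completion ends in a different number, divergent_backends returns ["mooncake-disagg"].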

github-actions · 2026-05-09T20:53:17Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25611543546
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25611543546

…tp-0507

github-actions · 2026-05-10T06:21:47Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25611621691
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25611621691

This reverts commit 3275282.

Picks up sgl-project/sglang#24878 (merged as c7f674e4), which adds the missing dsv4 state_type branch to MooncakeKVManager.maybe_send_extra. Combined with the prior revert of #1297's nixl switch (commit daa6785), the mooncake backend now correctly transfers DSv4's flat heterogeneous state pool for both non-MTP and MTP runs. Validated on GB300 1P+1D: comp_with_think.json (the prompt ending on the literal `<think>` token that previously surfaced the corruption) now returns the correct gsm8k Janet answer (`#### 18`) on mooncake disagg, matching mono and the NIXL control. MTP sa-bench delivers ~136 tok/s output throughput (~1.7x non-MTP), confirming draft acceptance is working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
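As a quick sanity check on the quoted figures (both approximate), the ~1.7x speedup implies a non-MTP baseline of roughly:

```latex
\text{throughput}_{\text{non-MTP}} \approx \frac{136\ \text{tok/s}}{1.7} = 80\ \text{tok/s}
```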

github-actions · 2026-05-10T09:22:43Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25625092143
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25625092143

…tp-0507

NVIDIA/srt-slurm#144 (``sa-bench: make SGLangDeepseekV4Tokenizer callable``) merged as 0cbc7eb4. Drop the ch-wan/srt-slurm fork pin that was only there while #144 was in review and pin to the upstream merge commit instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-10T09:27:06Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25625112367
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25625112367

Now that #144 is merged, no longer need to pin a specific commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-10T09:27:51Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25625141831
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25625141831

github-actions · 2026-05-10T09:30:52Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25625188140
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25625188140

github-actions · 2026-05-10T11:02:47Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25625188140
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25625188140

Picks up sgl-project/sglang main commit 2473659e (built via upstream workflow run 25639473178). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-10T22:07:42Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25641065811
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25641065811

…tp-0507

github-actions · 2026-05-10T23:17:50Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25641424267
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25641424267

github-actions · 2026-05-10T23:41:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25641424267
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25641424267

github-actions · 2026-05-11T01:04:37Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25642986639
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25642986639

functionstackx

lgtm

ch-wan requested a review from a team May 7, 2026 18:02

github-project-automation Bot added this to InferenceMAX Board May 7, 2026

ch-wan requested review from jgangani and kedarpotdar-nv as code owners May 7, 2026 18:02

ch-wan added the full-sweep-enabled label May 7, 2026

ch-wan changed the title from "Add DeepSeek V4 Pro FP4 GB300 disaggregated SGLang MTP benchmarks" to "Add DSV4 FP4 GB300 dynamo-sglang MTP disagg benchmarks" May 7, 2026

claude Bot reviewed May 7, 2026


add mtp configs

ce53cf1

ch-wan force-pushed the sglang-disagg-gb300-mtp-0507 branch from ea35b7b to ce53cf1 Compare May 8, 2026 06:06

ch-wan closed this May 8, 2026

ch-wan force-pushed the sglang-disagg-gb300-mtp-0507 branch from b8dfc19 to 8ecde43 Compare May 9, 2026 20:39

ch-wan force-pushed the sglang-disagg-gb300-mtp-0507 branch from 8ecde43 to 3275282 Compare May 9, 2026 20:52

Merge remote-tracking branch 'origin/main' into sglang-disagg-gb300-mtp-0507

1ffcab9

ch-wan and others added 2 commits May 10, 2026 01:51

Revert "Switch DSV4 MTP recipes to nixl KV transfer backend"

daa6785

This reverts commit 3275282.

ch-wan and others added 2 commits May 10, 2026 02:23

Merge remote-tracking branch 'origin/main' into sglang-disagg-gb300-mtp-0507

eae8d32

gb300-cw: track NVIDIA/srt-slurm main instead of pinning a commit

cb45485

Now that #144 is merged, no longer need to pin a specific commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bump MTP recipes to sglang nightly 20260510-2473659e

79d2cb6

Picks up sgl-project/sglang main commit 2473659e (built via upstream workflow run 25639473178). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into sglang-disagg-gb300-mtp-0507

32e623d

fix: use shared gb300 dsv4 model path

35a2f9a

functionstackx changed the title from "Add DSV4 FP4 GB300 dynamo-sglang MTP disagg benchmarks" to "(radixark sgl maintainer submission): Add DSV4 FP4 GB300 dynamo-sglang MTP disagg benchmarks" May 11, 2026

functionstackx approved these changes May 11, 2026


functionstackx merged commit 4007906 into main May 11, 2026
26 checks passed

functionstackx deleted the sglang-disagg-gb300-mtp-0507 branch May 11, 2026 03:30
