(radixark sgl maintainer submission): Add DSV4 FP4 GB300 dynamo-sglang MTP disagg benchmarks #1297

Merged
functionstackx merged 30 commits into main from sglang-disagg-gb300-mtp-0507
May 11, 2026

Conversation

@ch-wan
Collaborator

@ch-wan ch-wan commented May 7, 2026

Summary

  • Adds 6 disagg MTP recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/mtp/ (low-latency 1p1d-tp4 / 1p6d-dep4-tp4 + mid-curve dep4-dep8/dep16 with 1p, 2p, 4p prefill)
  • Wires them into dsv4-fp4-gb300-dynamo-sglang-mtp in .github/configs/nvidia-master.yaml, each entry carrying spec-decoding: "mtp" and the corresponding topology
  • Recipes adapted from elvischenv/srt-slurm@dsv4-gb300-disagg-8k1k-mtp, repointed at the public lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev container and the deepseek-v4-pro model alias
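
The topology labels in the recipe names above (for example 1p1d-tp4 or 1p6d-dep4-tp4) can be decoded mechanically. The following is a minimal sketch, assuming the label reads as "prefill workers, decode workers, prefill parallelism, decode parallelism"; that reading, the `Topology` class, and `parse_topology` are illustrative assumptions, not part of srt-slurm or InferenceX.

```python
import re
from dataclasses import dataclass

# Assumed naming scheme: "<P>p<D>d-<prefill mode><N>-<decode mode><M>",
# e.g. "1p6d-dep4-tp4". This interpretation is a guess from the PR summary.
_TOPOLOGY = re.compile(r"(\d+)p(\d+)d-(tp|dep)(\d+)-(tp|dep)(\d+)")


@dataclass(frozen=True)
class Topology:
    prefill_workers: int
    decode_workers: int
    prefill_mode: str   # "tp" or "dep"
    prefill_size: int
    decode_mode: str    # "tp" or "dep"
    decode_size: int


def parse_topology(name: str) -> Topology:
    """Find the first topology label inside a recipe name or file name."""
    match = _TOPOLOGY.search(name)
    if match is None:
        raise ValueError(f"no topology label found in {name!r}")
    p, d, p_mode, p_size, d_mode, d_size = match.groups()
    return Topology(int(p), int(d), p_mode, int(p_size), d_mode, int(d_size))


if __name__ == "__main__":
    print(parse_topology("disagg-low-latency-1p1d-tp4-tp4.yaml"))
```

Because the pattern is searched rather than anchored, trailing suffixes such as the `-3-c256` seen in the non-MTP sibling file names are ignored.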

Test plan

  • /sweep on this PR — verify the matrix dispatches the 6 new MTP entries
  • Confirm the dsv4-fp4-gb300-dynamo-sglang-mtp rows appear in the sweep matrix listing
  • Eval-only entry (max-conc) produces lm-eval scores

🤖 Generated with Claude Code

@ch-wan ch-wan requested a review from a team May 7, 2026 18:02
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR against those recipes first; only then can we merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@ch-wan ch-wan changed the title from "Add DeepSeek V4 Pro FP4 GB300 disaggregated SGLang MTP benchmarks" to "Add DSV4 FP4 GB300 dynamo-sglang MTP disagg benchmarks" May 7, 2026
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Comment thread perf-changelog.yaml Outdated
model:
  path: "deepseek-v4-pro"
  container: "lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev"
  precision: "mxfp4"

🟡 All 6 new MTP recipes set model.precision: "mxfp4", but every existing sibling dsv4 SGLang recipe in benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/ uses precision: "fp4", even though those recipes share the same moe-runner-backend: flashinfer_mxfp4, and the matrix entry dsv4-fp4-gb300-dynamo-sglang-mtp itself has precision: fp4. Nit: align all 6 MTP recipes to precision: "fp4" to match the established convention. This is metadata-only (InferenceX aggregation keys off the matrix-level precision, not the recipe yaml), so runtime impact is minimal.

Extended reasoning...

What the inconsistency is

Each of the 6 new files at benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/mtp/*.yaml has:

model:
  path: "deepseek-v4-pro"
  container: "lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev"
  precision: "mxfp4"

Whereas all 6 pre-existing sibling recipes at benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-*.yaml use precision: "fp4" (line 37 of each), despite carrying the same moe-runner-backend: "flashinfer_mxfp4" setting in their sglang_config. The matrix entry added in .github/configs/nvidia-master.yaml for these MTP recipes also uses precision: fp4, and AGENTS.md lists only fp4 and fp8 as recognized precisions in the project.

Step-by-step proof of the divergence

  1. Open benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/mtp/disagg-low-latency-1p1d-tp4-tp4.yaml line 15: precision: "mxfp4".
  2. Open benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-1p1d-dep4-dep8-3-c256.yaml (or any of the 6 sibling recipes added in #1295, "Update DeepSeek V4 Pro FP4 GB300 disaggregated SGLang benchmarks") around line 37: precision: "fp4".
  3. Both files set moe-runner-backend: "flashinfer_mxfp4" in their sglang_config.decode blocks.
  4. Open .github/configs/nvidia-master.yaml at the new dsv4-fp4-gb300-dynamo-sglang-mtp: block: precision: fp4.

So within the same PR, the matrix says fp4 and the recipe yamls say mxfp4, while the equivalent non-MTP sibling recipes that share the same MoE backend say fp4 at the recipe level too. That is a copy-paste inconsistency with the established convention.
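
The manual steps above could be automated along these lines. This is a minimal sketch, assuming each recipe declares exactly one `precision:` line and that the recipe text has already been read into memory; `find_precision_mismatches` and the sample file names are hypothetical, and a real audit would more likely load the files with a YAML parser.

```python
import re

# Matches a `precision: "fp4"` style line, quoted or unquoted.
_PRECISION = re.compile(r'^\s*precision:\s*"?([A-Za-z0-9_]+)"?\s*$', re.MULTILINE)


def find_precision_mismatches(recipes, expected="fp4"):
    """Return {recipe name: declared precision} for recipes that differ from `expected`.

    `recipes` maps a recipe name to its raw YAML text. Recipes without a
    precision line are skipped rather than reported.
    """
    mismatches = {}
    for name, text in recipes.items():
        match = _PRECISION.search(text)
        if match and match.group(1) != expected:
            mismatches[name] = match.group(1)
    return mismatches


if __name__ == "__main__":
    # Illustrative inputs modelled on the snippets quoted in this review.
    sample = {
        "mtp/example-mtp.yaml": 'model:\n  path: "deepseek-v4-pro"\n  precision: "mxfp4"\n',
        "example-sibling.yaml": 'model:\n  path: "deepseek-v4-pro"\n  precision: "fp4"\n',
    }
    print(find_precision_mismatches(sample))
```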

Addressing the refutation: what the runtime impact actually is

The refutation correctly notes that InferenceX's own aggregation pipelines (utils/summarize.py, utils/collect_eval_results.py, utils/matrix_logic/generate_sweep_configs.py, launch_gb300-cw.sh) key off the matrix-level precision field from nvidia-master.yaml, not the recipe yaml's model.precision. Since the matrix entry is correctly fp4, in-repo aggregation/labeling is unaffected — the original framing of "confusing labels in eval/result aggregation pipelines" overstates the impact. The recipe-level field is consumed externally by srt-slurm/srtctl, and the upstream source (elvischenv/srt-slurm@dsv4-gb300-disagg-8k1k-mtp) presumably accepts mxfp4. So this is not a runtime breakage.

Why it's still worth fixing

It is purely a cross-recipe metadata uniformity nit: every sibling dsv4 SGLang recipe in the same directory tree, even ones using the identical flashinfer_mxfp4 MoE backend, declares precision: "fp4" at the recipe level. The mxfp4 label here will trip up future grep-based audits and contradicts the project-wide enum in AGENTS.md. The fix is to replace precision: "mxfp4" with precision: "fp4" on line 15 of all 6 new MTP recipes — no other change required.
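
One caveat when applying this fix in bulk: a blanket search-and-replace of mxfp4 would also rewrite the moe-runner-backend value flashinfer_mxfp4. A minimal sketch of a replacement restricted to the `precision:` line follows; `normalize_precision` is a hypothetical helper, not an existing script in the repository.

```python
import re


def normalize_precision(recipe_text, old="mxfp4", new="fp4"):
    """Rewrite only `precision: <old>` lines, keeping quoting and indentation.

    Other occurrences of `old`, such as inside "flashinfer_mxfp4", are untouched.
    """
    pattern = re.compile(
        rf'^([ \t]*precision:[ \t]*)("?){re.escape(old)}\2([ \t]*)$',
        re.MULTILINE,
    )
    return pattern.sub(rf"\g<1>\g<2>{new}\g<2>\g<3>", recipe_text)


if __name__ == "__main__":
    before = (
        "model:\n"
        '  precision: "mxfp4"\n'
        "sglang_config:\n"
        "  decode:\n"
        '    moe-runner-backend: "flashinfer_mxfp4"\n'
    )
    print(normalize_precision(before))
```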

@ch-wan ch-wan force-pushed the sglang-disagg-gb300-mtp-0507 branch from ea35b7b to ce53cf1 Compare May 8, 2026 06:06

@ch-wan
Collaborator Author

ch-wan commented May 8, 2026

/sweep

@github-actions
Contributor

github-actions Bot commented May 8, 2026

@ch-wan Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25540780423
Command: ``
Pinned ref: dba5e0d
Approval: not required (trusted collaborator).

@ch-wan
Collaborator Author

ch-wan commented May 8, 2026

/sweep

@github-actions
Contributor

github-actions Bot commented May 8, 2026

@ch-wan Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25541720592
Command: ``
Pinned ref: 5e30f2c
Approval: not required (trusted collaborator).

@ch-wan
Collaborator Author

ch-wan commented May 8, 2026

/sweep

@github-actions
Contributor

github-actions Bot commented May 8, 2026

@ch-wan Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25542023314
Command: ``
Pinned ref: ff0df99
Approval: not required (trusted collaborator).

@ch-wan ch-wan closed this May 8, 2026
@ch-wan ch-wan force-pushed the sglang-disagg-gb300-mtp-0507 branch from b8dfc19 to 8ecde43 Compare May 9, 2026 20:39

The mooncake backend has a KV-transfer bug that produces wrong gsm8k
answers when prompts end on the `<think>` token (id 128821).
Empirically: same input on monolithic sglang gives correct answer,
mooncake-disagg gives wrong, nixl-disagg gives correct. Bug filed
upstream; using nixl as workaround.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
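
The triage described in this commit message, checking whether a prompt ends on the `<think>` token and which backend disagrees with the monolithic run, might be sketched as follows. Only the token id 128821 comes from the message; the function names and the backend labels used as dictionary keys are assumptions for illustration.

```python
THINK_TOKEN_ID = 128821  # id of `<think>` as stated in the commit message


def ends_on_think(token_ids, think_id=THINK_TOKEN_ID):
    """True if a non-empty prompt's last token is the `<think>` token."""
    return bool(token_ids) and token_ids[-1] == think_id


def disagreeing_backends(answers, reference="mono"):
    """Return the backends whose answer differs from the reference backend.

    `answers` maps a backend label (hypothetical keys such as "mono",
    "mooncake", "nixl") to the final answer string it produced.
    """
    expected = answers[reference]
    return sorted(name for name, answer in answers.items() if answer != expected)
```

Running the same prompt through each backend and feeding the answers to `disagreeing_backends` reproduces the comparison the message reports: a backend is suspect when it alone departs from the monolithic result.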
@ch-wan ch-wan force-pushed the sglang-disagg-gb300-mtp-0507 branch from 8ecde43 to 3275282 Compare May 9, 2026 20:52
ch-wan and others added 2 commits May 10, 2026 01:51
Picks up sgl-project/sglang#24878 (merged as c7f674e4),
which adds the missing dsv4 state_type branch to
MooncakeKVManager.maybe_send_extra. Combined with the prior
revert of #1297's nixl switch (commit daa6785), the mooncake
backend now correctly transfers DSv4's flat heterogeneous
state pool for both non-MTP and MTP runs.

Validated on GB300 1P+1D: comp_with_think.json (the prompt
ending on the literal `<think>` token that previously surfaced
the corruption) now returns the correct gsm8k Janet answer
(`#### 18`) on mooncake disagg, matching mono and the NIXL
control. MTP sa-bench delivers ~136 tok/s output throughput
(~1.7x non-MTP), confirming draft acceptance is working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
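
The validation above compares the final gsm8k answer (the number after `####`) across backends and quotes a throughput ratio. A minimal sketch of both checks follows; `extract_gsm8k_answer` and `implied_baseline` are hypothetical helpers, and the baseline figure is only what the two rounded numbers in the message imply, not a separately measured value.

```python
import re
from typing import Optional

# gsm8k completions conventionally end with "#### <number>".
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]+(?:\.\d+)?)")


def extract_gsm8k_answer(completion: str) -> Optional[str]:
    """Return the last `#### <number>` answer with thousands separators removed."""
    matches = _FINAL_ANSWER.findall(completion)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def implied_baseline(throughput: float, speedup: float) -> float:
    """Baseline throughput implied by a measured throughput and a speedup ratio."""
    return throughput / speedup


if __name__ == "__main__":
    print(extract_gsm8k_answer("... so she makes this much per day.\n#### 18"))
    # ~136 tok/s at ~1.7x suggests a non-MTP baseline of roughly 80 tok/s.
    print(round(implied_baseline(136.0, 1.7)))
```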

ch-wan and others added 2 commits May 10, 2026 02:23
NVIDIA/srt-slurm#144 (``sa-bench: make SGLangDeepseekV4Tokenizer
callable``) merged as 0cbc7eb4. Drop the ch-wan/srt-slurm fork pin
that was only there while #144 was in review and pin to the upstream
merge commit instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Now that #144 is merged, no longer need to pin a specific commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Picks up sgl-project/sglang main commit 2473659e (built via
upstream workflow run 25639473178).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@functionstackx functionstackx changed the title from "Add DSV4 FP4 GB300 dynamo-sglang MTP disagg benchmarks" to "(radixark sgl maintainer submission): Add DSV4 FP4 GB300 dynamo-sglang MTP disagg benchmarks" May 11, 2026
Contributor

@functionstackx functionstackx left a comment

lgtm

@functionstackx functionstackx merged commit 4007906 into main May 11, 2026
26 checks passed
@functionstackx functionstackx deleted the sglang-disagg-gb300-mtp-0507 branch May 11, 2026 03:30
5 participants