
Update DeepSeek V4 Pro FP4 GB300 disaggregated SGLang benchmarks #1295

Merged

cquil11 merged 9 commits into main from sglang-disagg-gb300-0506 on May 7, 2026
Conversation

ch-wan (Collaborator) commented May 7, 2026

Summary

Overhaul the DeepSeek-V4-Pro FP4 GB300 disaggregated SGLang benchmark configurations to use WideEP TP=16 decode topology across most concurrency points, scale up concurrency targets, switch to the upstream NVIDIA/srt-slurm main branch, and re-enable lm-eval scoring.

Key changes

Search-space overhaul (nvidia-master.yaml)

  • Switch decode workers from mixed TP=8/EP=8 and TP=16/EP=16 to WideEP TP=16/EP=16 across all high-concurrency points (TP=12/EP=12 at the 21504 max-concurrency point)
  • Scale concurrency targets up significantly:
    • 1p1d: 512 → 1024 (5 nodes)
    • 4p1d: 512 → 1024 (8 nodes, was 1p1d with TP=8 decode)
    • 8p1d: new at conc 4096 (12 nodes)
    • 10p1d: 2048 → 8192 (14 nodes)
    • 12p1d: 16384 → 21504 (15 nodes, decode TP=12)
    • 1p1d TP=4 baseline at conc=1 retained
  • Update image from lmsysorg/sglang:deepseek-v4-grace-blackwell to lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev
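A search-space entry of the shape described above might look like the following sketch. The field names (num-worker, tp, ep, dp-attn, concurrency) follow the parameters discussed in this PR, but the exact nvidia-master.yaml layout is an assumption:

```yaml
# Hypothetical sketch of one dsv4-fp4-gb300-dynamo-sglang entry; the real
# nvidia-master.yaml schema may differ, only the values come from this PR.
- name: disagg-gb300-8p1d-dep4-dep16-12-c4096
  image: lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev
  prefill:
    num-worker: 8
    tp: 4
    ep: 4
    dp-attn: true
  decode:
    num-worker: 1
    tp: 16
    ep: 16
    dp-attn: true
  concurrency: [4096]
```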

Recipe files (benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/)

  • Rename all per-concurrency recipe files from conc{N}.yaml to descriptive disagg-gb300-{P}p1d-{prefill_topo}-{decode_topo}-{nodes}-c{conc}.yaml naming convention
  • Delete the now-unused conc1024.yaml
  • Add new disagg-gb300-8p1d-dep4-dep16-12-c4096.yaml for the new 8p1d topology
  • Update TP/EP/resource counts in all recipes to match the new master YAML search-space
  • Disable radix cache (disable-radix-cache: true) across all recipe files
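For illustration, a renamed recipe file might look like the sketch below, built from the keys the review later in this thread references (prefill_workers, decode_nodes, tensor-parallel-size, disable-radix-cache, and so on). The grouping into resources and sglang_config sections, and which keys apply to prefill versus decode, are assumptions:

```yaml
# Hypothetical sketch of disagg-gb300-1p1d-dep4-dep16-5-c1024.yaml; key names
# come from this PR's review comments, but the real schema may differ.
resources:
  prefill_workers: 1
  prefill_nodes: 1
  decode_workers: 1
  decode_nodes: 4
sglang_config:                 # decode-side settings in this sketch
  tensor-parallel-size: 16     # WideEP TP=16 decode
  expert-parallel-size: 16
  enable-dp-attention: true
  disable-radix-cache: true    # disabled across all recipe files in this PR
concurrencies: "1024"
```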

srt-slurm pin (runners/launch_gb300-cw.sh)

  • Switch from the fzyzcjy/srt-slurm fork (pinned at 4249d168, which added parallel random prompt generation but lacked the lm-eval orchestrator) to upstream NVIDIA/srt-slurm main branch
  • This unblocks lm-eval scoring for the GB300 SGLang disagg configs

Eval scoring (generate_sweep_configs.py)

  • Re-enable lm-eval scoring for the GB300 SGLang disagg configs (previously blocked by the srt-slurm fork pin)
Perf changelog (perf-changelog.yaml)

  • Add entry documenting the search-space overhaul, recipe rename, and eval re-enablement

Topology summary

| Config | Prefill | Decode | Nodes | Concurrency |
|---|---|---|---|---|
| 1p1d-tp4-tp4 | 1×TP4 | 1×TP4 | 2 | 1 |
| 1p1d-dep4-dep16 | 1×DP-EP4 | 1×DP-EP16 | 5 | 1,024 |
| 4p1d-dep4-dep16 | 4×DP-EP4 | 1×DP-EP16 | 8 | 1,024 |
| 8p1d-dep4-dep16 | 8×DP-EP4 | 1×DP-EP16 | 12 | 4,096 |
| 10p1d-dep4-dep16 | 10×DP-EP4 | 1×DP-EP16 | 14 | 8,192 |
| 12p1d-dep4-dep12 | 12×DP-EP4 | 1×DP-EP12 | 15 | 21,504 |
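The node counts in the table are internally consistent if each GB300 node contributes 4 GPUs, a figure inferred from the table itself (one DP-EP4 prefill worker per node; an EP16 decode worker spanning 4 nodes). A quick sketch of that arithmetic:

```python
# Sanity-check the topology table's node counts. GPUS_PER_NODE = 4 is an
# inference from the table (EP4 prefill = 1 node, EP16 decode = 4 nodes).
GPUS_PER_NODE = 4

def total_nodes(prefill_workers: int, prefill_tp: int, decode_tp: int) -> int:
    """Total nodes = prefill workers x nodes per prefill worker + decode nodes."""
    prefill_nodes = prefill_workers * (prefill_tp // GPUS_PER_NODE)
    decode_nodes = decode_tp // GPUS_PER_NODE  # single decode worker
    return prefill_nodes + decode_nodes

# (prefill workers, decode TP/EP width, expected total nodes) from the table
for workers, decode_tp, expected in [
    (1, 4, 2),     # 1p1d-tp4-tp4
    (1, 16, 5),    # 1p1d-dep4-dep16
    (4, 16, 8),    # 4p1d-dep4-dep16
    (8, 16, 12),   # 8p1d-dep4-dep16
    (10, 16, 14),  # 10p1d-dep4-dep16
    (12, 12, 15),  # 12p1d-dep4-dep12
]:
    assert total_nodes(workers, 4, decode_tp) == expected
```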

ch-wan requested a review from a team on May 7, 2026 00:42

github-actions bot (Contributor) commented May 7, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

functionstackx (Contributor) commented:

@Oseltamivir or @cquil11, can you help review and validate this?

ch-wan force-pushed the sglang-disagg-gb300-0506 branch from 05052ba to 3a192ab on May 7, 2026 00:55
ch-wan (Collaborator, Author) commented May 7, 2026

/sweep

github-actions bot (Contributor) commented May 7, 2026

@ch-wan Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25469744396
Command: ``
Pinned ref: 3a192ab
Approval: not required (trusted collaborator).

ch-wan force-pushed the sglang-disagg-gb300-0506 branch from 5609be7 to 211c0bc on May 7, 2026 02:17
ch-wan force-pushed the sglang-disagg-gb300-0506 branch from 211c0bc to 37967c5 on May 7, 2026 02:18
ch-wan force-pushed the sglang-disagg-gb300-0506 branch from 47ebbd8 to c4054eb on May 7, 2026 03:00
cquil11 (Collaborator) commented May 7, 2026

@claude confirm that all the top-level parameters changed in the master YAMLs match those set in the recipe files (i.e. TP, EP, num workers, etc.)

cquil11 (Collaborator) commented May 7, 2026

@claude add a PR description for posterity

Klaud-Cold (Collaborator) commented May 7, 2026

Claude finished @cquil11's task in 1m 24s (View job)


Verifying Master YAML ↔ Recipe File Parameter Consistency

  • Read master YAML changes (nvidia-master.yaml)
  • Read each recipe file and extract top-level parameters
  • Cross-check TP, EP, num workers, and other key parameters
  • Report findings

Results

I compared all 6 dsv4-fp4-gb300-dynamo-sglang search-space entries in nvidia-master.yaml against their corresponding recipe YAML files. Parameters checked: num-worker, TP, EP, dp-attn, concurrency, image, and node counts.

1. disagg-gb300-1p1d-dep4-dep16-5-c1024 — ✅ All match

| Parameter | Master YAML | Recipe File |
|---|---|---|
| Prefill num-worker | 1 | prefill_workers: 1 |
| Prefill TP | 4 | tensor-parallel-size: 4 |
| Prefill EP | 4 | expert-parallel-size: 4 |
| Prefill dp-attn | true | enable-dp-attention: true |
| Decode num-worker | 1 | decode_workers: 1 |
| Decode TP | 16 | tensor-parallel-size: 16 |
| Decode EP | 16 | expert-parallel-size: 16 |
| Decode dp-attn | true | enable-dp-attention: true |
| Concurrency | [1024] | concurrencies: "1024" |
| Total nodes | 5 (1P+4D) | prefill_nodes: 1, decode_nodes: 4 |

2. disagg-gb300-4p1d-dep4-dep16-8-c1024 — ✅ All match

| Parameter | Master YAML | Recipe File |
|---|---|---|
| Prefill num-worker | 4 | prefill_workers: 4 |
| Prefill TP | 4 | tensor-parallel-size: 4 |
| Prefill EP | 4 | expert-parallel-size: 4 |
| Prefill dp-attn | true | enable-dp-attention: true |
| Decode num-worker | 1 | decode_workers: 1 |
| Decode TP | 16 | tensor-parallel-size: 16 |
| Decode EP | 16 | expert-parallel-size: 16 |
| Decode dp-attn | true | enable-dp-attention: true |
| Concurrency | [1024] | concurrencies: "1024" |
| Total nodes | 8 (4P+4D) | prefill_nodes: 4, decode_nodes: 4 |

3. disagg-gb300-8p1d-dep4-dep16-12-c4096 — ✅ All match

| Parameter | Master YAML | Recipe File |
|---|---|---|
| Prefill num-worker | 8 | prefill_workers: 8 |
| Prefill TP | 4 | tensor-parallel-size: 4 |
| Prefill EP | 4 | expert-parallel-size: 4 |
| Prefill dp-attn | true | enable-dp-attention: true |
| Decode num-worker | 1 | decode_workers: 1 |
| Decode TP | 16 | tensor-parallel-size: 16 |
| Decode EP | 16 | expert-parallel-size: 16 |
| Decode dp-attn | true | enable-dp-attention: true |
| Concurrency | [4096] | concurrencies: "4096" |
| Total nodes | 12 (8P+4D) | prefill_nodes: 8, decode_nodes: 4 |

4. disagg-gb300-1p1d-tp4-tp4-2-c1 — ✅ All match

| Parameter | Master YAML | Recipe File |
|---|---|---|
| Prefill num-worker | 1 | prefill_workers: 1 |
| Prefill TP | 4 | tensor-parallel-size: 4 |
| Prefill EP | 1 | expert-parallel-size: 1 |
| Prefill dp-attn | false | no enable-dp-attention (absent = false) |
| Decode num-worker | 1 | decode_workers: 1 |
| Decode TP | 4 | tensor-parallel-size: 4 |
| Decode EP | 1 | expert-parallel-size: 1 |
| Decode dp-attn | false | no enable-dp-attention (absent = false) |
| Concurrency | [1] | concurrencies: "1" |
| Total nodes | 2 (1P+1D) | prefill_nodes: 1, decode_nodes: 1 |

5. disagg-gb300-10p1d-dep4-dep16-14-c8192 — ✅ All match

| Parameter | Master YAML | Recipe File |
|---|---|---|
| Prefill num-worker | 10 | prefill_workers: 10 |
| Prefill TP | 4 | tensor-parallel-size: 4 |
| Prefill EP | 4 | expert-parallel-size: 4 |
| Prefill dp-attn | true | enable-dp-attention: true |
| Decode num-worker | 1 | decode_workers: 1 |
| Decode TP | 16 | tensor-parallel-size: 16 |
| Decode EP | 16 | expert-parallel-size: 16 |
| Decode dp-attn | true | enable-dp-attention: true |
| Concurrency | [8192] | concurrencies: "8192" |
| Total nodes | 14 (10P+4D) | prefill_nodes: 10, decode_nodes: 4 |

6. disagg-gb300-12p1d-dep4-dep12-15-c21504 — ✅ All match

| Parameter | Master YAML | Recipe File |
|---|---|---|
| Prefill num-worker | 12 | prefill_workers: 12 |
| Prefill TP | 4 | tensor-parallel-size: 4 |
| Prefill EP | 4 | expert-parallel-size: 4 |
| Prefill dp-attn | true | enable-dp-attention: true |
| Decode num-worker | 1 | decode_workers: 1 |
| Decode TP | 12 | tensor-parallel-size: 12 |
| Decode EP | 12 | expert-parallel-size: 12 |
| Decode dp-attn | true | enable-dp-attention: true |
| Concurrency | [21504] | concurrencies: "21504" |
| Total nodes | 15 (12P+3D) | prefill_nodes: 12, decode_nodes: 3 |

Summary

All 6 entries are fully consistent. Every top-level parameter in nvidia-master.yaml (num-worker, TP, EP, dp-attn, concurrency) matches the corresponding values in each recipe file's resources and sglang_config sections. Node counts encoded in the recipe filenames also check out (prefill_nodes + decode_nodes = total). The container image (lmsysorg/sglang-staging:deepseek-v4-grace-blackwell-dev) is consistent across the master YAML and all recipe files.
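The cross-check described above is mechanical enough to script. The sketch below is a hypothetical Python version: the dict shapes are assumptions modeled on the parameters the review compares (num-worker, TP, EP, dp-attn, concurrency), and a real script would load nvidia-master.yaml and the recipe files with yaml.safe_load rather than hard-coding them.

```python
# Hypothetical consistency check between a master search-space entry and a
# recipe file. Dict layouts are assumptions; real files would be parsed with
# yaml.safe_load from nvidia-master.yaml and the recipe YAML.

def check_entry(master: dict, recipe: dict) -> list:
    """Return a list of mismatch descriptions (empty list means all match)."""
    problems = []
    for side in ("prefill", "decode"):
        m, r = master[side], recipe[side]
        if m["num-worker"] != r["workers"]:
            problems.append(f"{side} num-worker mismatch")
        if m["tp"] != r["tensor-parallel-size"]:
            problems.append(f"{side} TP mismatch")
        if m["ep"] != r["expert-parallel-size"]:
            problems.append(f"{side} EP mismatch")
        # an absent enable-dp-attention key in the recipe is treated as false
        if m["dp-attn"] != r.get("enable-dp-attention", False):
            problems.append(f"{side} dp-attn mismatch")
    if master["concurrency"] != [int(c) for c in recipe["concurrencies"].split(",")]:
        problems.append("concurrency mismatch")
    return problems

# Values for disagg-gb300-8p1d-dep4-dep16-12-c4096, taken from the tables above.
master = {
    "prefill": {"num-worker": 8, "tp": 4, "ep": 4, "dp-attn": True},
    "decode": {"num-worker": 1, "tp": 16, "ep": 16, "dp-attn": True},
    "concurrency": [4096],
}
recipe = {
    "prefill": {"workers": 8, "tensor-parallel-size": 4,
                "expert-parallel-size": 4, "enable-dp-attention": True},
    "decode": {"workers": 1, "tensor-parallel-size": 16,
               "expert-parallel-size": 16, "enable-dp-attention": True},
    "concurrencies": "4096",
}
assert check_entry(master, recipe) == []  # all parameters line up
```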


Klaud-Cold (Collaborator) commented May 7, 2026

Claude finished @cquil11's task in 1m 1s (View job)


Adding PR Description

  • Gather context on all changed files
  • Draft and update PR description

PR description has been added with a full summary of all changes: search-space overhaul (WideEP TP=16 decode, scaled concurrencies), recipe file renames, srt-slurm pin switch to upstream NVIDIA/srt-slurm main, eval re-enablement, and a topology summary table.

cquil11 merged commit 876b595 into main on May 7, 2026 (58 of 64 checks passed)
cquil11 deleted the sglang-disagg-gb300-0506 branch on May 7, 2026 17:48
ch-wan added a commit that referenced this pull request May 8, 2026
Re-derived each MTP recipe by cloning the closest-topology working
base recipe from main (verified to pass CI in PR #1295) and only
adding the four `speculative-*` keys for EAGLE/MTP plus topology
adjustments (prefill_workers, decode_nodes, gpus_per_decode, decode
TP/DP/EP, concurrencies). All env vars, sbatch_directives, container,
dynamo hash, frontend args, deepep config, and other tuning come from
the working base verbatim — discarding the elvischenv structure that
used different env vars (PRECOMPILE vs FAST_WARMUP) and `mxfp4`
precision.

Recipe -> base mapping:
- disagg-low-latency-1p1d-tp4-tp4 -> disagg-gb300-1p1d-tp4-tp4-2-c1
- disagg-mid-curve-1p1d-dep4-dep16 -> disagg-gb300-1p1d-dep4-dep16-5-c1024 (+conc=256)
- disagg-mid-curve-1p1d-dep4-dep8 -> wideep base, decode TP=8, conc=256
- disagg-mid-curve-2p1d-dep4-dep8 -> wideep base, 2P, decode TP=8, conc=512
- disagg-mid-curve-4p1d-dep4-dep8 -> wideep base, 4P, decode TP=8, conc=1024
- disagg-low-latency-1p6d-dep4-tp4 -> hybrid: wideep prefill + 1p1d-tp4-tp4 decode

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

5 participants