Update dsv4 b300 configs #1155
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -1819,3 +1819,12 @@ | |
| - "Restore the recipe-per-CONC split (low-latency / balanced / max-throughput) on top of the low-latency-only fallback from #1143; the DeepEP FP8 weight-postprocess path is fixed, so the high-throughput scenario runs again" | ||
| - "Recipes from https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1132 | ||
| - config-keys: | ||
| - dsv4-fp4-b300-vllm | ||
| description: | ||
| - "Update search space based on B300 pareto sweep results" | ||
| - "ISL=1024: TP4 conc 4-128; DP4 (dp-attn) conc 256-4096; DP8 (dp-attn) conc 2048-8192" | ||
| - "ISL=8192: TP4 conc 4-64; DP4 (dp-attn) conc 128-1024; DP8 (dp-attn) conc 1024-8192" | ||
Check warning on line 1828 in perf-changelog.yaml
Comment on lines +1865 to +1903
Contributor
🟡 The new perf-changelog entry at perf-changelog.yaml:1822-1830 documents only the search-space reshape, but the diff also rewrites
Extended reasoning...
What the bug is
This PR makes two substantive changes to
Why this matters for the audit trail
The predecessor entry (PR #1144, the introducing PR for this config) explicitly records "max-num-batched-tokens 2048" as part of the launch-arg summary. After this PR that statement is stale, but no follow-up entry replaces it. AGENTS.md describes
Step-by-step proof of impact
For an ISL=8192 run in TP mode under the new formula:
For ISL=8192 in DP mode (
These are not cosmetic deltas: they reshape the prefill-batching capacity and will alter throughput / TTFT characteristics relative to anything benchmarked under PR #1144.
Addressing the refutations
One refuter argued this is subjective wording and that perf-changelog descriptions vary in detail (some are one-liners). That is true in general, but here the new entry is specifically a successor to PR #1144's entry for the same
Severity and fix
This is a documentation-completeness issue with no runtime impact, so it is filed at nit severity (matching all three independent confirmations). Fix: append one description line to the new entry, e.g.
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1155 | ||
🟡 The two new TP=8/dp-attn:true rows added to `dsv4-fp4-b300-vllm` (lines 2482 and 2488) omit the `ep` field, so `generate_sweep_configs.py` defaults the metadata to `ep=1`. But `dsv4_fp4_b300_vllm.sh` unconditionally passes `--enable-expert-parallel` and sets `--data-parallel-size 8` for these rows, so the actual run is EP=8. As a result, the result-filename template (`tp${TP}-ep${EP_SIZE}-dpa${DP_ATTENTION}`) and downstream group-by tooling will tag these B300 rows as `ep=1` while the underlying run is EP=8. Sister config `dsv4-fp8-h200-vllm` at lines 2458/2462 explicitly tags the analogous TP=8/dp-attn:true row as `{ tp: 8, ep: 8, dp-attn: true, ... }`. Suggest adding explicit `ep: 8` to both new TP=8 entries to match the convention. (Note: the existing TP=4 dp-attn:true rows on this same config also omit `ep`, but that pattern was inherited from PR #1144; this PR extends the issue to TP=8.)
Extended reasoning...
What the bug is
The two new search-space rows added to `dsv4-fp4-b300-vllm` omit the `ep` field:
In `utils/matrix_logic/generate_sweep_configs.py:354`, the matrix-entry default for `ep` is `1` (set unconditionally), and lines 362-363 only override `ep` if the YAML key was present (`bmk.get(Fields.EP.value)` returns `None` when omitted). So these matrix entries are tagged with `ep=1` in the generated metadata.
Why the actual runtime is EP=8
`benchmarks/single_node/dsv4_fp4_b300_vllm.sh` does not consult the metadata `ep` value at all. At line 38 the parallel block sets `--data-parallel-size "$TP"` (i.e. `--data-parallel-size 8` when `TP=8` and `DP_ATTENTION=true`), and at line 78 it unconditionally passes `--enable-expert-parallel`. Under vLLM, `--enable-expert-parallel` with `--data-parallel-size 8` runs with effective expert-parallel world size 8 (each rank holds 1/8 of the experts). So the runtime is EP=8 while the metadata says EP=1.
Why this is a metadata mismatch worth flagging
The sister config `dsv4-fp8-h200-vllm` at lines 2458 and 2462 explicitly tags the analogous TP=8/dp-attn:true rows as `{ tp: 8, ep: 8, dp-attn: true, ... }`, confirming that `ep: 8` is the established convention for this scenario across the dsv4 family. The metadata flows into `RESULT_FILENAME` (via `.github/workflows/benchmark-tmpl.yml:146`, template `tp${TP}-ep${EP_SIZE}-dpa${DP_ATTENTION}`), so the new B300 rows will be saved as `tp8-ep1-dpaTrue` while the actual run is EP=8. Downstream group-by tooling (`compare_results.py`, `summarize.py`, `collect_eval_results.py`) keys on `ep`, so cross-config analysis across the dsv4 family will misclassify these B300 rows.
Step-by-step proof
1. `{ tp: 8, dp-attn: true, conc-start: 2048, conc-end: 8192 }` (line 2482).
2. `generate_sweep_configs.py:354` assigns `ep=1` (default). Line 362 checks `bmk.get('ep')`, which is `None`, so the default is kept. Metadata: `tp=8, ep=1, dp-attn=true`.
3. `EP_SIZE=1`, `TP=8`, `DP_ATTENTION=true`. Result-filename template at `benchmark-tmpl.yml:146` resolves to `tp8-ep1-dpaTrue-...`.
4. `dsv4_fp4_b300_vllm.sh` runs with `--tensor-parallel-size 1 --data-parallel-size 8 --enable-expert-parallel`. vLLM's effective EP world size is 8.
5. `ep1` while the run is `EP=8`. The h200 sister config tags the same scenario `ep=8`.
Impact and fix
Metadata-only: runtime behavior is correct because the script hardcodes parallelism. Severity is nit. Fix: add `ep: 8` to both new TP=8 entries (lines 2482 and 2488) to match `dsv4-fp8-h200-vllm`. Ideally `ep: 4` would also be added to the kept TP=4/dp-attn:true rows for full consistency, but that pattern was inherited from PR #1144 and is outside the scope of this PR.
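The mismatch described in this comment can be reproduced with a minimal Python sketch. It is an illustration only, not the repository's code: `build_entry` and `result_prefix` are hypothetical stand-ins for the defaulting logic attributed to `generate_sweep_configs.py` and for the `tp${TP}-ep${EP_SIZE}-dpa${DP_ATTENTION}` template. The field names are taken from the rows quoted above, and the `dp-attn` default of `False` is an assumption.

```python
from typing import Any

DEFAULT_EP = 1  # default that the comment attributes to generate_sweep_configs.py


def build_entry(bmk: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the matrix-entry logic described above."""
    entry = {
        "tp": bmk["tp"],
        "ep": DEFAULT_EP,                     # set unconditionally
        "dp-attn": bmk.get("dp-attn", False),  # assumed default
    }
    ep = bmk.get("ep")  # None when the YAML row omits the key
    if ep is not None:
        entry["ep"] = ep  # override only when the key is present
    return entry


def result_prefix(entry: dict[str, Any]) -> str:
    """Hypothetical mirror of tp${TP}-ep${EP_SIZE}-dpa${DP_ATTENTION}."""
    return f"tp{entry['tp']}-ep{entry['ep']}-dpa{entry['dp-attn']}"


# Row as added by this PR: no explicit ep, so the metadata falls back to ep=1.
row = {"tp": 8, "dp-attn": True, "conc-start": 2048, "conc-end": 8192}
print(result_prefix(build_entry(row)))

# Row with the suggested fix: the explicit ep: 8 flows into the filename.
print(result_prefix(build_entry({**row, "ep": 8})))
```

Under these assumptions the first row is labeled `tp8-ep1-dpaTrue` and the second `tp8-ep8-dpaTrue`. Only the second label agrees with the EP=8 runtime described above.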