Re-PR #1285: pure TP H200 vLLM DSv4 config + missing perf-changelog entry#1287
Re-PR #1285: pure TP H200 vLLM DSv4 config + missing perf-changelog entry#1287
Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
| - "Re-PR of #1285 (which originally landed without a perf-changelog entry — reverted in #1286)" | ||
| - "Parameterize TP vs DP-attention + expert-parallel via DP_ATTENTION/EP_SIZE env vars in dsv4_fp8_h200{,_mtp}.sh (matching dsv4_fp4_b200_vllm.sh)" | ||
| - "Add pure TP search-space row (tp:8, ep:1, dp-attn:false) alongside the existing DP+EP row in nvidia-master.yaml" | ||
| - "Raise conc-end from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row" |
There was a problem hiding this comment.
🟡 The bullet at line 2242 says "Raise conc-end from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row", but the underlying #1285 change also lowered conc-start from 4 to 1 on that same row (verifiable via git show 366ffbc -- .github/configs/nvidia-master.yaml). This applies to both isl/osl variants of dsv4-fp8-h200-vllm and dsv4-fp8-h200-vllm-mtp. Suggest rewording to e.g. "Widen the existing DP+EP row's concurrency sweep from 4–64 to 1–256" so the changelog faithfully describes the sweep change.
Extended reasoning...
What the changelog says vs. what the code change does
Line 2242 reads:
"Raise conc-end from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row"
The actual change re-applied by this PR (originally #1285, commit 366ffbc) modifies BOTH endpoints of the concurrency sweep on the existing DP+EP row, not just conc-end.
Step-by-step proof from the upstream commit
git show 366ffbc -- .github/configs/nvidia-master.yaml shows, for both dsv4-fp8-h200-vllm (around line 2626) and dsv4-fp8-h200-vllm-mtp (around line 2648), and for both the 1k1k (isl: 1024, osl: 1024) and 8k1k (isl: 8192, osl: 1024) sweeps:
- - { tp: 8, ep: 8, dp-attn: true, conc-start: 4, conc-end: 64 }
+ - { tp: 8, ep: 1, dp-attn: false, conc-start: 1, conc-end: 256 }
+ - { tp: 8, ep: 8, dp-attn: true, conc-start: 1, conc-end: 256 }So the existing DP+EP row went from conc-start: 4, conc-end: 64 to conc-start: 1, conc-end: 256. Both endpoints moved.
Why this matters
The purpose of perf-changelog.yaml is to be an authoritative description of every change to published sweep configurations. The omitted conc-start: 4 → 1 materially expands the sweep coverage at the low-concurrency end — concurrencies 1, 2, and 3 are now included on the DP+EP row when they previously were not. A future reader trying to reconcile sweep results against the changelog would see new low-concurrency points on this row and have no record of why.
How to fix
Replace the line with something that captures both endpoints, e.g.:
- "Widen the existing DP+EP row's concurrency sweep from 4–64 to 1–256 for both 1k1k and 8k1k sweeps"or a two-bullet split:
- "Lower conc-start from 4 to 1 and raise conc-end from 64 to 256 on the existing DP+EP row (1k1k and 8k1k sweeps)"Severity
This is a doc-only PR and the omission is a documentation completeness issue rather than a runtime bug, so severity is nit — but worth fixing because the entire purpose of this file is to faithfully record sweep-config changes, and the description as written understates the change.
| - "Parameterize TP vs DP-attention + expert-parallel via DP_ATTENTION/EP_SIZE env vars in dsv4_fp8_h200{,_mtp}.sh (matching dsv4_fp4_b200_vllm.sh)" | ||
| - "Add pure TP search-space row (tp:8, ep:1, dp-attn:false) alongside the existing DP+EP row in nvidia-master.yaml" | ||
| - "Raise conc-end from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1285 |
There was a problem hiding this comment.
🟡 The new perf-changelog entry sets pr-link: .../pull/1285, but the established convention in this file is for pr-link to reference the PR that adds the entry (e.g. line 2233 → #1257, line 2223 → #1270, line 2216 → #1279). This entry is added by #1287, and the PR description itself says 'Add perf-changelog.yaml entry referencing this PR'. Suggest changing pr-link to #1287; the original #1285 context is already preserved in the first description bullet, so no information is lost.
Extended reasoning...
What the bug is
The newly added entry at perf-changelog.yaml:2243 uses pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1285. Per the convention used by every other entry in this file, pr-link should point to the PR that introduced the entry — i.e. this PR (#1287) — not the upstream PR whose code is being documented.
Convention evidence (step-by-step proof)
Walking the three immediately-preceding entries in the same file:
- Line 2233:
pr-link: .../pull/1257— this entry was added by [NV] qwen3.5 fp4 b200 sglang mtp #1257 itself (commit 643c3d7 'qwen3.5 fp4 b200 sglang mtp ([NV] qwen3.5 fp4 b200 sglang mtp #1257)'). - Line 2223:
pr-link: .../pull/1270— added by Update DSV4 TRT fused MHC image #1270. - Line 2216:
pr-link: .../pull/1279— added by dsv4-fp8-h200-vllm-mtp: bump num_speculative_tokens 1 → 2 #1279.
In every case, pr-link points to the PR that landed the changelog row, not to a separate upstream change. The new entry breaks this pattern: it is being added by #1287 but points at #1285.
Why the PR description supports the fix
The description's test plan explicitly says 'Add perf-changelog.yaml entry referencing this PR'. 'This PR' is #1287. The current value (#1285) directly contradicts the stated intent.
Addressing the refutation
One verifier argued the link to #1285 is defensible because the substantive code change lives in #1285's commits and #1287 only re-applies them. While true, that context is already preserved — the entry's first description bullet reads 'Re-PR of #1285 (which originally landed without a perf-changelog entry — reverted in #1286)'. Pointing pr-link at #1287 therefore loses no discoverability of the original code-change discussion (it remains one click away via the embedded #1285 reference), while gaining discoverability from the changelog entry to its own merge PR — which is what every other entry in the file provides. The 're-PR scenario is unusual' argument is precisely why the convention should be followed: a reader scanning the changelog and clicking pr-link expects to land on the PR where this row was discussed and merged, which is #1287.
Impact
Minor — the link still resolves to a meaningful PR with the original code change. This is a discoverability/consistency issue, not a correctness bug. Severity: nit.
Fix
Change line 2243 from pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1285 to pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1287.
c045265 to
9fc57c8
Compare
9fc57c8 to
ef01175
Compare
Disclaimer
The original PR #1285 ("Add pure TP configuration to H200 vLLM DSv4 deployment") landed
without a corresponding
perf-changelog.yamlentry, which our policy requires for anychange that alters published benchmark configurations or sweep ranges.
It was reverted in #1286 and is being re-landed in this PR together with the missing
changelog entry. Code changes are otherwise identical to #1285.
Dependencies
This PR depends on #1286 (the revert) merging first. Until that lands, the
GitHub diff against `main` will only show the new `perf-changelog.yaml` entry,
because the underlying code changes are still on `main` from #1285. Once #1286
merges, this PR's diff will reflect the full re-application plus the changelog entry.
Summary
Test plan
🤖 Generated with Claude Code