Re-PR #1285: pure TP H200 vLLM DSv4 config + missing perf-changelog entry by cquil11 · Pull Request #1287 · SemiAnalysisAI/InferenceX

cquil11 · 2026-05-05T23:33:31Z

Disclaimer

The original PR #1285 ("Add pure TP configuration to H200 vLLM DSv4 deployment") landed
without a corresponding perf-changelog.yaml entry, which our policy requires for any
change that alters published benchmark configurations or sweep ranges.

It was reverted in #1286 and is being re-landed in this PR together with the missing
changelog entry. Code changes are otherwise identical to #1285.

Dependencies

This PR depends on #1286 (the revert) merging first. Until that lands, the
GitHub diff against `main` will only show the new `perf-changelog.yaml` entry,
because the underlying code changes are still on `main` from #1285. Once #1286
merges, this PR's diff will reflect the full re-application plus the changelog entry.

Summary

Parameterize TP vs DP-attention + expert-parallel via `DP_ATTENTION` / `EP_SIZE` env vars in `dsv4_fp8_h200{,_mtp}.sh` (matching `dsv4_fp4_b200_vllm.sh`)
Add pure TP search-space row (`tp:8, ep:1, dp-attn:false`) alongside the existing DP+EP row in `nvidia-master.yaml`
Raise `conc-end` from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row
Add `perf-changelog.yaml` entry referencing this PR

Test plan

Revert #1285: original PR missing perf-changelog entry #1286 merges first
CI green on this PR
Sweep results land on the configured runners

🤖 Generated with Claude Code

github-actions · 2026-05-05T23:33:45Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-05T23:33:46Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-05-05T23:39:51Z

+    - "Re-PR of #1285 (which originally landed without a perf-changelog entry — reverted in #1286)"
+    - "Parameterize TP vs DP-attention + expert-parallel via DP_ATTENTION/EP_SIZE env vars in dsv4_fp8_h200{,_mtp}.sh (matching dsv4_fp4_b200_vllm.sh)"
+    - "Add pure TP search-space row (tp:8, ep:1, dp-attn:false) alongside the existing DP+EP row in nvidia-master.yaml"
+    - "Raise conc-end from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row"


🟡 The bullet at line 2242 says "Raise conc-end from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row", but the underlying #1285 change also lowered conc-start from 4 to 1 on that same row (verifiable via git show 366ffbc -- .github/configs/nvidia-master.yaml). This applies to both isl/osl variants of dsv4-fp8-h200-vllm and dsv4-fp8-h200-vllm-mtp. Suggest rewording to e.g. "Widen the existing DP+EP row's concurrency sweep from 4–64 to 1–256" so the changelog faithfully describes the sweep change.

Extended reasoning...

What the changelog says vs. what the code change does

Line 2242 reads:

"Raise conc-end from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row"

The actual change re-applied by this PR (originally #1285, commit 366ffbc) modifies BOTH endpoints of the concurrency sweep on the existing DP+EP row, not just conc-end.

Step-by-step proof from the upstream commit

git show 366ffbc -- .github/configs/nvidia-master.yaml shows, for both dsv4-fp8-h200-vllm (around line 2626) and dsv4-fp8-h200-vllm-mtp (around line 2648), and for both the 1k1k (isl: 1024, osl: 1024) and 8k1k (isl: 8192, osl: 1024) sweeps:

- - { tp: 8, ep: 8, dp-attn: true, conc-start: 4, conc-end: 64 } + - { tp: 8, ep: 1, dp-attn: false, conc-start: 1, conc-end: 256 } + - { tp: 8, ep: 8, dp-attn: true, conc-start: 1, conc-end: 256 }

So the existing DP+EP row went from conc-start: 4, conc-end: 64 to conc-start: 1, conc-end: 256. Both endpoints moved.

Why this matters

The purpose of perf-changelog.yaml is to be an authoritative description of every change to published sweep configurations. The omitted conc-start: 4 → 1 materially expands the sweep coverage at the low-concurrency end — concurrencies 1, 2, and 3 are now included on the DP+EP row when they previously were not. A future reader trying to reconcile sweep results against the changelog would see new low-concurrency points on this row and have no record of why.

How to fix

Replace the line with something that captures both endpoints, e.g.:

- "Widen the existing DP+EP row's concurrency sweep from 4–64 to 1–256 for both 1k1k and 8k1k sweeps"

or a two-bullet split:

- "Lower conc-start from 4 to 1 and raise conc-end from 64 to 256 on the existing DP+EP row (1k1k and 8k1k sweeps)"

Severity

This is a doc-only PR and the omission is a documentation completeness issue rather than a runtime bug, so severity is nit — but worth fixing because the entire purpose of this file is to faithfully record sweep-config changes, and the description as written understates the change.

claude · 2026-05-05T23:39:51Z

+    - "Parameterize TP vs DP-attention + expert-parallel via DP_ATTENTION/EP_SIZE env vars in dsv4_fp8_h200{,_mtp}.sh (matching dsv4_fp4_b200_vllm.sh)"
+    - "Add pure TP search-space row (tp:8, ep:1, dp-attn:false) alongside the existing DP+EP row in nvidia-master.yaml"
+    - "Raise conc-end from 64 to 256 for both 1k1k and 8k1k sweeps on the existing DP+EP row"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1285


🟡 The new perf-changelog entry sets pr-link: .../pull/1285, but the established convention in this file is for pr-link to reference the PR that adds the entry (e.g. line 2233 → #1257, line 2223 → #1270, line 2216 → #1279). This entry is added by #1287, and the PR description itself says 'Add perf-changelog.yaml entry referencing this PR'. Suggest changing pr-link to #1287; the original #1285 context is already preserved in the first description bullet, so no information is lost.

Extended reasoning...

What the bug is

The newly added entry at perf-changelog.yaml:2243 uses pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1285. Per the convention used by every other entry in this file, pr-link should point to the PR that introduced the entry — i.e. this PR (#1287) — not the upstream PR whose code is being documented.

Convention evidence (step-by-step proof)

Walking the three immediately-preceding entries in the same file:

Line 2233: pr-link: .../pull/1257 — this entry was added by [NV] qwen3.5 fp4 b200 sglang mtp #1257 itself (commit 643c3d7 'qwen3.5 fp4 b200 sglang mtp ([NV] qwen3.5 fp4 b200 sglang mtp #1257)').

Line 2223: pr-link: .../pull/1270 — added by Update DSV4 TRT fused MHC image #1270.

Line 2216: pr-link: .../pull/1279 — added by dsv4-fp8-h200-vllm-mtp: bump num_speculative_tokens 1 → 2 #1279.

In every case, pr-link points to the PR that landed the changelog row, not to a separate upstream change. The new entry breaks this pattern: it is being added by #1287 but points at #1285.

Why the PR description supports the fix

The description's test plan explicitly says 'Add perf-changelog.yaml entry referencing this PR'. 'This PR' is #1287. The current value (#1285) directly contradicts the stated intent.

Addressing the refutation

One verifier argued the link to #1285 is defensible because the substantive code change lives in #1285's commits and #1287 only re-applies them. While true, that context is already preserved — the entry's first description bullet reads 'Re-PR of #1285 (which originally landed without a perf-changelog entry — reverted in #1286)'. Pointing pr-link at #1287 therefore loses no discoverability of the original code-change discussion (it remains one click away via the embedded #1285 reference), while gaining discoverability from the changelog entry to its own merge PR — which is what every other entry in the file provides. The 're-PR scenario is unusual' argument is precisely why the convention should be followed: a reader scanning the changelog and clicking pr-link expects to land on the PR where this row was discussed and merged, which is #1287.

Impact

Minor — the link still resolves to a meaningful PR with the original code change. This is a discoverability/consistency issue, not a correctness bug. Severity: nit.

Fix

Change line 2243 from pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1285 to pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1287.

#1285)"" This reverts commit 2d8547c.

The original PR #1285 landed without this changelog entry; it was reverted in #1286 and is being re-landed here together with the missing entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cquil11 requested a review from a team May 5, 2026 23:33

github-project-automation Bot added this to InferenceMAX Board May 5, 2026

claude Bot reviewed May 5, 2026

View reviewed changes

Revert "Revert "Add pure TP configuration to H200 vLLM DSv4 deployment (

1e06afa

#1285)"" This reverts commit 2d8547c.

cquil11 force-pushed the redo-1285-with-perf-changelog branch from c045265 to 9fc57c8 Compare May 6, 2026 00:03

cquil11 requested review from jgangani and kedarpotdar-nv as code owners May 6, 2026 00:03

Add perf-changelog entry for dsv4-fp8-h200-vllm{,-mtp} pure-TP config

ef01175

The original PR #1285 landed without this changelog entry; it was reverted in #1286 and is being re-landed here together with the missing entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cquil11 force-pushed the redo-1285-with-perf-changelog branch from 9fc57c8 to ef01175 Compare May 6, 2026 00:05

cquil11 merged commit a9c29b1 into main May 6, 2026
19 checks passed

github-project-automation Bot moved this to Done in InferenceMAX Board May 6, 2026

cquil11 deleted the redo-1285-with-perf-changelog branch May 6, 2026 00:06

claude Bot mentioned this pull request May 7, 2026

[DEBUG] remove qwen3.5-fp8-mi355x-atom perf-changelog entry #1296

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-PR #1285: pure TP H200 vLLM DSv4 config + missing perf-changelog entry#1287

Re-PR #1285: pure TP H200 vLLM DSv4 config + missing perf-changelog entry#1287
cquil11 merged 2 commits intomainfrom
redo-1285-with-perf-changelog

cquil11 commented May 5, 2026

Uh oh!

github-actions Bot commented May 5, 2026

Uh oh!

github-actions Bot commented May 5, 2026

Uh oh!

claude Bot May 5, 2026

Uh oh!

claude Bot May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cquil11 commented May 5, 2026

Disclaimer

Dependencies

Summary

Test plan

Uh oh!

github-actions Bot commented May 5, 2026

Uh oh!

github-actions Bot commented May 5, 2026

Uh oh!

claude Bot May 5, 2026

Choose a reason for hiding this comment

What the changelog says vs. what the code change does

Step-by-step proof from the upstream commit

Why this matters

How to fix

Severity

Uh oh!

claude Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant