DSv4 B300 TRT #1233
Conversation
# Conflicts:
#	.github/configs/nvidia-master.yaml
#	perf-changelog.yaml
Additional findings (outside current diff — PR may have been updated during review):
- 🟡 `perf-changelog.yaml:2017` — The new perf-changelog.yaml entry at line 2017 has `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX` — a literal placeholder rather than the actual PR number. Every other entry in this file uses a real numeric PR id (e.g. the immediately preceding entry uses `pull/1221`); this should be updated to `pull/1233` before merge so the changelog correctly traces back to its PR.

Extended reasoning...
**What's broken.** The newly added perf-changelog.yaml entry for `dsv4-fp4-b200-trt` and `dsv4-fp4-b300-trt` ends with:

```yaml
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX
```

`XXX` is a literal placeholder string left over from authoring — not the PR number. The current PR (per the metadata in this review) is #1233, so this should read `https://github.com/SemiAnalysisAI/InferenceX/pull/1233`.

**Why this is wrong.** Every other entry in `perf-changelog.yaml` uses an actual numeric PR id. The three entries immediately preceding this one end with `pull/1209`, `pull/1218`, and `pull/1221` respectively, and a quick scan of the rest of the file confirms the pattern holds throughout. Leaving `XXX` in place breaks the changelog's only mechanism for tracing a config-change entry back to the PR that introduced it.

**Why existing tooling didn't catch it.** `perf-changelog.yaml` appears to be free-form YAML metadata; there is no schema validator that requires `pr-link` to be a real GitHub URL. YAML parsers happily accept the placeholder as a string, and PR review is the last line of defense.

**Impact.** Cosmetic / documentation only — no runtime behavior is affected. But once merged, anyone auditing the changelog (e.g., "which PR added the dsv4-fp4-b200-trt config?") will hit a 404 instead of landing on the correct PR. Fixing it post-merge requires another PR to touch the changelog, which is wasteful.

**How to fix.** Replace `XXX` with `1233` on line 2017 of `perf-changelog.yaml` before merging this PR.

**Step-by-step proof.**
1. Open `perf-changelog.yaml` at the bottom of the file (the new entry added by this PR).
2. Observe the entry's last line: `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`.
3. Compare with the immediately preceding entry (the dsv4-fp4-b300-vllm "Change image to vllm/vllm-openai:v0.20.0-cu130" entry): its `pr-link` is `https://github.com/SemiAnalysisAI/InferenceX/pull/1221` — a real numeric id.
4. Compare with the PR's own metadata: the current PR number is 1233.
5. Conclusion: the new entry should use `pull/1233`, matching the convention used by every other entry in the file.
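A check along these lines could catch the placeholder mechanically before merge. This is a sketch only — the `placeholder_pr_links` helper and the inline sample entries are illustrative, not existing repo tooling (a real check would read `perf-changelog.yaml` from disk):

```python
import re

# Sample entries transcribed from this review; a real check would read
# perf-changelog.yaml instead of these inline strings.
good = "pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1221"
bad = "pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX"

def placeholder_pr_links(text: str) -> list:
    """Return every pr-link whose trailing path segment is not a numeric PR id."""
    offenders = []
    for m in re.finditer(r"pr-link:\s*(\S*/pull/(\S+))", text):
        if not m.group(2).isdigit():
            offenders.append(m.group(1))
    return offenders

print(placeholder_pr_links(good))  # []
print(placeholder_pr_links(bad))   # ['https://github.com/SemiAnalysisAI/InferenceX/pull/XXX']
```

Wiring something like this into CI would make `pr-link` placeholders fail before human review, instead of relying on reviewers as the last line of defense.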
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25231354124
Additional findings (outside current diff — PR may have been updated during review):
- 🟡 `perf-changelog.yaml:2099-2101` — The new dsv4-fp4-b300-trt changelog entry says the diagnostic compares 'NVFP4 KV cache', but `benchmarks/single_node/dsv4_fp4_b300_trt_diag.sh` never invokes `run_variant` with an nvfp4/fp4 KV dtype — the actual KV dtypes exercised are fp8, unset (auto), bfloat16, and torch.bfloat16. Either replace 'NVFP4 KV cache' with 'BF16 KV cache' (or 'BF16 / torch.bfloat16 KV cache'), or add a real `run_variant 'nvfp4_kv_graph' 'nvfp4' ...` call so the description matches the JSONL/summary output. Nit only — the diagnostic intentionally exits 1 (`TRTLLM_DSV4_DIAG_FAIL_AFTER=1`) so this can't be confused with a perf result, but reviewers cross-checking the changelog against `dsv4_trt_b300_diag_summary.json` will see the contradiction.

Extended reasoning...
**What the bug is.** `perf-changelog.yaml:2101` (the new entry for `dsv4-fp4-b300-trt`) describes the diagnostic as: "Diagnostic installs fast-hadamard-transform, then compares explicit FP8 KV cache, default KV cache, NVFP4 KV cache, and FP8 KV cache with CUDA graph disabled"

The diagnostic script `benchmarks/single_node/dsv4_fp4_b300_trt_diag.sh` does not exercise an NVFP4/FP4 KV cache anywhere.

**Step-by-step proof.** Enumerating every `run_variant` invocation at the bottom of the diagnostic script and the second positional argument (the `kv_dtype`):

| Variant | `kv_dtype` arg |
| --- | --- |
| baseline_fp8_graph | fp8 |
| auto_kv_graph | unset (auto/default) |
| auto_kv_no_autotune | unset |
| auto_kv_vanilla_moe | unset |
| fp8_no_cuda_graph | fp8 |
| fp8_graph_mhc_fused_hc_off | fp8 |
| baseline_fp8_graph_pp1 | fp8 |
| auto_kv_graph_pp1 | unset |
| tp4_ep1_dpa_false_fp8_graph | fp8 |
| tp8_ep8_dpa_true_fp8_graph | fp8 |
| tp4_ep4_dpa_true_fp8_graph | fp8 |
| bfloat16_kv_graph | bfloat16 |
| torch_bfloat16_kv_graph | torch.bfloat16 |

The set of KV dtypes that ever reach `write_config` is `{fp8, unset, bfloat16, torch.bfloat16}`. There is no `nvfp4`, `fp4`, or `modelopt_fp4` value passed as `kv_dtype`, and no variant whose name implies an FP4 KV cache. Grepping the file for `nvfp4|fp4` only matches the unrelated `dsv4-fp4-trt-${variant}.yml` config-file template name.

**Why the existing knobs don't cover it.** `write_config` only emits `dtype: $kv_dtype` when `$kv_dtype != "unset"`, and even when set, the value is whatever was passed positionally. None of the call sites pass an FP4-flavored value, so there is no code path inside this diagnostic that produces an NVFP4 KV cache config. The fourth bullet — "FP8 KV cache with CUDA graph disabled" — does map correctly to `fp8_no_cuda_graph`. Only the third bullet (NVFP4) is fabricated.

**Impact.** Documentation drift. The diagnostic still produces correct results for the variants it actually has, and the script intentionally exits 1 (`TRTLLM_DSV4_DIAG_FAIL_AFTER=1`) so the row will never be charted as a benchmark result. But the changelog entry is the human-readable trail of why this PR exists, and it claims an ablation that does not appear in `dsv4_trt_b300_diag_summary.json`. A reviewer cross-referencing the bullet list against the summary JSON will be unable to find the NVFP4 row and may either chase a phantom variant or assume the summary is incomplete.

**How to fix.** Two equivalent fixes; pick one:

- Match the doc to the script (preferred, smallest diff): in `perf-changelog.yaml:2101`, replace `NVFP4 KV cache` with `BF16 KV cache` (or `BF16 / torch.bfloat16 KV cache`, since both `bfloat16_kv_graph` and `torch_bfloat16_kv_graph` exist). This accurately describes what the diagnostic actually exercises.
- Match the script to the doc: add a `run_variant 'nvfp4_kv_graph' 'nvfp4' 'on' "$((PORT_BASE + 13))" "TRTLLM" "default" "$TP" "$EP_SIZE" "$DP_ATTENTION" "4"` call alongside the existing `bfloat16_kv_graph`/`torch_bfloat16_kv_graph` invocations under the `TRTLLM_DSV4_DIAG_ENABLE_EXPLICIT_KV_DTYPES` block. Note this will only succeed if the pinned `feat/deepseek_v4` branch actually accepts `nvfp4` as a `kv_cache_config.dtype`; otherwise the variant will simply fail readiness and the diagnostic will still record that result.

Option 1 is the right call given the diagnostic is temporary and the run is already gated to fail (`TRTLLM_DSV4_DIAG_FAIL_AFTER=1`). Severity nit.
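For anyone who wants to double-check the enumeration above mechanically, the variant-to-kv_dtype mapping can be transcribed as data and scanned for fp4-flavored values. This is a sketch over the table in this review, not code from the repo (the actual script is bash):

```python
# Variant -> kv_dtype mapping, transcribed from the run_variant enumeration
# in this review (the diagnostic script itself is bash).
kv_dtypes = {
    "baseline_fp8_graph": "fp8",
    "auto_kv_graph": "unset",
    "auto_kv_no_autotune": "unset",
    "auto_kv_vanilla_moe": "unset",
    "fp8_no_cuda_graph": "fp8",
    "fp8_graph_mhc_fused_hc_off": "fp8",
    "baseline_fp8_graph_pp1": "fp8",
    "auto_kv_graph_pp1": "unset",
    "tp4_ep1_dpa_false_fp8_graph": "fp8",
    "tp8_ep8_dpa_true_fp8_graph": "fp8",
    "tp4_ep4_dpa_true_fp8_graph": "fp8",
    "bfloat16_kv_graph": "bfloat16",
    "torch_bfloat16_kv_graph": "torch.bfloat16",
}

# No fp4-flavored dtype ever reaches write_config.
fp4_like = [v for v, d in kv_dtypes.items() if "fp4" in d.lower()]
print(sorted(set(kv_dtypes.values())))  # ['bfloat16', 'fp8', 'torch.bfloat16', 'unset']
print(fp4_like)  # []
```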
| "dsv4_template_completion_ok": bool(probe.get("dsv4_template_completion_ok", False)), | ||
| "kvcache_nan_or_inf_warning": kvcache_nan, | ||
| "hadamard_missing_or_skipped_warning": hadamard_missing, | ||
| "kv_dtype_or_cache_log_lines": kv_dtype_lines, | ||
| "moe_warning": bool(moe_warning_lines), |
🟡 The `mhc` word-boundary alternative in the diagnostic regex (`benchmarks/single_node/dsv4_fp4_b300_trt_diag.sh:974`) is written as `r"...|\\bmhc\\b"` inside a single-quoted Python heredoc, so the regex engine compiles `\\b` as a literal backslash + `b` rather than a word boundary. The bare-`mhc` probe will therefore never fire and `mhc_warning` will be silently False for every variant; the fix is to use `r"\bmhc\b"` with a single backslash. This is a low-severity issue in a script that already exits 1 unconditionally via `TRTLLM_DSV4_DIAG_FAIL_AFTER`.
Extended reasoning...
**What the bug is.** Line 974 of `benchmarks/single_node/dsv4_fp4_b300_trt_diag.sh` builds the mhc-fused-HC ablation probe with:

```python
mhc_warning_lines = matching_log_lines(r"mhc_fused_hc|fused_hc|hyper.?connection|\\bmhc\\b", 40)
```

The enclosing heredoc opens with `<<'PY'` (single-quoted), so bash performs no substitution — Python receives the source verbatim. In a Python raw string, two backslashes stay as two literal backslashes, so the regex engine compiles `\\b` as escaped-backslash + `b`, i.e. it matches a literal backslash followed by `b`, not a word boundary. To get a word boundary in a raw string you need a single backslash: `r"\bmhc\b"`.
**Step-by-step proof.** Verified directly in Python:

```python
>>> import re
>>> buggy = re.compile(r'mhc_fused_hc|fused_hc|hyper.?connection|\\bmhc\\b', re.IGNORECASE)
>>> bool(buggy.search('foo mhc bar'))
False
>>> bool(buggy.search(r'\bmhc\b'))  # only the literal text matches
True
>>> fixed = re.compile(r'\bmhc\b')
>>> bool(fixed.search('foo mhc bar'))
True
```
So the buggy alternative only fires on the literal character sequence `\bmhc\b`, which never appears in TRT-LLM logs.

**Why existing code does not save us.** The other three alternatives (`mhc_fused_hc`, `fused_hc`, `hyper.?connection`) still match the obvious cases, so the diagnostic does not crash and `mhc_warning` is sometimes True. But any log line that mentions just bare `mhc` (e.g. an mhc disabled warning, an mhc layer norm info line) will be silently dropped, defeating the whole point of the bare-`mhc` probe in the first place.

**Impact.** Limited. The script intentionally calls `exit 1` at the end via `TRTLLM_DSV4_DIAG_FAIL_AFTER=1` (line 1257) and is explicitly labelled a temporary diagnostic in perf-changelog.yaml. So a misleading `mhc_warning` field in `dsv4_trt_b300_diag_summary.json` does not affect the run outcome — it just makes the mhc-fused-HC ablation summary less useful while the script lives.

**How to fix.** Replace `\\bmhc\\b` with `\bmhc\b` (single backslashes). Inside the single-quoted heredoc the regex source is passed verbatim, so a single backslash is what you actually want.
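Assembled into a standalone script for convenience (the log line is an invented example; the pattern strings come from the diagnostic as quoted above):

```python
import re

# Buggy pattern as it appears in the diagnostic: r"\\b" compiles to a
# literal backslash followed by 'b', not a word boundary.
buggy = re.compile(r"mhc_fused_hc|fused_hc|hyper.?connection|\\bmhc\\b", re.IGNORECASE)
# Fixed pattern: single backslash gives a real word boundary.
fixed = re.compile(r"mhc_fused_hc|fused_hc|hyper.?connection|\bmhc\b", re.IGNORECASE)

line = "WARNING: mhc disabled for this shape"  # invented example of a bare-mhc log line
print(bool(buggy.search(line)))        # False: bare 'mhc' is missed
print(bool(fixed.search(line)))        # True: word boundary matches
print(bool(buggy.search(r"\bmhc\b")))  # True: only the literal text matches
```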
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25263534314

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25265372369

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25266831448

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25266848432

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25266886145

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25268325388
| if [[ "${RUN_EVAL:-false}" == "true" || "${EVAL_ONLY:-false}" == "true" ]]; then | ||
| # DeepSeek-V4-Pro has hidden size 7168. The current TRTLLM fused-HC MHC | ||
| # path corrupts eval generations for this shape; keep eval servers on the | ||
| # unfused path until the fused kernel is guarded or supports 7168. | ||
| export TRTLLM_MHC_ENABLE_FUSED_HC="${TRTLLM_MHC_ENABLE_FUSED_HC:-0}" | ||
| echo "TRTLLM_MHC_ENABLE_FUSED_HC: $TRTLLM_MHC_ENABLE_FUSED_HC" | ||
| fi |
plz enable it for everything before merging. In general, besides ctx length, there shouldn't be an "if eval, then turn on the env var that fixes eval".
Revert accidental trailing-space deletions on separator lines 1768 and 1889 in perf-changelog.yaml — CI depends on exact whitespace. Add rule to AGENTS.md to never delete whitespace in perf-changelog.yaml.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25268432710

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25268600739

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25268675649

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25270014160

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25270043535
No description provided.