Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/configs/nvidia-master.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2707,7 +2707,7 @@ dsv4-fp4-b300-vllm:
- { tp: 8, ep: 8, dp-attn: true, conc-start: 2048, conc-end: 2048 }

dsv4-fp4-b300-trt:
image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-4999884
image: ghcr.io#semianalysisai/trtllm-deepseek-v4:fix-mhc7168-eb20e9e
model: deepseek-ai/DeepSeek-V4-Pro
model-prefix: dsv4
runner: b300
Expand Down
7 changes: 2 additions & 5 deletions benchmarks/single_node/dsv4_fp4_b300_trt.sh
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#!/usr/bin/env bash

# DeepSeek-V4-Pro single-node TRTLLM recipe for B300. The configured image
# already contains NVIDIA/TensorRT-LLM@feat/deepseek_v4; do not build TRTLLM at
# already contains a TensorRT-LLM DeepSeek-V4 build; do not build TRTLLM at
# runtime from this benchmark path.

source "$(dirname "$0")/../benchmark_lib.sh"
Expand Down Expand Up @@ -101,10 +101,7 @@ if [ "${EVAL_ONLY}" = "true" ]; then
MAX_NUM_TOKENS="$EVAL_MAX_MODEL_LEN"
fi

# DeepSeek-V4-Pro has hidden size 7168. The current TRTLLM fused-HC MHC
# path corrupts eval generations for this shape; keep eval servers on the
# unfused path until the fused kernel is guarded or supports 7168.
export TRTLLM_MHC_ENABLE_FUSED_HC=0
export TRTLLM_MHC_ENABLE_FUSED_HC="${TRTLLM_MHC_ENABLE_FUSED_HC:-1}"
echo "TRTLLM_MHC_ENABLE_FUSED_HC: $TRTLLM_MHC_ENABLE_FUSED_HC"

start_gpu_monitor --output "$PWD/gpu_metrics.csv"
Expand Down
7 changes: 7 additions & 0 deletions perf-changelog.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2148,3 +2148,10 @@
- "Disable TRTLLM fused MHC hyper-connection for eval servers via TRTLLM_MHC_ENABLE_FUSED_HC=0 because the current fused kernel corrupts DeepSeek-V4-Pro hidden size 7168 generations"
- "Keep this as eval-only PR validation until the TensorRT-LLM fused MHC kernel is guarded or supports hidden size 7168"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1233

- config-keys:
- dsv4-fp4-b300-trt
description:
- "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:fix-mhc7168-eb20e9e"
- "Enable TRTLLM fused MHC by default now that the image includes the hidden-size 7168 fused-HC fix"
pr-link: TBD

Check warning on line 2157 in perf-changelog.yaml

View check run for this annotation

Claude / Claude Code Review

perf-changelog.yaml entry uses placeholder pr-link: TBD

The new perf-changelog.yaml entry uses 'pr-link: TBD' instead of a real URL. Every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL, and the PR number (1270) is already known — please replace TBD with https://github.com/SemiAnalysisAI/InferenceX/pull/1270 before merge. Validation accepts any string so CI passes, but the placeholder degrades traceability for downstream tooling.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new perf-changelog.yaml entry uses 'pr-link: TBD' instead of a real URL. Every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL, and the PR number (1270) is already known — please replace TBD with #1270 before merge. Validation accepts any string so CI passes, but the placeholder degrades traceability for downstream tooling.

Extended reasoning...

What this is: The new perf-changelog.yaml entry added at line 2157 has pr-link: TBD as a placeholder rather than a real PR URL.

Why it stands out: A grep confirms this is the only pr-link: TBD in the entire perf-changelog.yaml — every other entry uses a fully-qualified https://github.com/SemiAnalysisAI/InferenceX/pull/NNN URL. The PR number is already known (#1270), so this is a straightforward fill-in. AGENTS.md documents the convention of using a placeholder while the PR is being prepared and replacing it with the real URL once the PR exists.

Why it does not break CI: utils/matrix_logic/validation.py:483 declares pr_link: str = Field(alias="pr-link") — a plain string with no URL pattern validation. process_changelog.py imports ChangelogEntry but never references the pr_link field itself for sweep generation, so nothing breaks at runtime. That is why this is a nit (traceability/process oversight) rather than a normal bug.

Impact: Downstream tooling and humans reading perf-changelog.yaml lose the link back to the PR that introduced this image bump. Anyone investigating a regression on dsv4-fp4-b300-trt to the fix-mhc7168-eb20e9e image (or trying to understand why fused MHC was re-enabled by default) cannot click through to the discussion and diff.

Step-by-step proof:

  1. Open perf-changelog.yaml line 2157: the new entry's last line is pr-link: TBD.
  2. Search the file for other entries — all use pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/N (e.g. line 2151 immediately above is pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1233).
  3. Confirm the PR number: this PR is Update DSV4 TRT fused MHC image #1270 (per the PR metadata).
  4. Confirm CI won't fail: utils/matrix_logic/validation.py:483 defines pr_link as a plain str, with no HttpUrl or regex constraint.

Fix: Replace pr-link: TBD on line 2157 with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1270 before merge.

Loading