Skip to content

dsv4-b300-sglang: add conc=2048 recipe & MTP benchmark#1176

Closed
yhyang201 wants to merge 20 commits into
mainfrom
sglang-dsv4-MTP
Closed

dsv4-b300-sglang: add conc=2048 recipe & MTP benchmark#1176
yhyang201 wants to merge 20 commits into
mainfrom
sglang-dsv4-MTP

dsv4-b300-sglang: add conc=2048 8k1k recipe with finite request-rate

cea70e5
Select commit
Loading
Failed to load commit list.
Claude / Claude Code Review completed Apr 26, 2026 in 8m 13s

Code review found 3 potential issues

Found 5 candidates, confirmed 3. See review comments for details.

Details

Severity Count
🔴 Important 0
🟡 Nit 2
🟣 Pre-existing 0
Severity File:Line Issue
🟡 Nit benchmarks/single_node/dsv4_fp4_b300_sglang_mtp.sh:119-122 MTP script hardcodes --swa-full-tokens-ratio 0.1, dropping the 1k1k tuning from the parent script
🟡 Nit perf-changelog.yaml:1879-1889 perf-changelog: pr-link points to wrong PR (#1166 instead of #1176)

Annotations

Check warning on line 122 in benchmarks/single_node/dsv4_fp4_b300_sglang_mtp.sh

See this annotation in the file changed.

@claude claude / Claude Code Review

MTP script hardcodes --swa-full-tokens-ratio 0.1, dropping the 1k1k tuning from the parent script

The new MTP script hardcodes `--swa-full-tokens-ratio 0.1` at line 121, while the parent `dsv4_fp4_b300_sglang.sh` uses an ISL-conditional that picks 0.5 for ISL=1024 with the explicit comment that '0.5 was tuned empirically for the 1k1k recipe, while 0.1 is the cookbook default'. Since the MTP YAML exercises both 1024/1024 and 8192/1024, the 1k1k MTP run silently drops the empirical tuning — please either mirror the conditional or add a comment explaining why MTP intentionally diverges.

Check warning on line 1889 in perf-changelog.yaml

See this annotation in the file changed.

@claude claude / Claude Code Review

perf-changelog: pr-link points to wrong PR (#1166 instead of #1176)

The newly added `dsv4-fp4-b300-sglang-mtp` changelog entry sets `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1166`, but this is PR #1176; PR #1166 is an unrelated open PR. Likely a digit transposition — please change the link to `/pull/1176` so the changelog correctly attributes the entry once merged.