update configs 70b by japarada · Pull Request #95 · SemiAnalysisAI/InferenceX

japarada · 2025-10-09T23:18:49Z

Update the 70B configs to include `compilation-config`.
This gives a 6-7% uplift for Llama on 6 of the 8 configs.

cquil11 · 2025-10-13T15:07:19Z

All tests passed. Looks good.

mgoin · 2025-10-14T17:37:24Z

Hey @japarada, is this a general heuristic we could add upstream to vLLM? If the custom ops aren't good for AMD, we should change the behavior.

AGENTS.md requires new perf-changelog entries to be appended to the end of the file (oldest at top, newest at bottom). The original commit prepended the new entry above PR #95; move it after the current last entry (PR #1265) to satisfy the convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…1267)

* Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep)
  - New `kimik2.5-int4-b300-vllm` config with the corresponding `benchmarks/single_node/kimik2.5_int4_b300.sh` launch script (mirrors the existing INT4 B200 vLLM recipe; the upstream vLLM Kimi-K2.5 recipes page does not yet ship B300-specific tuning).
  - Image: `vllm/vllm-openai:v0.20.0-cu130`. The original draft (#1057, reverted in #1070, reopened as #1071) carried `v0.19.0` while we waited on a working release; 0.20.0 has now shipped.
  - Search-space per (ISL, OSL): the existing TP=8 sweep plus a new TP=4 / EP=1 entry covering the lower-TP / expert-parallel variant on the same B300 nodes.

  Supersedes #1071. Opening fresh from main since the merge base had drifted (b200 schema migrated from `seq-len-configs` to `scenarios.fixed-seq-len`) and the user preferred a clean reopen over a rebase.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: move kimik2.5-int4-b300-vllm entry to bottom

  AGENTS.md requires new perf-changelog entries to be appended to the end of the file (oldest at top, newest at bottom). The original commit prepended the new entry above PR #95; move it after the current last entry (PR #1265) to satisfy the convention.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…emiAnalysisAI#1267) * Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep) - New `kimik2.5-int4-b300-vllm` config with the corresponding `benchmarks/single_node/kimik2.5_int4_b300.sh` launch script (mirrors the existing INT4 B200 vLLM recipe; the upstream vLLM Kimi-K2.5 recipes page does not yet ship B300-specific tuning). - Image: `vllm/vllm-openai:v0.20.0-cu130` — the original draft (SemiAnalysisAI#1057, reverted in SemiAnalysisAI#1070, reopened as SemiAnalysisAI#1071) carried `v0.19.0` while we waited on a working release; 0.20.0 has now shipped. - Search-space per (ISL, OSL): the existing TP=8 sweep plus a new TP=4 / EP=1 entry covering the lower-TP / expert-parallel variant on the same B300 nodes. Supersedes SemiAnalysisAI#1071 — opening fresh from main since the merge base had drifted (b200 schema migrated from `seq-len-configs` to `scenarios.fixed-seq-len`) and the user preferred a clean reopen over a rebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf-changelog: move kimik2.5-int4-b300-vllm entry to bottom AGENTS.md requires new perf-changelog entries to be appended to the end of the file (oldest at top, newest at bottom). The original commit prepended the new entry above PR SemiAnalysisAI#95; move it after the current last entry (PR SemiAnalysisAI#1265) to satisfy the convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

update configs 70b

9d33377

japarada requested review from functionstackx and kimbochen October 9, 2025 23:18

japarada requested a review from a team as a code owner October 9, 2025 23:18

cquil11 approved these changes Oct 13, 2025

View reviewed changes

cquil11 merged commit eba0bcd into main Oct 13, 2025
7 of 8 checks passed

cquil11 deleted the amd-70b-configs branch October 13, 2025 18:07

This was referenced Apr 24, 2026

Add DeepSeek-V4-Pro SGLang aggregated GB200 benchmarks (NVIDIA srt-slurm PR #69) #1137

Closed

[AMD] Tune dsr1-fp8-mi355x-sglang: --num-continuous-decode-steps 4 → 8 #1243

Merged

claude Bot mentioned this pull request May 3, 2026

Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep) #1267

Merged

2 tasks

cquil11 merged 1 commit into main from amd-70b-configs

3 participants