update configs 70b#95

Merged
cquil11 merged 1 commit into main from amd-70b-configs
Oct 13, 2025

@japarada (Collaborator) commented Oct 9, 2025

Update the 70b configs to include "compilation-config".
This gives a 6-7% uplift for Llama on 6 of the 8 configs.

@japarada japarada requested a review from a team as a code owner October 9, 2025 23:18

@cquil11 (Collaborator) left a comment

All tests passed. Looks good.

@cquil11 cquil11 merged commit eba0bcd into main Oct 13, 2025
7 of 8 checks passed
@cquil11 cquil11 deleted the amd-70b-configs branch October 13, 2025 18:07
@mgoin commented Oct 14, 2025

Hey @japarada, is this a general heuristic we could add upstream to vLLM? If the custom ops aren't good for AMD, we should change the behavior.
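
For readers unfamiliar with the setting under discussion, here is a minimal sketch of how a "compilation-config" value that toggles custom ops could be passed to vLLM. It assumes vLLM's `--compilation-config` flag accepts a JSON object whose `custom_ops` list uses `+name` / `-name` entries. The helper `compilation_config_args`, the op names in `EXAMPLE_OPS`, and the `<model>` placeholder are illustrative assumptions; the values actually merged in this PR are not shown in the conversation.

```python
import json
import shlex


def compilation_config_args(custom_ops):
    """Build the two CLI tokens for a --compilation-config argument."""
    # The flag is assumed to take a JSON object; custom_ops lists "+name"/"-name" entries.
    payload = json.dumps({"custom_ops": list(custom_ops)})
    return ["--compilation-config", payload]


# Illustrative op names only (assumption); not the values merged in this PR.
EXAMPLE_OPS = ["-rms_norm", "-silu_and_mul"]

if __name__ == "__main__":
    # "<model>" is a placeholder, not a real model ID.
    cmd = ["vllm", "serve", "<model>", *compilation_config_args(EXAMPLE_OPS)]
    print(shlex.join(cmd))
```

Building the JSON with `json.dumps` and quoting with `shlex.join` avoids hand-escaping the braces and quotes when the argument is pasted into a launch script.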

functionstackx added a commit that referenced this pull request May 3, 2026
AGENTS.md requires new perf-changelog entries to be appended to the end
of the file (oldest at top, newest at bottom). The original commit
prepended the new entry above PR #95; move it after the current last
entry (PR #1265) to satisfy the convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
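
The ordering rule described in this commit message (oldest entry at top, newest at bottom) can be sketched as a small check. This is a hypothetical illustration, not code from the repository: it assumes each perf-changelog entry can be reduced to its PR number and that a higher PR number means a newer entry. The function names `is_oldest_first` and `move_to_end` are made up for the example.

```python
def is_oldest_first(pr_numbers):
    """Return True when PR numbers strictly increase from top to bottom."""
    return all(a < b for a, b in zip(pr_numbers, pr_numbers[1:]))


def move_to_end(pr_numbers, pr):
    """Return a new list with `pr` removed from its position and appended last."""
    return [n for n in pr_numbers if n != pr] + [pr]


# The situation in the commit message: the new entry (#1267) was prepended
# above #95 instead of being appended after the last entry (#1265).
prepended = [1267, 95, 1265]
fixed = move_to_end(prepended, 1267)
```

With these inputs, `is_oldest_first(prepended)` is false and `is_oldest_first(fixed)` is true, which matches the fix the commit describes.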
functionstackx added a commit that referenced this pull request May 3, 2026
…1267)

* Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep)

- New `kimik2.5-int4-b300-vllm` config with the corresponding
  `benchmarks/single_node/kimik2.5_int4_b300.sh` launch script (mirrors
  the existing INT4 B200 vLLM recipe; the upstream vLLM Kimi-K2.5
  recipes page does not yet ship B300-specific tuning).
- Image: `vllm/vllm-openai:v0.20.0-cu130` — the original draft (#1057,
  reverted in #1070, reopened as #1071) carried `v0.19.0` while we
  waited on a working release; 0.20.0 has now shipped.
- Search-space per (ISL, OSL): the existing TP=8 sweep plus a new
  TP=4 / EP=1 entry covering the lower-TP / expert-parallel variant
  on the same B300 nodes.

Supersedes #1071 — opening fresh from main since the merge base had
drifted (b200 schema migrated from `seq-len-configs` to
`scenarios.fixed-seq-len`) and the user preferred a clean reopen
over a rebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: move kimik2.5-int4-b300-vllm entry to bottom

AGENTS.md requires new perf-changelog entries to be appended to the end
of the file (oldest at top, newest at bottom). The original commit
prepended the new entry above PR #95; move it after the current last
entry (PR #1265) to satisfy the convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xiaohuguo2023 pushed a commit to xiaohuguo2023/InferenceX that referenced this pull request May 6, 2026
…emiAnalysisAI#1267)

3 participants