
Add B300 config: kimi-k2.5-int4-vllm#1057

Merged
functionstackx merged 3 commits into main from claude/add-kimi-k2.5-int4-b300-vllm on Apr 17, 2026

Conversation

@functionstackx
Contributor

Summary

  • Add kimik2.5-int4-b300-vllm benchmark config and the corresponding benchmarks/single_node/kimik2.5_int4_b300.sh launch script
  • At the time of submission, the vLLM Kimi-K2.5 recipes page does not have a B300-specific recipe, so this reuses the existing Kimi-K2.5 INT4 B200 vLLM recipe as-is until B300-specific tuning is available
  • Image: vllm/vllm-openai:v0.15.1 (same as B200); runner: b300; same TP=8 and concurrency 4-64 search space as B200
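
A hedged sketch of what the new config entry might look like. The field names below are illustrative assumptions, not the repo's actual schema; consult the existing B200 entry before copying:

```yaml
# Hypothetical sketch of the kimik2.5-int4-b300-vllm entry.
# Field names are assumptions for illustration only.
kimik2.5-int4-b300-vllm:
  image: vllm/vllm-openai:v0.15.1        # same image as the B200 config
  runner: b300
  launch-script: benchmarks/single_node/kimik2.5_int4_b300.sh
  tensor-parallel-size: 8                # TP=8, unchanged from B200
  concurrency: [4, 8, 16, 32, 64]        # same 4-64 search space as B200
```

The point of the sketch is that only `runner` differs from the B200 entry; everything else is reused as-is until B300-specific tuning lands.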

Test plan

  • CI config validation passes
  • Run kimik2.5-int4-b300-vllm single-node benchmark on a B300 node and confirm server starts, benchmark completes, and result file is produced

🤖 Generated with Claude Code

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a recipe/documentation PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.


Comment thread on the perf-changelog.yaml entry:
- "Add Kimi-K2.5 INT4 B300 vLLM benchmark"
- "Image: vllm/vllm-openai:v0.15.1"
- "At the time of submission, https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html does not have a B300-specific recipe, so this reuses the existing Kimi-K2.5 INT4 B200 vLLM recipe as-is"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1057

🟡 The new kimik2.5-int4-b300-vllm entry in perf-changelog.yaml uses a placeholder pull/XXXX instead of the actual PR number. Please replace XXXX with 1057 before merging.

Extended reasoning...

The new perf-changelog.yaml entry added by this PR (line 1414) contains a placeholder URL: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. This placeholder was never replaced with the actual PR number, which is known at submission time to be 1057.

How it manifests: Any tooling or human reader that tries to follow the changelog link will land on a nonexistent GitHub URL, making it impossible to trace back what PR introduced the kimik2.5-int4-b300-vllm benchmark config.

Code path: The diff shows the entry was added at the bottom of perf-changelog.yaml with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. The author appears to have copied a template and forgot to substitute the PR number.

Why existing code doesn't prevent it: There is no CI validation enforcing that pr-link values contain a real PR number rather than a placeholder. The file is plain YAML with no schema enforcement on link format.

Impact: Low functional impact — the benchmark config itself is correct. However, the changelog entry becomes untraceable: downstream consumers, auditors, or developers reviewing history cannot click through to understand what changed, why the B200 recipe was reused, or who approved it. Changelog hygiene matters for a public benchmarking project.

Fix: Replace XXXX with 1057 on line 1414:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1057

Step-by-step proof:

  1. PR #1057 is opened with the title "Add B300 config: kimi-k2.5-int4-vllm".
  2. The diff adds a new block to perf-changelog.yaml ending with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  3. Navigating to https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX returns a 404/invalid URL — the page does not exist.
  4. The correct URL https://github.com/SemiAnalysisAI/InferenceX/pull/1057 resolves to this very PR.
  5. Note: seven other pre-existing entries in the file also use pull/XXX placeholders (lines 12, 19, 315, 790, 818, 855, 872), but those are pre-existing issues unrelated to this PR. This PR introduces one new instance of this pattern that is immediately fixable.

@functionstackx force-pushed the claude/add-kimi-k2.5-int4-b300-vllm branch from a5585b8 to 69b9cfb on April 17, 2026 at 13:14
functionstackx and others added 3 commits on April 17, 2026 at 09:21
At the time of submission, the vLLM Kimi-K2.5 recipes page
(https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html)
does not have a B300-specific recipe, so this config reuses the existing
Kimi-K2.5 INT4 B200 vLLM recipe as-is until B300-specific tuning is
available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Align with the standard B300 vLLM image used by other B300 vLLM configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@functionstackx force-pushed the claude/add-kimi-k2.5-int4-b300-vllm branch from 69b9cfb to 7119ead on April 17, 2026 at 13:21
@functionstackx merged commit 6e89e55 into main on Apr 17, 2026
16 checks passed
@functionstackx deleted the claude/add-kimi-k2.5-int4-b300-vllm branch on April 17, 2026 at 13:22
cquil11 added a commit that referenced this pull request Apr 17, 2026
cquil11 added a commit that referenced this pull request Apr 17, 2026
functionstackx added a commit that referenced this pull request May 3, 2026
…1267)

* Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep)

- New `kimik2.5-int4-b300-vllm` config with the corresponding
  `benchmarks/single_node/kimik2.5_int4_b300.sh` launch script (mirrors
  the existing INT4 B200 vLLM recipe; the upstream vLLM Kimi-K2.5
  recipes page does not yet ship B300-specific tuning).
- Image: `vllm/vllm-openai:v0.20.0-cu130` — the original draft (#1057,
  reverted in #1070, reopened as #1071) carried `v0.19.0` while we
  waited on a working release; 0.20.0 has now shipped.
- Search-space per (ISL, OSL): the existing TP=8 sweep plus a new
  TP=4 / EP=1 entry covering the lower-TP / expert-parallel variant
  on the same B300 nodes.

Supersedes #1071 — opening fresh from main since the merge base had
drifted (b200 schema migrated from `seq-len-configs` to
`scenarios.fixed-seq-len`) and the user preferred a clean reopen
over a rebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: move kimik2.5-int4-b300-vllm entry to bottom

AGENTS.md requires new perf-changelog entries to be appended to the end
of the file (oldest at top, newest at bottom). The original commit
prepended the new entry above PR #95; move it after the current last
entry (PR #1265) to satisfy the convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xiaohuguo2023 pushed a commit to xiaohuguo2023/InferenceX that referenced this pull request May 6, 2026