
Add MiniMax M25 A8W8 blockscale GEMM tunings #2979

Closed
akii96 wants to merge 1 commit into main from gemm-tuning-minimax-m25-gfx950

Conversation

@akii96 (Contributor) commented Apr 30, 2026

Adds MiniMax M25 A8W8 blockscale GEMM tuning entries and keeps the tuning table deduplicated and sorted.
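The dedupe-and-sort step could be sketched roughly as below. This is a hypothetical illustration, not the actual aiter tooling: the column names (cu_num, M, N, K) and the keep-the-last-duplicate policy are assumptions about the tuning-table schema.

```python
# Hypothetical sketch of deduplicating and sorting a GEMM tuning CSV.
# Column names and dedup policy are assumptions, not the real schema.
import csv
import io

def dedupe_and_sort(csv_text: str) -> str:
    """Drop duplicate shape rows (keeping the last, i.e. freshest, entry)
    and emit the table sorted by its shape key columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames
    by_key = {}
    for row in reader:
        key = (int(row["cu_num"]), int(row["M"]), int(row["N"]), int(row["K"]))
        by_key[key] = row  # later rows overwrite earlier duplicates
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for key in sorted(by_key):
        writer.writerow(by_key[key])
    return out.getvalue()
```

Keeping the table canonically sorted makes later bulk merges (like the Silo merge referenced further down) produce clean, reviewable diffs.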

Prerequisites for this to be merged:

@akii96 akii96 requested a review from a team April 30, 2026 13:32
@github-actions (Contributor)

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label           Tests
ci:triton-300x  Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
ci:sglang       SGLang integration tests
ci:atom         ATOM benchmark (DeepSeek-R1 + GPT-OSS)
ci:vllm         vLLM benchmark
ci:all          All of the above

Add labels via the sidebar or gh pr edit 2979 --add-label <label>

@akii96 akii96 marked this pull request as draft April 30, 2026 13:40
@akii96 (Contributor, Author) commented Apr 30, 2026

I checked, and upstream aiter main has since changed: it now takes an additional gfx_arch column as input.

This should be a small fix, and I can address it once I rebase and test it myself on workloads.
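A backfill for such a new column could look like the sketch below. The helper name and the assumption that gfx_arch is appended as the last column are mine, not upstream's.

```python
# Hypothetical sketch: backfill an existing tuning table with a new
# gfx_arch column. Column position and default value are assumptions.
import csv
import io

def add_gfx_arch_column(csv_text: str, arch: str = "gfx950") -> str:
    """Append a gfx_arch column, tagging every existing row with `arch`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames) + ["gfx_arch"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["gfx_arch"] = arch
        writer.writerow(row)
    return out.getvalue()
```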

@akii96 (Contributor, Author) commented May 4, 2026

Latest updates

  • Rebased onto current origin/main

  • Re-tuned the full M25 shape set on MI355X (gfx950, cu_num=256, 8 GPUs)

Dependency on PR #2541

@akii96 akii96 marked this pull request as ready for review May 4, 2026 06:58
@akii96 akii96 requested a review from amd-yashagar May 4, 2026 06:59
@amd-yashagar (Contributor)

Looks good to me. Thank you @akii96.

@akii96 akii96 force-pushed the gemm-tuning-minimax-m25-gfx950 branch from aa8d197 to 4f555c5 on May 4, 2026 11:33
@sunway513 (Collaborator)

This PR's content was bulk-merged via #3004 ([Silo] Bulk merge: tuned GEMM and FMoE configs, merged 2026-05-02 03:16 UTC). Please close this PR as superseded.

Tracking issue: ROCm/AI-Frameworks-Dashboard#141

sunway513 added a commit that referenced this pull request May 4, 2026
Squash-merged from main commit 52c4554.

Includes 5 atomic Silo PRs:
- #2923 GLM-4.7 FP8 tuned/untuned FMoE configs (new)
- #2938 Kimi-K2.5 FP4 fused MoE tunings (TP2 / 256 CU refresh)
- #2979 MiniMax-M2.5 A8W8 blockscale GEMM tunings
- #2981 DeepSeek-V3.2 MI355X tuned GEMM and FMoE configs
- #2982 MiniMax-M2.5 FMoE tunings

Conflict in aiter/configs/model_configs/kimik2_fp4_tuned_fmoe.csv:
two blocks resolved by taking theirs (Silo). Block 1 upgrades existing
M=256/N=512 rows from base kernel suffixes (w3) to tuner-discovered
variants (w3_xcd4, _bnt2_persist, _sbm32, _sbm64). Block 2 is purely
additive: 30+ new rows for previously-uncovered N=7168/K=1024 shapes
plus a flydsl_fallback section.

Driver: vLLM 0.21 freeze 2026-05-08 — Silo customers need these tunings
on the AITER release wheel, not nightly.

Verification gate before tag:
- Kernel suffix parser smoke (Kimi-K2.5-MXFP4 1-token inference,
  confirm new suffixes JIT-compile without falling back)
- ATOM 5-model accuracy unchanged within +/- 0.005 vs v0.1.13-rc1
- Perf delta on Kimi-K2.5 / MiniMax-M2.5 / DSv3.2 (expect flat or better)

(cherry picked from commit 52c4554)
@akii96 akii96 marked this pull request as draft May 5, 2026 11:41
@akii96 (Contributor, Author) commented May 5, 2026

Merged with #3024

@akii96 akii96 closed this May 5, 2026
@akii96 akii96 deleted the gemm-tuning-minimax-m25-gfx950 branch May 5, 2026 19:18

3 participants