
[WIP][ROCm] Add AITER hipblaslt preshuffled gemm kernel #29981

Draft
kliuae wants to merge 3 commits into vllm-project:main from EmbeddedLLM:upstream-add-hipb-mm-bpreshuffle


kliuae commented Dec 3, 2025

Purpose

This PR adds the weight-preshuffled GEMM kernel from AITER for ROCm.
The kernel supports weight shuffling for both quantized and unquantized GEMMs; this PR targets unquantized linear and PTPC GEMMs.
Note: #28837 refactors the post-processing of FP8 weights and linear weights. This PR will be adapted to that refactoring once it is merged.
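
For readers unfamiliar with weight preshuffling, here is a minimal pure-Python sketch of the general idea: the weight matrix is reordered once, at load time, into a tile-major layout, and the GEMM then reads tiles contiguously. The tile shape, the layout, and the function names (`preshuffle`, `gemm_preshuffled`) are assumptions made for illustration only. They do not reflect the actual layout or API used by AITER or hipBLASLt.

```python
def preshuffle(w, tile_n, tile_k):
    """Reorder an N x K weight matrix into a flat, tile-major list.

    Hypothetical layout for illustration: tiles are visited row-major,
    and each tile's elements are stored contiguously, row by row.
    """
    n, k = len(w), len(w[0])
    assert n % tile_n == 0 and k % tile_k == 0, "shape must divide into tiles"
    flat = []
    for tn in range(0, n, tile_n):
        for tk in range(0, k, tile_k):
            for i in range(tile_n):
                flat.extend(w[tn + i][tk:tk + tile_k])
    return flat


def gemm_reference(x, w):
    """Compute y = x @ w.T with w in its plain N x K layout."""
    return [[sum(a * b for a, b in zip(row, wrow)) for wrow in w] for row in x]


def gemm_preshuffled(x, w_flat, n, k, tile_n, tile_k):
    """Compute y = x @ w.T, reading w from the tile-major layout above."""
    m = len(x)
    y = [[0] * n for _ in range(m)]
    tiles_k = k // tile_k
    for tile_idx in range((n // tile_n) * tiles_k):
        tn = (tile_idx // tiles_k) * tile_n   # first output column of this tile
        tk = (tile_idx % tiles_k) * tile_k    # first reduction index of this tile
        base = tile_idx * tile_n * tile_k     # tile start in the flat buffer
        for i in range(tile_n):
            row_off = base + i * tile_k
            for r in range(m):
                y[r][tn + i] += sum(
                    x[r][tk + j] * w_flat[row_off + j] for j in range(tile_k)
                )
    return y


# Shuffle once (as would happen after weight loading), then reuse per call.
x = [[1, 2, 3, 4], [5, 6, 7, 8]]
w = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
w_flat = preshuffle(w, tile_n=2, tile_k=2)
assert gemm_preshuffled(x, w_flat, n=4, k=4, tile_n=2, tile_k=2) == gemm_reference(x, w)
```

The result is numerically identical to the unshuffled GEMM. Only the memory access pattern changes, which is where a real kernel would gain its speedup.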

Test Plan

lm_eval accuracy test and benchmarking of PTPC-FP8 models.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
mergify bot added the rocm (Related to AMD ROCm) label Dec 3, 2025

mergify bot commented Dec 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kliuae.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


Labels

needs-rebase, rocm (Related to AMD ROCm)
