Conversation

Contributor

@ylyzty ylyzty commented Nov 26, 2025

PR Category

OP Test

Type of Change

New Feature

Description

  • Update the mm autotune configs
  • Add an if branch that uses TMA when M, N, and K are divisible by BLOCK_M, BLOCK_N, and BLOCK_K respectively (see the sketch after this list)
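
A minimal sketch of that dispatch condition (illustrative only, not the PR's actual branch): the TMA path is taken only when every problem dimension tiles evenly, so no descriptor load touches a partial block.

def can_use_tma(M, N, K, BLOCK_M, BLOCK_N, BLOCK_K):
    # All three dimensions must be exact multiples of their block sizes.
    return M % BLOCK_M == 0 and N % BLOCK_N == 0 and K % BLOCK_K == 0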

Instructions

The current version of the tma_gemm_kernel code only supports row-major matrices, because make_tensor_descriptor requires the last dimension to be contiguous.

If you want to use TMA with column-major inputs, for example:

  • A = torch.randn(M, K)
  • B = torch.randn(N, K)

you need to update mm_kernel_general like this:

# Modify b_desc: B is stored as (N, K), so its last (K) dimension is contiguous.
b_desc = tl.make_tensor_descriptor(
    B,
    shape=[N, K],
    strides=[K, 1],
    block_shape=[BLOCK_N, BLOCK_K],
)

for k in range(0, tl.cdiv(K, BLOCK_K)):
    a = a_desc.load([offset_am.to(tl.int32), offset_k.to(tl.int32)])
    b = b_desc.load([offset_bn.to(tl.int32), offset_k.to(tl.int32)])     # adjust the load: a (BLOCK_N, BLOCK_K) tile of B
    acc += tl.dot(a, b.trans(), out_dtype=tl.float32, allow_tf32=False)  # transpose the B tile so the dot contracts over K
    offset_k += BLOCK_K
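
For reference, a hedged host-side sketch of the layout this change targets (shapes, dtype, and the comparison are illustrative, not part of the PR): because the kernel transposes the B tile inside the dot, it effectively computes A @ B.T.

import torch

M, N, K = 1024, 1024, 1024
A = torch.randn(M, K, device="cuda", dtype=torch.float16)  # row-major (M, K)
B = torch.randn(N, K, device="cuda", dtype=torch.float16)  # stored as (N, K): K is the contiguous dimension
C_ref = (A @ B.T).float()  # reference result the adapted kernel should approximate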


CLAassistant commented Nov 26, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ylyzty
❌ Galaxy1458
You have signed the CLA already but the status is still pending? Let us recheck it.

@kiddyjinjin kiddyjinjin changed the title Update mm kernel and tune configs 【should not merge now】Update mm kernel and tune configs Nov 26, 2025
@ylyzty ylyzty force-pushed the update_gemm_kernel_and_tune_configs branch from 23ac256 to c5a35e1 Compare November 30, 2025 13:01
@Galaxy1458 Galaxy1458 changed the title 【should not merge now】Update mm kernel and tune configs 【Hopper】Update mm kernel and tune configs Dec 1, 2025
Collaborator

@Galaxy1458 Galaxy1458 left a comment

LGTM

Collaborator

@zhzhcookie zhzhcookie left a comment

LGTM

@Galaxy1458 Galaxy1458 merged commit b5bb6e2 into flagos-ai:master Dec 5, 2025
10 of 15 checks passed