Conversation

Contributor

@ylyzty ylyzty commented Nov 26, 2025

PR Category

OP Test

Type of Change

New Feature

Description

  • Update the mm autotune configs
  • Add an if branch that uses TMA when M, N, and K are divisible by BLOCK_M, BLOCK_N, and BLOCK_K respectively (see the sketch after this list)
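
A minimal sketch of that dispatch condition (illustrative only, not the PR's actual branch): the TMA path is taken only when every problem dimension tiles evenly, so no descriptor load touches a partial block.

def can_use_tma(M, N, K, BLOCK_M, BLOCK_N, BLOCK_K):
    # All three dimensions must be exact multiples of their block sizes.
    return M % BLOCK_M == 0 and N % BLOCK_N == 0 and K % BLOCK_K == 0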

Instructions

The current version of the tma_gemm_kernel code only supports row-major matrices, because make_tensor_descriptor requires the last dimension to be contiguous.

If you want to use TMA with column-major inputs, for example:

  • A = torch.randn(M, K)
  • B = torch.randn(N, K)

you need to update mm_kernel_general like this:

# Modify b_desc: B is stored as (N, K), so its last (K) dimension is contiguous.
b_desc = tl.make_tensor_descriptor(
    B,
    shape=[N, K],
    strides=[K, 1],
    block_shape=[BLOCK_N, BLOCK_K],
)

for k in range(0, tl.cdiv(K, BLOCK_K)):
    a = a_desc.load([offset_am.to(tl.int32), offset_k.to(tl.int32)])
    b = b_desc.load([offset_bn.to(tl.int32), offset_k.to(tl.int32)])     # adjust the load: a (BLOCK_N, BLOCK_K) tile of B
    acc += tl.dot(a, b.trans(), out_dtype=tl.float32, allow_tf32=False)  # transpose the B tile so the dot contracts over K
    offset_k += BLOCK_K
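
For reference, a hedged host-side sketch of the layout this change targets (shapes, dtype, and the comparison are illustrative, not part of the PR): because the kernel transposes the B tile inside the dot, it effectively computes A @ B.T.

import torch

M, N, K = 1024, 1024, 1024
A = torch.randn(M, K, device="cuda", dtype=torch.float16)  # row-major (M, K)
B = torch.randn(N, K, device="cuda", dtype=torch.float16)  # stored as (N, K): K is the contiguous dimension
C_ref = (A @ B.T).float()  # reference result the adapted kernel should approximate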


CLAassistant commented Nov 26, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ylyzty
❌ Galaxy1458
You have signed the CLA already but the status is still pending? Let us recheck it.

@kiddyjinjin kiddyjinjin changed the title Update mm kernel and tune configs 【should not merge now】Update mm kernel and tune configs Nov 26, 2025
@ylyzty ylyzty force-pushed the update_gemm_kernel_and_tune_configs branch from 23ac256 to c5a35e1 Compare November 30, 2025 13:01
@Galaxy1458 Galaxy1458 changed the title 【should not merge now】Update mm kernel and tune configs 【Hopper】Update mm kernel and tune configs Dec 1, 2025
Collaborator

@Galaxy1458 Galaxy1458 left a comment

LGTM

Collaborator

@zhzhcookie zhzhcookie left a comment

LGTM

@Galaxy1458 Galaxy1458 merged commit b5bb6e2 into flagos-ai:master Dec 5, 2025
10 of 15 checks passed