@MasterJH5574
Contributor

This PR introduces CUTLASS GEMM kernels, groupwise-scaled GEMM kernels, and group GEMM kernels for Blackwell GPUs.

The files are reorganized slightly so that the exposed global functions are now architecture-agnostic. Prior to this PR, our global function names for CUTLASS kernels usually ended with "_sm90", which added complexity to the frontend compiler when it had to dispatch kernels across multiple supported architectures, such as Hopper and Blackwell.

Therefore, this PR renames those global functions so that their names are architecture-agnostic. At build time, only the kernels that the target architecture supports are built.
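To illustrate the idea, here is a minimal sketch in Python rather than the actual C++/CUDA sources. Everything in it is hypothetical: the registry, the name `"cutlass.gemm"`, and the helper functions are made up for illustration and are not TVM's real global function names or registration API. It only shows why a single arch-agnostic name lets the frontend skip per-architecture dispatch when the build selects the kernel.

```python
# Illustrative sketch only. All names here are hypothetical and do not
# correspond to TVM's actual global functions or registration mechanism.

def gemm_hopper(a, b):
    # Stand-in for a kernel compiled only when building for sm_90.
    return "gemm via Hopper kernel"


def gemm_blackwell(a, b):
    # Stand-in for a kernel compiled only when building for sm_100.
    return "gemm via Blackwell kernel"


def build_registry(target_arch):
    """Simulate a build: register only the kernel the target arch supports,
    always under the same arch-agnostic name."""
    implementations = {90: gemm_hopper, 100: gemm_blackwell}
    registry = {}
    if target_arch in implementations:
        registry["cutlass.gemm"] = implementations[target_arch]
    return registry


def frontend_call(registry, name, *args):
    """The frontend looks up one name and never inspects the architecture."""
    func = registry.get(name)
    if func is None:
        raise RuntimeError(f"{name} was not built for this target")
    return func(*args)
```

With this layout, `frontend_call(build_registry(100), "cutlass.gemm", a, b)` and the same call against a registry built for 90 use identical frontend code; only the build step differs.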

@MasterJH5574
Contributor Author

@tvm-bot rerun

@MasterJH5574 MasterJH5574 force-pushed the tvm-dev/2025-06-02-cutlass-blackwell branch 2 times, most recently from af75beb to 4dd1743 Compare June 6, 2025 03:43
@MasterJH5574 MasterJH5574 force-pushed the tvm-dev/2025-06-02-cutlass-blackwell branch from 4dd1743 to 5f8598e Compare June 6, 2025 03:52
@tqchen tqchen merged commit fd9c091 into apache:main Jun 6, 2025
12 checks passed
MasterJH5574 added a commit to MasterJH5574/tvm that referenced this pull request Jun 16, 2025
The CUTLASS kernel build on Hopper GPUs has been broken since apache#18033.
This PR fixes the issue.
tqchen pushed a commit that referenced this pull request Jun 17, 2025
The CUTLASS kernel build on Hopper GPUs has been broken since #18033.
This PR fixes the issue.
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
The CUTLASS kernel build on Hopper GPUs has been broken since apache#18033.
This PR fixes the issue.