Xiashangning
Found some bugs and typos in the source code while running the tests.

algo=wmma_implicit_gemm failed on an A100 with PyTorch 2.4.1 and CUDA 12.1.

algo=cutlass_implicit_gemm returned -1 during benchmarking.

Therefore, only the explicit and implicit algorithms work on my side.

PS: the test code seems to be out of date. Could you please have a look and maybe update it according to the latest source code?

@chrischoy
Collaborator

WMMA is not supported for fp32/fp64. CUTLASS does not support them either, but I convert the inputs to fp16/bf16 inside the kernel. Also, the CUTLASS kernel can only run with channel counts that are multiples of 16, or with configurations the CUTLASS engine allows, so it is more restrictive.
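To illustrate the restrictions described above, here is a minimal sketch of how such an algorithm choice could be made. `pick_conv_algo` is a hypothetical helper for illustration only, not the library's actual API; it assumes tensor-core (wmma) paths require half-precision inputs and that the CUTLASS path needs channel counts divisible by 16.

```python
def pick_conv_algo(dtype: str, in_channels: int, out_channels: int) -> str:
    """Illustrative algorithm selector (hypothetical helper, not the
    library's real API), reflecting the restrictions described above."""
    # Tensor-core (wmma) kernels operate on half-precision inputs, so
    # fp32/fp64 tensors cannot use them directly.
    if dtype in ("fp16", "bf16"):
        return "wmma_implicit_gemm"
    # The CUTLASS path converts fp32 to fp16/bf16 inside the kernel, but
    # only supports channel counts that are multiples of 16 (a tile-size
    # constraint of the engine), so it is more restrictive.
    if in_channels % 16 == 0 and out_channels % 16 == 0:
        return "cutlass_implicit_gemm"
    # Otherwise fall back to the explicit/implicit GEMM implementations.
    return "implicit_gemm"
```

With this sketch, an fp32 convolution with 30 input channels would fall through to the plain implicit GEMM path, matching the failures reported above.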
