Conversation

@muhammad-tanvir-1211
This PR adds Grouped GEMM support for mixed-precision GEMM.

@muhammad-tanvir-1211 muhammad-tanvir-1211 requested a review from a team July 4, 2025 15:59

@jiyang1011 jiyang1011 left a comment

LGTM

@tdeng5 tdeng5 requested review from rolandschulz and taozha2 August 27, 2025 07:15
@taozha2

taozha2 commented Aug 27, 2025

Can we refine include/cutlass/gemm/collective/xe_array_mma_mixed_input.hpp and include/cutlass/gemm/collective/xe_mma_mixed_input.hpp together? I found that they share a lot of common code.

@jiyang1011
> Can we refine include/cutlass/gemm/collective/xe_array_mma_mixed_input.hpp and include/cutlass/gemm/collective/xe_mma_mixed_input.hpp together? I found that they share a lot of common code.

I tried to work out the difference and dispatch it to xe_mma_mixed_input.hpp. The biggest difference is in initializing the params: the array MMA must initialize the tiled copy with an individual tensor and update the group index, so it is not easy.

@taozha2
taozha2 commented Aug 27, 2025

> I tried to work out the difference and dispatch it to xe_mma_mixed_input.hpp. The biggest difference is in initializing the params: the array MMA must initialize the tiled copy with an individual tensor and update the group index, so it is not easy.

The quantization and the operator() (the GEMM main loop), which are the most important parts of the implementation, are the same. Can we make a base struct such as xe_mma_mixed_dtype_base that contains these common parts, and have your grouped mixed GEMM inherit from it?

@jiyang1011
> Can we make a base struct such as xe_mma_mixed_dtype_base that contains these common parts, and have your grouped mixed GEMM inherit from it?

True, but this approach would touch a lot of files. I will open a separate PR to deal with it.

@jiyang1011 jiyang1011 merged commit 32e15ba into intel:sycl-develop Aug 29, 2025
9 of 12 checks passed
4 participants