Conversation

@nsingh-habana
No description provided.

@nsingh-habana nsingh-habana marked this pull request as draft October 24, 2025 02:48
@nsingh-habana nsingh-habana changed the title Integrate new mma_atoms and copy_atoms into bmg_grouped_gemm_fp8 New mma_atoms and copy_atoms in bmg_grouped_gemm_fp8 Oct 24, 2025
@nsingh-habana nsingh-habana force-pushed the grouped_gemm_fp8_new_atoms branch from 0959d75 to 1d6d2f7 Compare October 24, 2025 08:39
Comment on lines +217 to +218
Tensor tCrA_fp16 = make_fragment_like<half_t>(tCrA);
Tensor tCrB_fp16 = make_fragment_like<half_t>(tCrB);
@sanchitintel sanchitintel Oct 25, 2025

@rolandschulz & @petercad, please advise whether such a redesign would make sense -

For FP8xFP8 GEMM on Xe2, FP8 is converted to FP16.
Reorders in the new API let multiple dtypes share the same GEMM (MMA collectives) code, and they're no-ops when no dtype conversion is needed. So perhaps the FP8 path could reuse that shared code? I could be wrong, but this seems to be (at least part of) what @petercad had in mind regarding the rearchitecture.

While it's indeed possible to use if constexpr with compile-time-evaluated expressions to support multiple dtypes' GEMMs in the same file, reorders seem to make things even simpler. It was previously decided to favor readability and debuggability over reducing code duplication, which is why the legacy FP8xFP8 GEMM currently has a separate implementation with duplicated code.

@nsingh-habana, please also share your thoughts on it.

Thanks!
