TF32 POC in Conv3d on MI30x platform by yingluAMD · Pull Request #2763 · ROCm/composable_kernel

yingluAMD · 2025-09-01T07:21:05Z

Proposed changes

Demonstrate a TF32 (XF32 in the CDNA3 ISA) kernel in conv3d. Also add many instances for MIOpen.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

bartekxk · 2025-09-05T11:33:17Z

@@ -70,4 +70,3 @@ build*/
 __pycache__/



Yes. It seems a space was deleted automatically by VSCode. I will try to restore it.

bartekxk · 2025-09-05T11:33:45Z

 }

-template <typename DataType>
+template <typename DataType, typename GemmType = DataType>


Can you change it to ComputeType to keep the naming convention?

Sure. I will use ComputeDataType to align with device_gemm_xdl_cshuffle_lds_direct_load.hpp#L61
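
The renamed parameter can be illustrated with a minimal sketch. Only the name `ComputeDataType` and its default of `DataType` come from the thread; `tf32_tag`, `relative_tolerance`, and the tolerance values are hypothetical and are not the PR's code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag standing in for a TF32 compute type; not CK's definition.
struct tf32_tag
{
};

// ComputeDataType defaults to DataType, so existing callers keep their behavior.
template <typename DataType, typename ComputeDataType = DataType>
constexpr double relative_tolerance()
{
    if constexpr(std::is_same_v<ComputeDataType, tf32_tag>)
    {
        // TF32 keeps 10 explicit mantissa bits, so allow roughly 2^-10 relative error.
        return 1e-3;
    }
    else if constexpr(std::is_same_v<DataType, float>)
    {
        return 1e-5; // illustrative FP32 tolerance (assumption)
    }
    else
    {
        return 1e-6; // illustrative default for other types (assumption)
    }
}
```

A caller that does nothing special, `relative_tolerance<float>()`, gets the FP32 value, while `relative_tolerance<float, tf32_tag>()` opts into the looser TF32 value.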

bartekxk · 2025-09-05T11:34:41Z


+// use macro to minimize code change
+#ifndef EXAMPLE_WITH_GEMM_DATATYPE
+using GemmDataType = AccDataType;


ComputeType

bartekxk · 2025-09-05T11:35:00Z

 }

-template <typename DataType>
+template <typename DataType, typename GemmType = DataType>


ComputeType

bartekxk · 2025-09-05T11:36:19Z

+#ifndef EXAMPLE_WITH_GEMM_DATATYPE
+using GemmDataType = AccDataType;
+#endif


ComputeDataType

bartekxk · 2025-09-05T11:44:03Z

          typename CElementwiseOperation,
-          typename ComputeTypeA = CDataType,
-          typename ComputeTypeB = ComputeTypeA>
+          typename ComputeTypeA    = CDataType,


bartekxk · 2025-09-05T11:44:36Z

+          typename DsLayout,
+          typename ELayout,
+          ConvolutionForwardSpecialization ConvSpec>
+using device_grouped_conv_fwd_xdl_dynamic_op_f32_tf32_instances = std::tuple<


We probably don't need dynamic op instances, since they have not been integrated with MIOpen.

bartekxk · 2025-09-05T11:45:57Z

                add_device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_inter_instances(
                    op_ptrs);
            }
+            if constexpr(is_same_v<InDataType, float> && is_same_v<WeiDataType, float> &&


Do we need something like CK_ENABLE_TF32?

The CK API uses separate template parameters, ComputeDataTypeA/B, to distinguish TF32 from FP32 compute, so incorrect usage cannot occur.
MIOpen, in turn, uses MIOPEN_TF32_OVERRIDE (analogous to NVIDIA_TF32_OVERRIDE) to disable TF32 mode, in which case MIOpen selects a different CK kernel. That should be enough.
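
The environment override described above might look roughly like the following sketch. The variable name `MIOPEN_TF32_OVERRIDE` comes from the thread; the rule that the value `0` disables TF32 is an assumption borrowed from how `NVIDIA_TF32_OVERRIDE=0` behaves, and `MathMode`, `tf32_allowed`, and `select_math_mode` are hypothetical names, not MIOpen functions.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

enum class MathMode
{
    FP32,
    TF32
};

// Assumption: only the value "0" disables TF32; an unset variable leaves it allowed.
inline bool tf32_allowed(const char* override_value)
{
    return override_value == nullptr || std::strcmp(override_value, "0") != 0;
}

// Hypothetical selection step: TF32 is used only if the caller asked for it
// and the environment does not forbid it. Otherwise the FP32 kernel is chosen.
inline MathMode select_math_mode(bool caller_requested_tf32)
{
    const char* value = std::getenv("MIOPEN_TF32_OVERRIDE");
    return (caller_requested_tf32 && tf32_allowed(value)) ? MathMode::TF32 : MathMode::FP32;
}
```

Keeping the environment check in the library that selects kernels means CK itself needs no extra build switch: the compute type in the template signature already fixes which kernel an instance is.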

bartekxk · 2025-09-05T11:48:34Z

+namespace ck {
+namespace tensor_operation {
+namespace device {
+namespace instance {


Please don't extend the gndhwc layout, since it is not widely used.

linqun

looks good to me.

aska-0096 · 2025-09-12T05:56:22Z

Here a/b_thread_buf defines the format in which data is stored in registers. I wonder whether defining TF32 in registers introduces type-conversion overhead instead of letting the MFMA instruction truncate the value. Have you checked the dumped ISA?

I checked the ISA and other public docs. FP32 is automatically truncated to TF32 in the matrix core, so explicit data conversion is not needed. We can also refer to NVIDIA's TF32 introduction (Link), which uses a picture to show the workflow.
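
To make the precision loss concrete, here is a small host-side sketch that emulates the truncation in software. It models truncation as stated in the reply above and is not the hardware path or CK code; `truncate_to_tf32` is a hypothetical helper. TF32 keeps the FP32 sign bit and 8-bit exponent but only 10 of the 23 explicit mantissa bits, so the low 13 bits are cleared.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Emulate FP32 -> TF32 truncation by clearing the low 13 mantissa bits.
// Sign and exponent are untouched, so the value range is the same as FP32.
inline float truncate_to_tf32(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits)); // bit-copy avoids aliasing problems
    bits &= 0xFFFFE000u;                  // keep sign, exponent, top 10 mantissa bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}
```

For example, `1.0f + 0.0009765625f` (1 + 2^-10) survives unchanged, while `1.0f + 0.00048828125f` (1 + 2^-11) collapses to `1.0f`. A helper like this can be handy for building a host reference when checking TF32 kernel results.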

aska-0096 · 2025-09-12T05:57:42Z

Here the data in a_block_buf is defined as FloatA and in a_thread_buf as ElementDataTypeA. Does this introduce type-conversion overhead?

Okay, ElementDataTypeA is still float here; tf32 is only used to select the correct tf32 MFMA instruction.
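
The point of that reply, that the compute type acts as a selector while register storage stays `float`, can be sketched with a tag type. All names here (`tf32_tag`, `element_type`, `MfmaKind`, `select_mfma`) are hypothetical illustrations, not CK's actual types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical compute-type tag; it carries no data and is never stored.
struct tf32_tag
{
};

// Storage type for register buffers: by default the compute type itself.
template <typename ComputeType>
struct element_type
{
    using type = ComputeType;
};

// A TF32 compute type still stores plain float, so no conversion is emitted.
template <>
struct element_type<tf32_tag>
{
    using type = float;
};

enum class MfmaKind
{
    F32,
    XF32
};

// The compute type only decides which matrix instruction family is picked.
template <typename ComputeType>
constexpr MfmaKind select_mfma()
{
    return std::is_same_v<ComputeType, tf32_tag> ? MfmaKind::XF32 : MfmaKind::F32;
}

static_assert(std::is_same_v<element_type<tf32_tag>::type, float>);
```

Because the buffer element type is identical in both cases, switching between FP32 and TF32 compute changes only the selected instruction, not the data layout in registers.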

aska-0096 · 2025-09-12T06:05:51Z

It looks good to me. It fixes/enables several pieces of TF32 infrastructure in CK and implements a kernel with f32 input and tf32 computation to benefit from the higher math rate. Let's wait for CI to pass and for @bartekxk's review comments to be resolved.

bartekxk

LGTM, just one last comment.

bartekxk · 2025-09-15T06:46:40Z

Please don't print any message without:
if(ck::EnvIsEnabled(CK_ENV(CK_LOGGING)))

Fixed. Added if(ck::EnvIsEnabled(CK_ENV(CK_LOGGING))) before printing the error message.
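
The guard pattern requested here can be shown with a standalone sketch. It does not include CK headers: `is_enabled_value`, `logging_enabled`, and `log_error` are hypothetical stand-ins for `ck::EnvIsEnabled(CK_ENV(CK_LOGGING))`, and treating only the value `1` as "enabled" is an assumption about the parsing rule.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>

// Assumption: the variable counts as enabled only when it is set to "1".
inline bool is_enabled_value(const char* value)
{
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Stand-in for ck::EnvIsEnabled(CK_ENV(CK_LOGGING)).
inline bool logging_enabled() { return is_enabled_value(std::getenv("CK_LOGGING")); }

// Print only when logging is enabled; returns true if the message was printed.
inline bool log_error(const std::string& message)
{
    if(!logging_enabled())
    {
        return false;
    }
    std::cerr << message << '\n';
    return true;
}
```

Routing every diagnostic through one guarded helper keeps library output silent by default, which matters when kernels are probed for support many times during instance selection.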

bartekxk · 2025-09-15T08:56:24Z

You missed this place

* Revert "Revert "feature:tf32:add initial conv3d fwd kernel support (#2763)" (#2848)" This reverts commit 03b59f8.
* fix compile error on gf12x
* only run tf32 example on gfx942
* only build tf32 instance on gfx942
* ckProfiler:only support tf32 in gfx942
* delete unuseful messages

## Motivation

The gfx942 series supports TF32 natively in the matrix core, but TF32 is not yet supported in MIOpen. This PR is a POC of enabling TF32.

## Technical Details

All kernel changes are in CK (PR: [2763](ROCm/composable_kernel#2763)). The changes in MIOpen are:

- Change the problem and kernel instance invoker to invoke the TF32 kernel in CK.
- Add an environment variable to control whether TF32 is used.

yingluAMD requested review from a team, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners September 1, 2025 07:21

yingluAMD mentioned this pull request Sep 1, 2025

MIOpen:feature:tf32:demonstrate tf32 in conv3d on MI30X platform ROCm/rocm-libraries#1414

Merged

1 task

yingluAMD force-pushed the xf32_0814 branch 6 times, most recently from 57c955b to e5492d0 Compare September 5, 2025 09:48

linqun reviewed Sep 5, 2025

Comment thread ...de/ck/tensor_operation/gpu/device/impl/device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp

Comment thread example/01_gemm/gemm_xdl_lds_direct_load_fp32_tf32.cpp

bartekxk reviewed Sep 5, 2025

yingluAMD requested review from bartekxk and linqun September 8, 2025 02:00

linqun reviewed Sep 9, 2025

yingluAMD self-assigned this Sep 9, 2025

yingluAMD force-pushed the xf32_0814 branch from cb3704d to ff01d3c Compare September 11, 2025 02:38

yingluAMD force-pushed the xf32_0814 branch 2 times, most recently from 4c9b427 to f986e71 Compare September 11, 2025 03:20

illsilin requested a review from aska-0096 as a code owner September 12, 2025 02:59

aska-0096 reviewed Sep 12, 2025

yingluAMD force-pushed the xf32_0814 branch from e0cc247 to c278418 Compare September 15, 2025 02:08

yingluAMD requested a review from cgmillette as a code owner September 15, 2025 02:08

bartekxk reviewed Sep 15, 2025

yingluAMD force-pushed the xf32_0814 branch from c278418 to a66fda8 Compare September 15, 2025 06:56

bartekxk reviewed Sep 15, 2025

feature:tf32:add initial conv3d fwd kernel support

0e31ed7

yingluAMD force-pushed the xf32_0814 branch from a66fda8 to 0e31ed7 Compare September 15, 2025 09:04

bartekxk approved these changes Sep 15, 2025

yingluAMD merged commit c511021 into ROCm:develop Sep 15, 2025
41 of 46 checks passed

illsilin added a commit that referenced this pull request Sep 15, 2025

Revert "feature:tf32:add initial conv3d fwd kernel support (#2763)"

cd1dd06

This reverts commit c511021.

illsilin mentioned this pull request Sep 15, 2025

Revert "TF32 POC in Conv3d on MI30x platform" #2848

Merged

illsilin added a commit that referenced this pull request Sep 15, 2025

Revert "feature:tf32:add initial conv3d fwd kernel support (#2763)" (#…

03b59f8

…2848) This reverts commit c511021.

yingluAMD added a commit that referenced this pull request Sep 16, 2025

Revert "Revert "feature:tf32:add initial conv3d fwd kernel support (#…

da90a61

…2763)" (#2848)" This reverts commit 03b59f8.

afagaj mentioned this pull request Sep 17, 2025

TF32 POC in Conv3d on MI30x platform #2763 (second attempt) #2852

Merged

7 tasks

AviralGoelAMD pushed a commit that referenced this pull request Oct 16, 2025

feature:tf32:add initial conv3d fwd kernel support (#2763)

9f8a63a

AviralGoelAMD pushed a commit that referenced this pull request Oct 16, 2025

Revert "feature:tf32:add initial conv3d fwd kernel support (#2763)" (#…

c93151a

…2848) This reverts commit c511021.
