TF32 POC in Conv3d on MI30x platform #2763
57c955b to e5492d0
| @@ -70,4 +70,3 @@ build*/ | |||
| __pycache__/ | |||
Yes. It seems VSCode automatically deleted a space. I will try to restore it.
| } | ||
| template <typename DataType> | ||
| template <typename DataType, typename GemmType = DataType> |
Can you rename it to ComputeType to keep the naming convention?
Sure. I will use ComputeDataType to align with device_gemm_xdl_cshuffle_lds_direct_load.hpp#L61.
| // use macro to minimize code change | ||
| #ifndef EXAMPLE_WITH_GEMM_DATATYPE | ||
| using GemmDataType = AccDataType; |
| } | ||
| template <typename DataType> | ||
| template <typename DataType, typename GemmType = DataType> |
| #ifndef EXAMPLE_WITH_GEMM_DATATYPE | ||
| using GemmDataType = AccDataType; | ||
| #endif |
| typename CElementwiseOperation, | ||
| typename ComputeTypeA = CDataType, | ||
| typename ComputeTypeB = ComputeTypeA> | ||
| typename ComputeTypeA = CDataType, |
| typename DsLayout, | ||
| typename ELayout, | ||
| ConvolutionForwardSpecialization ConvSpec> | ||
| using device_grouped_conv_fwd_xdl_dynamic_op_f32_tf32_instances = std::tuple< |
We probably don't need the dynamic op instances, since they have not been integrated with MIOpen.
| add_device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_inter_instances( | ||
| op_ptrs); | ||
| } | ||
| if constexpr(is_same_v<InDataType, float> && is_same_v<WeiDataType, float> && |
Do we need something like CK_ENABLE_TF32?
The CK API uses different template parameters, ComputeDataTypeA/B, to distinguish tf32 from fp32 compute, so no incorrect usage can occur.
MIOpen uses MIOPEN_TF32_OVERRIDE (cf. NVIDIA_TF32_OVERRIDE) to disable TF32 mode, in which case MIOpen selects a different CK kernel. That should be enough.
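To illustrate the point about compute-type template parameters, here is a minimal sketch of how a defaulted compute type can select between a native-precision path and a TF32 path at compile time. `tf32_tag`, `ConvFwdSketch`, and the returned names are hypothetical and exist only for this example; they are not CK's actual types or instance names.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical tag standing in for a TF32 compute type (not CK's definition).
struct tf32_tag
{
};

// Hypothetical op: ComputeDataType defaults to DataType, so existing
// callers that pass only DataType keep the native-precision path.
template <typename DataType, typename ComputeDataType = DataType>
struct ConvFwdSketch
{
    // TF32 compute is only meaningful for float inputs in this sketch.
    static constexpr bool uses_tf32 =
        std::is_same_v<DataType, float> && std::is_same_v<ComputeDataType, tf32_tag>;

    // Pick a (made-up) kernel family name at compile time.
    static constexpr std::string_view kernel_family()
    {
        if constexpr(uses_tf32)
            return "tf32_compute";
        else
            return "native_compute";
    }
};
```

Because the choice is encoded in the type, a caller cannot end up on the TF32 path without naming the TF32 compute type explicitly, which is the property the reply above relies on.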
| namespace ck { | ||
| namespace tensor_operation { | ||
| namespace device { | ||
| namespace instance { |
Please don't extend the gndhwc layout, since it is not widely used.
cb3704d to ff01d3c
4c9b427 to f986e71
Here a/b_thread_buf defines the format in which data is stored in registers. I wonder whether declaring the registers as TF32 introduces type-conversion overhead, rather than letting the MFMA instruction cut off the extra bits. Have you checked the dumped ISA?
I checked the ISA and other public docs. FP32 is automatically truncated to TF32 in the matrix core, so no explicit data conversion is needed. We can also refer to Nvidia's TF32 introduction (Link), which uses a picture to show the workflow.
Here the data in a_block_buf is defined as FloatA and the data in a_thread_buf as ElementDataTypeA. Does this introduce type-conversion overhead?
Okay. ElementDataTypeA is still float here; tf32 is only used to select the correct tf32 MFMA.
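For readers who want to see what dropping FP32 to TF32 precision means numerically, the following is a software emulation only, assuming plain truncation as described in the reply above. TF32 keeps the sign bit, the 8 exponent bits, and the top 10 of FP32's 23 mantissa bits. `truncate_to_tf32` is a hypothetical helper for illustration; it is not part of CK, and it makes no claim about how the hardware handles rounding or special values.

```cpp
#include <cstdint>
#include <cstring>

// Emulate FP32 -> TF32 precision loss by truncation (illustrative only).
// The result is still stored as a 32-bit float, which mirrors the point
// made above: the data stays float and only the math precision changes.
inline float truncate_to_tf32(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits)); // bit-cast without undefined behavior
    bits &= 0xFFFFE000u;                  // clear the low 13 of 23 mantissa bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}
```

For example, `1.0f + 2^-10` survives unchanged because it fits in 10 mantissa bits, while `1.0f + 2^-11` collapses to `1.0f`.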
It looks good to me. It fixes and enables several pieces of TF32 infrastructure in CK, and implements a kernel with f32 input and tf32 computation to benefit from the higher math rate. Let's wait for CI to pass and for @bartekxk's review comments to be resolved.
e0cc247 to c278418
bartekxk
left a comment
LGTM, just one last comment.
Please don't print any message without guarding it with:
if(ck::EnvIsEnabled(CK_ENV(CK_LOGGING)))
Fixed. Added if(ck::EnvIsEnabled(CK_ENV(CK_LOGGING))) before printing the error message.
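The guard pattern requested above can be sketched in standalone form as follows. This does not use CK's `ck::EnvIsEnabled` or `CK_ENV`; `is_enabled_value`, `logging_enabled`, and `log_error` are hypothetical helpers, and treating only the value `1` as "enabled" is an assumption made for the example.

```cpp
#include <cstdlib>
#include <iostream>
#include <string_view>

// Hypothetical parser: only "1" counts as enabled (an assumption, not CK's rule).
inline bool is_enabled_value(const char* value)
{
    return value != nullptr && std::string_view(value) == "1";
}

// Hypothetical stand-in for ck::EnvIsEnabled(CK_ENV(CK_LOGGING)).
inline bool logging_enabled()
{
    return is_enabled_value(std::getenv("CK_LOGGING"));
}

// Print the message only when logging is enabled; report whether it was printed.
inline bool log_error(std::string_view message)
{
    if(!logging_enabled())
        return false; // stay silent by default
    std::cerr << message << '\n';
    return true;
}
```

Keeping the check in one helper means unsupported-argument paths stay silent by default and become verbose only when the user opts in.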
c278418 to a66fda8
a66fda8 to 0e31ed7
## Motivation

The gfx942 series natively supports TF32 in the matrix cores, but MIOpen does not support TF32 yet. This PR is a proof of concept (POC) for enabling it.

## Technical Details

All kernel changes are in CK (PR: [2763](ROCm/composable_kernel#2763)). The changes in MIOpen are:

- Change the problem and kernel instance invoker to invoke the TF32 kernel in CK.
- Add an environment variable to control whether TF32 is used.
## Proposed changes

Demonstrate a TF32 (XF32 in the CDNA3 ISA) kernel in conv3d. Also add many instances for MIOpen.