conv:tf32:add all CK instances by yingluAMD · Pull Request #2725 · ROCm/rocm-libraries

yingluAMD · 2025-11-18T06:27:03Z

Motivation

All CK instances have been added for gfx942, and the earlier proof-of-concept MIOpen PR (#1414) has been merged. This PR enables all of those CK instances in the MIOpen algorithms, including forward, backward, wrw, scale, linear, etc.
Because CK has been bumped to the latest version, gfx950 is enabled as well.

Technical Details

The main change is that the solvers now use the TF32 instances. TF32 support is added, and the solvers can still fall back to fp32.
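
The fallback described above can be pictured with a minimal sketch. Every name here (`ComputeType`, `ProblemSketch`, `IsTF32SupportedSketch`, `SelectComputeType`) is hypothetical and is not taken from MIOpen; the list of supported targets is an assumption based only on the architectures this PR mentions.

```cpp
#include <string>

// All names below are hypothetical and only illustrate the fallback idea.
enum class ComputeType { FP32, TF32 };

struct ProblemSketch
{
    bool tf32_requested; // the caller asked for TF32 math
    std::string arch;    // target name, e.g. "gfx942"
};

// Assumption: TF32 instances are available only on the targets the PR names.
inline bool IsTF32SupportedSketch(const std::string& arch)
{
    return arch == "gfx942" || arch == "gfx950";
}

// Use TF32 when it is both requested and supported; otherwise stay on fp32.
inline ComputeType SelectComputeType(const ProblemSketch& p)
{
    return (p.tf32_requested && IsTF32SupportedSketch(p.arch)) ? ComputeType::TF32
                                                               : ComputeType::FP32;
}
```

In this sketch a TF32 request on an unsupported target quietly resolves to fp32 rather than failing, which is one plausible reading of "fall back".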

Test Plan

Add several solver unit tests.

Test Result

All tests pass.

The tests also pass on gfx950.

No undefined symbols were found on gfx950.

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull Request Overview

This pull request adds comprehensive TF32 (TensorFloat-32) support to MIOpen's Composable Kernel (CK) instances for convolution operations on gfx942. The implementation enables TF32 compute type across all convolution directions (forward, backward, and weight gradient) for both 2D and 3D grouped operations.

Key Changes

Added TF32 template specializations to all group convolution solvers (Fwd/Bwd/Wrw for 2D and 3D)
Implemented TF32 compute type parameter throughout the CK device operation templates
Added test coverage for TF32 operations across all affected solvers
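
For readers unfamiliar with the format: TF32 keeps the sign bit and 8 exponent bits of fp32 but only 10 mantissa bits. The following host-side sketch emulates that precision loss by truncation. `TruncateToTF32` is a hypothetical helper, not part of CK or MIOpen, and truncation is an assumption; hardware may round differently.

```cpp
#include <cstdint>
#include <cstring>

// Emulates TF32 precision on the host: keep the sign, the 8 exponent bits and
// the top 10 mantissa bits of an IEEE-754 binary32 value.
// Assumption: plain truncation; real hardware may use a different rounding mode.
inline float TruncateToTF32(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits)); // well-defined type punning
    bits &= 0xFFFFE000u;                  // clear the low 13 mantissa bits
    std::memcpy(&x, &bits, sizeof(bits));
    return x;
}
```

A value such as 1 + 2^-10 survives unchanged, while 1 + 2^-11 collapses to 1, so results computed this way can differ from fp32 by a relative error of up to about 2^-10 per operand.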

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.

File	Description
`projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupWrwXdlops.cpp`	New test file for 2D grouped weight gradient convolution with TF32 support
`projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupFwdXdlops.cpp`	New test file for 2D grouped forward convolution with TF32 support
`projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupBwdXdlops.cpp`	New test file for 2D grouped backward convolution with TF32 support
`projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemm3DGroupWrwXdlops.cpp`	New test file for 3D grouped weight gradient convolution with TF32 support
`projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemm3DGroupBwdXdlops.cpp`	New test file for 3D grouped backward convolution with TF32 support
`projects/miopen/test/gtest/unit_conv_solver.hpp`	Added type aliases for TF32 backward and weight gradient test fixtures
`projects/miopen/test/gtest/unit_conv_solver.cpp`	Updated verification calls to pass TF32 flag for tolerance adjustment
`projects/miopen/src/solver/conv/conv_hip_implicit_gemm_grouped_wrw_xdlops.cpp`	Extended 2D grouped WRW solver with TF32 compute type template parameter and conditional logic
`projects/miopen/src/solver/conv/conv_hip_implicit_gemm_grouped_fwd_xdlops.cpp`	Extended 2D grouped forward solver with TF32 compute type template parameter and conditional logic
`projects/miopen/src/solver/conv/conv_hip_implicit_gemm_grouped_bwd_xdlops.cpp`	Extended 2D grouped backward solver with TF32 compute type template parameter and conditional logic
`projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_wrw_xdlops.cpp`	Extended 3D grouped WRW solver with TF32 support including alpha/beta handling for bilinear/scale operations
`projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_fwd_xdlops.cpp`	Extended 3D grouped forward solver with TF32 support for bilinear/scale element-wise operations
`projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_bwd_xdlops.cpp`	Extended 3D grouped backward solver with TF32 support for bilinear/scale element-wise operations
`projects/miopen/src/ocl/convolutionocl.cpp`	Added SetupComputeType calls for all convolution directions to enable TF32 detection
`projects/miopen/src/include/miopen/solver/implicitgemm_ck_util.hpp`	Updated device operation templates and factory definitions to support TF32 compute type parameter
`projects/miopen/src/include/miopen/conv/solvers.hpp`	Added UseTF32() accessor methods and mutable use_tf32 flags to performance config structures
`projects/miopen/driver/conv_driver.hpp`	Modified tolerance calculation logic for TF32 math type handling
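
The tolerance adjustment mentioned for `unit_conv_solver.cpp` and `conv_driver.hpp` can be illustrated with a small sketch. The helper names and the scale factor of 4 are assumptions chosen for illustration; they are not the values or functions used in MIOpen.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: derive a relative tolerance from the effective precision.
// The scale factor is an arbitrary illustration value, not MIOpen's.
inline double RelativeToleranceSketch(bool use_tf32, double scale = 4.0)
{
    const double fp32_eps = std::numeric_limits<float>::epsilon(); // 2^-23
    const double tf32_eps = std::ldexp(1.0, -10);                  // 2^-10
    return scale * (use_tf32 ? tf32_eps : fp32_eps);
}

// Compare a computed value against a reference using the chosen tolerance.
inline bool WithinToleranceSketch(double ref, double got, bool use_tf32)
{
    const double denom = std::fabs(ref) > 0.0 ? std::fabs(ref) : 1.0;
    return std::fabs(ref - got) / denom <= RelativeToleranceSketch(use_tf32);
}
```

With these assumed values, a result of 1.001 against a reference of 1.0 is accepted under TF32 but rejected under fp32, which is why verification needs to know which compute type was used.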

Copilot

Pull Request Overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.

BradPepersAMD · 2025-11-21T16:37:01Z

We need to consider the impact the TF32 changes have on our system DB as well as heuristics. If this has changed the key that will be generated for shapes, then it could have made both of those invalid and needing regeneration.
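
To make the concern concrete, here is a purely hypothetical sketch of how a change to key generation could invalidate stored entries. The key format, the struct, and the function names are invented for illustration; MIOpen's real database keys are different, and nothing here describes what this PR actually does.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical problem description and key format; MIOpen's real keys differ.
struct ShapeSketch
{
    int n, c, h, w, k;
    bool use_tf32;
};

// Key as it might have looked before any compute-type information existed.
inline std::string BaseKey(const ShapeSketch& s)
{
    return std::to_string(s.n) + "x" + std::to_string(s.c) + "x" + std::to_string(s.h) +
           "x" + std::to_string(s.w) + "x" + std::to_string(s.k);
}

// Variant A: every key gains a compute-type field, so all stored keys change.
inline std::string KeyAlwaysTagged(const ShapeSketch& s)
{
    return BaseKey(s) + (s.use_tf32 ? "_tf32" : "_fp32");
}

// Variant B: only TF32 problems gain a suffix, so stored fp32 keys still match.
inline std::string KeyTaggedForTF32Only(const ShapeSketch& s)
{
    return s.use_tf32 ? BaseKey(s) + "_tf32" : BaseKey(s);
}

// Returns true when the database already holds an entry for the key.
inline bool HasEntry(const std::unordered_map<std::string, std::string>& db,
                     const std::string& key)
{
    return db.count(key) != 0;
}
```

Under variant A, a database populated with the old keys would miss on every lookup and need regeneration; under variant B, only the new TF32 problems would be missing.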

BradPepersAMD · 2026-01-19T05:26:14Z

I believe our changes to the CK solvers are done now, so if you can please resolve the conflicts, I can review, approve, and get this merged.

yingluAMD · 2026-01-19T06:13:33Z

Sure, @BradPepersAMD. I resolved the merge conflicts. We can wait for the CI process to finish.

* Enable xdl in gfx11 & gfx12
* update cmake file
* fix all instance build (cmake)
* fix batched_gemm_gemm(cmake)
* rebase cmake files
* fix cmake build error
* remve CK_ENABLE_DYNAMIC_WARP_SIZE
* update cmake build error2
* fix gfx11 build CK_USE_XDL is enabled on gfx11 and gfx12
* fix gfx10 build
* fix gfx11 error

Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
[ROCm/composable_kernel commit: f22740d]

conv:tf32:add all instances

49a55cb

github-actions Bot added the project: miopen label Nov 18, 2025

assistant-librarian Bot added the organization: ROCm label Nov 18, 2025

yingluAMD added 2 commits November 18, 2025 15:17

refact 3d grouped instances code

8e582f0

add 3dGrouped Bwd/Wrw unitests

23cabd1

yingluAMD self-assigned this Nov 18, 2025

add instances for conv grouped(f/b/w)

27dd838

yingluAMD marked this pull request as ready for review November 19, 2025 06:45

yingluAMD requested a review from a team as a code owner November 19, 2025 06:45

yingluAMD added 3 commits November 19, 2025 14:52

fix clang-format

ecdcbcb

fix clang-format

c6bd09e

fix clang-format

10944e4

yingluAMD requested review from BradPepersAMD, BrianHarrisonAMD, JonathanLichtnerAMD and Copilot November 19, 2025 08:06

Copilot started reviewing on behalf of yingluAMD November 19, 2025 08:14

Copilot finished reviewing on behalf of yingluAMD November 19, 2025 08:15

Copilot AI reviewed Nov 19, 2025

yingluAMD added 2 commits November 19, 2025 16:53

fix word errors

3b0d76c

fix word errors

ab5012c

yingluAMD requested a review from Copilot November 19, 2025 13:41

Copilot started reviewing on behalf of yingluAMD November 19, 2025 13:48

Copilot finished reviewing on behalf of yingluAMD November 19, 2025 13:50

Copilot AI reviewed Nov 19, 2025

yingluAMD added 3 commits November 20, 2025 14:20

disable ck bf16/tf32 cases on gfx90a

c2c6043

disable ck bf16/tf32 cases on gfx90a

0efc849

disable ck bf16/tf32 cases on gfx90a

d592d71

BradPepersAMD reviewed Nov 21, 2025

Comment thread projects/miopen/src/include/miopen/solver/implicitgemm_ck_util.hpp Outdated

BrianHarrisonAMD reviewed Dec 15, 2025

Comment thread projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemm3DGroupFwdXdlops.cpp

yingluAMD and others added 4 commits December 17, 2025 10:35

move IsTF32Supported to target properties.hpp

9ada135

Merge commit 'f0ecbb525ec' into tf32_instances

9c9de81

Merge branch 'develop' into tf32_instances

e35745e

change 3D test to new style

12393f2

yingluAMD requested review from Vsevolod1983 and Copilot December 22, 2025 03:47

fix clang-format

2339bd8

Copilot AI reviewed Dec 22, 2025

yingluAMD added 2 commits December 22, 2025 14:04

Merge branch 'develop' into tf32_instances

422ce39

decrease problem size to meet threshold

d74ecf1

yingluAMD requested review from BradPepersAMD and BrianHarrisonAMD December 22, 2025 09:44

yingluAMD added 6 commits December 22, 2025 21:45

refine 3DGroupWrw test cases

c9140ee

Merge branch 'develop' into tf32_instances

0c97f59

fix applicability

c1bda69

Merge branch 'develop' into tf32_instances

303e360

fix merge issues

8384fa5

Merge branch 'develop' into tf32_instances

2230980

yingluAMD added 2 commits January 19, 2026 13:33

Merge branch 'develop' into tf32_instances

c0c687a

fix merge issues

c381273

BradPepersAMD approved these changes Jan 19, 2026

yingluAMD merged commit 5b50b44 into develop Jan 19, 2026
56 of 60 checks passed

yingluAMD deleted the tf32_instances branch January 19, 2026 20:12

yingluAMD restored the tf32_instances branch January 22, 2026 02:08

yingluAMD deleted the tf32_instances branch January 22, 2026 03:01

yingluAMD mentioned this pull request Feb 3, 2026

[ROCM]enable TF32 config in MIOpen pytorch/pytorch#174169

Closed

4 participants
