conv:tf32:add all CK instances #2725

Merged: yingluAMD merged 42 commits into develop from tf32_instances on Jan 19, 2026
@yingluAMD yingluAMD commented Nov 18, 2025

Motivation

All CK instances have been added on gfx942, and the previous proof-of-concept MIOpen PR (#1414) has also been merged. This PR enables all CK instances in the MIOpen algorithms, including forward, backward, wrw, scale, linear, etc.
Because CK has been bumped to the latest version, gfx950 is enabled as well.

Technical Details

  • Mainly changes the solvers to use TF32 instances. TF32 support is added, and the solvers can still fall back to FP32.
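
The fallback described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `ComputeType`, `DeviceCaps`, and `SelectComputeType` are hypothetical names chosen for illustration, and the real solvers may decide differently.

```cpp
#include <string>

// Hypothetical names for illustration only; not MIOpen's or CK's actual API.
enum class ComputeType
{
    FP32,
    TF32
};

struct DeviceCaps
{
    std::string arch;        // e.g. "gfx942"
    bool has_tf32_instances; // assumed flag: TF32 kernel instances exist for this target
};

// Use TF32 only when it was requested and the target provides TF32 instances;
// otherwise fall back to plain FP32.
inline ComputeType SelectComputeType(bool tf32_requested, const DeviceCaps& caps)
{
    if(tf32_requested && caps.has_tf32_instances)
        return ComputeType::TF32;
    return ComputeType::FP32;
}

inline const char* ToString(ComputeType t)
{
    return t == ComputeType::TF32 ? "tf32" : "fp32";
}
```

Keeping the decision in one place means a solver that has no TF32 instance for a given target still runs, only with FP32 compute.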

Test Plan

Add several solver unit tests.

Test Result

Pass.

The tests also pass on gfx950.
[image]

No undefined symbols were found on gfx950.
[image]

Submission Checklist

@yingluAMD yingluAMD self-assigned this Nov 18, 2025
@yingluAMD yingluAMD marked this pull request as ready for review November 19, 2025 06:45
@yingluAMD yingluAMD requested a review from a team as a code owner November 19, 2025 06:45

Copilot AI left a comment

Pull Request Overview

This pull request adds comprehensive TF32 (TensorFloat-32) support to MIOpen's Composable Kernel (CK) instances for convolution operations on gfx942. The implementation enables TF32 compute type across all convolution directions (forward, backward, and weight gradient) for both 2D and 3D grouped operations.

Key Changes

  • Added TF32 template specializations to all group convolution solvers (Fwd/Bwd/Wrw for 2D and 3D)
  • Implemented TF32 compute type parameter throughout the CK device operation templates
  • Added test coverage for TF32 operations across all affected solvers

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupWrwXdlops.cpp | New test file for 2D grouped weight gradient convolution with TF32 support |
| projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupFwdXdlops.cpp | New test file for 2D grouped forward convolution with TF32 support |
| projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupBwdXdlops.cpp | New test file for 2D grouped backward convolution with TF32 support |
| projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemm3DGroupWrwXdlops.cpp | New test file for 3D grouped weight gradient convolution with TF32 support |
| projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemm3DGroupBwdXdlops.cpp | New test file for 3D grouped backward convolution with TF32 support |
| projects/miopen/test/gtest/unit_conv_solver.hpp | Added type aliases for TF32 backward and weight gradient test fixtures |
| projects/miopen/test/gtest/unit_conv_solver.cpp | Updated verification calls to pass TF32 flag for tolerance adjustment |
| projects/miopen/src/solver/conv/conv_hip_implicit_gemm_grouped_wrw_xdlops.cpp | Extended 2D grouped WRW solver with TF32 compute type template parameter and conditional logic |
| projects/miopen/src/solver/conv/conv_hip_implicit_gemm_grouped_fwd_xdlops.cpp | Extended 2D grouped forward solver with TF32 compute type template parameter and conditional logic |
| projects/miopen/src/solver/conv/conv_hip_implicit_gemm_grouped_bwd_xdlops.cpp | Extended 2D grouped backward solver with TF32 compute type template parameter and conditional logic |
| projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_wrw_xdlops.cpp | Extended 3D grouped WRW solver with TF32 support including alpha/beta handling for bilinear/scale operations |
| projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_fwd_xdlops.cpp | Extended 3D grouped forward solver with TF32 support for bilinear/scale element-wise operations |
| projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_bwd_xdlops.cpp | Extended 3D grouped backward solver with TF32 support for bilinear/scale element-wise operations |
| projects/miopen/src/ocl/convolutionocl.cpp | Added SetupComputeType calls for all convolution directions to enable TF32 detection |
| projects/miopen/src/include/miopen/solver/implicitgemm_ck_util.hpp | Updated device operation templates and factory definitions to support TF32 compute type parameter |
| projects/miopen/src/include/miopen/conv/solvers.hpp | Added UseTF32() accessor methods and mutable use_tf32 flags to performance config structures |
| projects/miopen/driver/conv_driver.hpp | Modified tolerance calculation logic for TF32 math type handling |
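
Several of the entries above mention adjusting the verification tolerance for TF32. As a rough illustration of why that is needed: TF32 keeps 10 mantissa bits where FP32 keeps 23, so a TF32 result can differ from an FP32 reference by far more than an FP32 tolerance allows. The sketch below emulates the reduced precision by truncation. The helper names and the tolerance values are assumptions made for this example, not the values used in `conv_driver.hpp` or the unit tests.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Emulate TF32 precision by zeroing the low 13 of FP32's 23 mantissa bits.
// Truncation is used for simplicity; real hardware may round differently.
inline float TruncateToTF32(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= 0xFFFFE000u; // keep sign, 8 exponent bits, top 10 mantissa bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Hypothetical tolerance: one unit in the last kept mantissa bit
// (2^-10 for TF32, 2^-23 for FP32). Not the thresholds used in MIOpen.
inline double RelativeTolerance(bool use_tf32)
{
    return use_tf32 ? std::ldexp(1.0, -10) : std::ldexp(1.0, -23);
}

// Relative-error check against a reference value; falls back to an
// absolute check when the reference is exactly zero.
inline bool WithinTolerance(float reference, float actual, bool use_tf32)
{
    const double ref   = static_cast<double>(reference);
    const double err   = std::fabs(ref - static_cast<double>(actual));
    const double scale = std::fabs(ref) > 0.0 ? std::fabs(ref) : 1.0;
    return err <= RelativeTolerance(use_tf32) * scale;
}
```

For example, `1.0f + 2^-12` truncates to exactly `1.0f`: the difference passes the TF32 check but fails the FP32 one, which is the situation a TF32 flag in the verification path has to account for.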

Comment thread projects/miopen/driver/conv_driver.hpp Outdated
Comment thread projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupWrwXdlops.cpp Outdated
Comment thread projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupBwdXdlops.cpp Outdated
Comment thread projects/miopen/test/gtest/unit_conv_solver_ConvHipImplicitGemmGroupFwdXdlops.cpp Outdated

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.


Comment thread projects/miopen/src/include/miopen/conv/solvers.hpp
Comment thread projects/miopen/src/include/miopen/conv/solvers.hpp
Comment thread projects/miopen/src/include/miopen/conv/solvers.hpp
Comment thread projects/miopen/src/include/miopen/conv/solvers.hpp
Comment thread projects/miopen/src/include/miopen/conv/solvers.hpp
Comment thread projects/miopen/src/include/miopen/solver/implicitgemm_ck_util.hpp Outdated
@BradPepersAMD

We need to consider the impact of the TF32 changes on our system DB as well as on the heuristics. If they have changed the key that is generated for shapes, both could now be invalid and need regeneration.
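
To make the concern concrete, here is a minimal sketch of how a change in key format can orphan existing tuned entries. The key layout, `ProblemShape`, `MakeKeyV1`/`MakeKeyV2`, and `PerfDb` are all hypothetical and do not reflect MIOpen's actual DB schema; whether this PR changes the real key at all is exactly the open question.

```cpp
#include <map>
#include <string>

// Hypothetical problem description; not MIOpen's real problem type.
struct ProblemShape
{
    int n, c, h, w, k;
    std::string data_type; // e.g. "FP32"
    bool use_tf32;
};

// Assumed "old" key: the compute type is not part of the key.
inline std::string MakeKeyV1(const ProblemShape& p)
{
    return std::to_string(p.n) + "x" + std::to_string(p.c) + "x" + std::to_string(p.h) + "x" +
           std::to_string(p.w) + "x" + std::to_string(p.k) + "-" + p.data_type;
}

// Assumed "new" key: same prefix plus a compute-type suffix.
inline std::string MakeKeyV2(const ProblemShape& p)
{
    return MakeKeyV1(p) + (p.use_tf32 ? "-TF32" : "-FP32");
}

// Stand-in for a tuning database: key -> tuned configuration string.
using PerfDb = std::map<std::string, std::string>;

inline bool HasEntry(const PerfDb& db, const std::string& key)
{
    return db.find(key) != db.end();
}
```

If the database was populated with `MakeKeyV1` keys and lookups switch to `MakeKeyV2`, every lookup misses even though the tuned data is still present, which is why regeneration would be needed in that case.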


Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@BradPepersAMD

I believe our changes to the CK solvers are done now, so if you can resolve the conflicts, I can review, approve, and get this merged.

@yingluAMD

Sure, @BradPepersAMD. I have resolved the merge conflicts. We can wait for the CI run to finish.

@yingluAMD yingluAMD merged commit 5b50b44 into develop Jan 19, 2026
56 of 60 checks passed
@yingluAMD yingluAMD deleted the tf32_instances branch January 19, 2026 20:12
@yingluAMD yingluAMD restored the tf32_instances branch January 22, 2026 02:08
@yingluAMD yingluAMD deleted the tf32_instances branch January 22, 2026 03:01
ammallya pushed a commit that referenced this pull request Feb 3, 2026
* Enable xdl in gfx11 & gfx12

* update cmake file

* fix all instance build (cmake)

* fix batched_gemm_gemm(cmake)

* rebase cmake files

* fix cmake build error

* remove CK_ENABLE_DYNAMIC_WARP_SIZE

* update cmake build error2

* fix gfx11 build

CK_USE_XDL is enabled on gfx11 and gfx12

* fix gfx10 build

* fix gfx11 error

---------

Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>

[ROCm/composable_kernel commit: f22740d]
4 participants