3D conv heuristics (KTN part) by amd-bartgips · Pull Request #3918 · ROCm/MIOpen

amd-bartgips · 2025-07-28T08:40:15Z

Proposed changes

This PR introduces a new feature: a model that performs heuristics for tuning the CK kernels used for 3D convolutions, initially only for gfx942:

gfx942_ConvHipImplicitGemm3DGroupWrwXdlops
gfx942_ConvHipImplicitGemm3DGroupBwdXdlops
gfx942_ConvHipImplicitGemm3DGroupFwdXdlops

These models differ from the current implementation of KTN:
they do not use an LSTM architecture, but instead rely on two TunaNet models:

input_encoder.tn.model: projects the input (conv parameters extracted from fdb_key) to latent space
kernel_config_encoder.tn.model: projects all possible kernel configurations (kernel parameters + split_k) for a particular fdb_key to the same latent space.

The kernel configs are then scored by taking the dot product of each one's latent representation with the latent representation of the input, and the kernel_config with the highest score is selected.
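
To make the selection step concrete, here is a minimal sketch of dot-product candidate scoring. It is not the code from this PR: the function name `select_best_candidate`, the two toy encoders, and all feature values are hypothetical stand-ins, and the real input_encoder.tn.model and kernel_config_encoder.tn.model are trained networks rather than the hand-written projections used here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def select_best_candidate(
    input_features: Vector,
    candidates: Sequence[Vector],
    encode_input: Callable[[Vector], Vector],
    encode_config: Callable[[Vector], Vector],
) -> Tuple[int, List[float]]:
    """Score every candidate against the encoded input; return (best index, scores)."""
    if not candidates:
        raise ValueError("at least one candidate kernel config is required")
    input_latent = encode_input(input_features)
    scores = [dot(input_latent, encode_config(c)) for c in candidates]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Hypothetical stand-ins for the two encoders (assumption: both map to a 2-D latent space).
def toy_input_encoder(x: Vector) -> List[float]:
    return [x[0] + x[1], x[0] - x[1]]


def toy_config_encoder(c: Vector) -> List[float]:
    return [c[0], 0.5 * c[1]]


# Made-up conv features and three made-up kernel configs (e.g. kernel params + split_k).
conv_features = [2.0, 1.0]
kernel_configs = [[1.0, 0.0], [0.0, 4.0], [2.0, 2.0]]

best_index, scores = select_best_candidate(
    conv_features, kernel_configs, toy_input_encoder, toy_config_encoder
)
print(best_index, scores)
```

Because the input is encoded once and each candidate only needs one encoder pass plus one dot product, the cost of ranking grows linearly with the number of candidate kernel configs for a given fdb_key.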

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added automated tests relevant to the introduced functionality
I have sufficient test coverage for the changes, and code coverage hasn't decreased as a result of my PR
I have run the tests, and they are all passing locally
I have added relevant documentation for the changes
I have removed the stale documentation which is no longer relevant after this pull request
I have run make format & make check_format to ensure the changes have been formatted

amd-bartgips · 2025-07-28T13:44:45Z

Based on only reading the code and fixing any issues that came up when building MIOpen, we are ready for the next steps:

Make sure 3D TunaNet works, so that we get routed to the relevant CK kernels.
Test with MIOpenDriver.
Note that so far this is only implemented for gfx942_ConvHipImplicitGemm3DGroupWrwXdlops, so we should test a 3D conv command with --forw 4.
Expand this code to cover the other two directions (bwd, fwd).
Note that these three kernels can probably share a lot of code, so avoid simply copying redundant code into all three solvers.

amd-bartgips added 17 commits July 25, 2025 09:44

Added first attempt at fdeep versions of 3d ktn models.

49a0bbb

TODO: * check if metadata is correct. * Implement relevant cpp functions to load models and run them

Added co-pilot generated new model class for selecting the best candidate kernel config

d015559

performed dot product directly in cpp code, removed candidate_selector submodel

289a1f2

Added the machinery to use the new candidate selection heuristics for the conv_hip_implicit_gemm_3d_grouped_wrw_xdlops solver

2c155cc

added split_k functionality

10722a3

refactored by moving some helper functions outside the main function

fe19eb9

improved loading of metadata for candidate selection model

2e000f3

split off own metadata class to keep it distinct from legacy version

dbf021b

Added new versions of CS models (+new metadata)

11ec8d5

Added methods for preprocessing input and kernel_convigs for CS model (encoding + dropping constants)

05871e3

fixed naming of CandidateSelectionMetadata and Ptrs variables

b007d9e

Added new model and metadata class to header file

7574ca7

removed unused function

e795096

removed duplicate definition of CandidateSelectionMetadata

c0a95a8

* altered SelectBestCandidate to avoid errors.

b89cae2

* Forward declaration of EncodeKernelParams.

removed unused function

b09b18f

Build now works.

cb846ec

Removed some redundant code and unused vars to avoid most warnings during build (left unused var "arch" in for future use)

amd-bartgips added 4 commits July 29, 2025 09:00

moved new candidateSelection code to its own files (and sub namespace)

efeece1

refactored general 3D conv kernel tuning functions into their own file for easier reuse across other 3D conv solvers

33244fc

Cleaned up unused include

25b25a8

removed superfluous includes

279b87a

amd-bartgips marked this pull request as ready for review July 29, 2025 12:23

amd-bartgips requested review from BradPepersAMD, BrianHarrisonAMD, JonathanLichtnerAMD and adickin-amd as code owners July 29, 2025 12:23

amd-bartgips merged commit 3a97cec into miopenff/3d-heuristic Jul 29, 2025
4 of 37 checks passed

amd-bartgips deleted the bartgips/3d-ktn branch July 29, 2025 12:24

amd-bartgips merged 21 commits into miopenff/3d-heuristic from bartgips/3d-ktn