Skip to content

[miopen] Enhance 3D convolution performance in immediate mode#877

Merged
jfactory07 merged 25 commits into
developfrom
import/develop/ROCm_MIOpen/jzhou_3dconv
Aug 6, 2025
Merged

[miopen] Enhance 3D convolution performance in immediate mode#877
jfactory07 merged 25 commits into
developfrom
import/develop/ROCm_MIOpen/jzhou_3dconv

Conversation

@assistant-librarian
Copy link
Copy Markdown
Contributor

@assistant-librarian assistant-librarian Bot commented Jul 28, 2025

Proposed changes

Enhance 3D convolution performance in immediate mode (required for dynamic shapes to avoid kernel search overhead).

  1. Smarter kernel selection: Optimizes fp16/bf16 performance by choosing better CK kernels contextually, rather than using the first kernel by default.

  2. Runtime control: Adds the MIOPEN_DEBUG_3D_CONV_IMPLICIT_GEMM_HIP_FWD_XDLOPS_IDX_OVERRIDE env variable for explicit kernel selection.

Checklist

Please put an into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added automated tests relevant to the introduced functionality
  • I have sufficient test coverage for the changes, and code coverage hasn't decreased as a result of my PR
  • I have ran the tests, and they are all passing locally
  • I have added relevant documentation for the changes
  • I have removed the stale documentation which is no longer relevant after this pull request
  • I have ran & to ensure the changes have been formatted

🔁 Imported from ROCm/MIOpen#3919
🧑‍💻 Originally authored by @jfactory07

@assistant-librarian assistant-librarian Bot requested a review from a team as a code owner July 28, 2025 20:06
@jfactory07 jfactory07 changed the title Jzhou/3dconv Enhance 3D convolution performance in immediate mode Jul 29, 2025
@jfactory07 jfactory07 changed the title Enhance 3D convolution performance in immediate mode [miopen] Enhance 3D convolution performance in immediate mode Jul 29, 2025
@jfactory07 jfactory07 mentioned this pull request Jul 29, 2025
6 tasks
Comment thread projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_fwd_xdlops.cpp Outdated
@jfactory07
Copy link
Copy Markdown
Contributor

jfactory07 commented Jul 29, 2025 via email

@jfactory07
Copy link
Copy Markdown
Contributor

jfactory07 commented Jul 29, 2025 via email

Comment thread projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_fwd_xdlops.cpp Outdated
Comment thread projects/miopen/test/gtest/group_conv.hpp Outdated
Copy link
Copy Markdown
Contributor

@BrianHarrisonAMD BrianHarrisonAMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

Looks good to me.

@JonathanLichtnerAMD
Copy link
Copy Markdown
Contributor

LGTM assuming all CI passes. Also, do we need to test on MI300 CI? (This is not run by default but needs to be triggered manually.)

@jfactory07 jfactory07 merged commit 95d1e21 into develop Aug 6, 2025
26 of 30 checks passed
@jfactory07 jfactory07 deleted the import/develop/ROCm_MIOpen/jzhou_3dconv branch August 6, 2025 07:00
assistant-librarian Bot pushed a commit to ROCm/MIOpen that referenced this pull request Aug 6, 2025
[miopen] Enhance 3D convolution performance in immediate mode
 (#877)

Enhance 3D convolution performance in immediate mode (required for
dynamic shapes to avoid kernel search overhead).

1. Smarter kernel selection: Optimizes fp16/bf16 performance by choosing
better CK kernels contextually, rather than using the first kernel by
default.

2. Runtime control: Adds the
MIOPEN_DEBUG_3D_CONV_IMPLICIT_GEMM_HIP_FWD_XDLOPS_IDX_OVERRIDE env
variable for explicit kernel selection.
vamovsik pushed a commit that referenced this pull request Aug 7, 2025
## Proposed changes

This PR contains several MIOpen commits that need to be cherry-picked
from the MIOpen develop branch to release/rocm-rel-7.0 for 7.0 RC3.
These include:

- #877

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [x] I have added automated tests relevant to the introduced
functionality
- [x] I have sufficient test coverage for the changes, and code coverage
hasn't decreased as a result of my PR
- [x] I have ran the tests, and they are all passing locally
- [ ] I have added relevant documentation for the changes
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [x] I have ran `make format` & `make check_format` to ensure the
changes have been formatted
Comment on lines +611 to +615
float ConvHipImplicitGemm3DGroupFwdXdlops::GetWti(
const ExecutionContext&, const miopen::conv::ProblemDescription& problem) const
{
decltype(auto) xDesc = problem.GetIn();
decltype(auto) wDesc = problem.GetWeights();
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm observing build failures on Windows pointing to this code as part of updating the rocm-libraries version used/tested in TheRock at ROCm/TheRock#1195.

On Windows we build with CK disabled (-DMIOPEN_USE_COMPOSABLEKERNEL=OFF) as well as some other possibly relevant options: https://github.com/ROCm/TheRock/blob/9b873dd83d594a6abaa8db3890c09ab445651df0/ml-libs/CMakeLists.txt#L102-L115

Error logs:

[build] [1/5] Building sub-project MIOpen (in background)
[build] FAILED: ml-libs/MIOpen/stamp/build.stamp D:/projects/TheRock/build/ml-libs/MIOpen/stamp/build.stamp 
[build] C:\Windows\system32\cmd.exe /C "cd /D D:\projects\TheRock\build\ml-libs\MIOpen\build && d:\projects\TheRock\.venv\Scripts\python D:/projects/TheRock/build_tools/teatime.py --log-timestamps --label MIOpen --interactive D:/projects/TheRock/build/logs/MIOpen_build.log -- "C:/Program Files/CMake/bin/cmake.exe" -E env --unset=ROCM_PATH --unset=ROCM_DIR --unset=HIP_PATH --unset=HIP_DIR -- "C:/Program Files/CMake/bin/cmake.exe" --build D:/projects/TheRock/build/ml-libs/MIOpen/build && "C:\Program Files\CMake\bin\cmake.exe" -E touch D:/projects/TheRock/build/ml-libs/MIOpen/stamp/build.stamp"
[build] [MIOpen] [0/2] Re-checking globbed directories...
[build] [MIOpen] [1/2] Linking CXX executable bin\miopen_gtest.exe
[build] [MIOpen] FAILED: bin/miopen_gtest.exe 
[build] [MIOpen] C:\Windows\system32\cmd.exe /C "cd . && D:\projects\TheRock\build\core\clr\dist\lib\llvm\bin\clang++.exe -nostartfiles -nostdlib -DWIN32 -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS -DNOMINMAX -fms-extensions -fms-compatibility -D_ENABLE_EXTENDED_ALIGNED_STORAGE  -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class -Wno-ignored-attributes -Wno-unknown-attributes -Wno-duplicate-decl-specifier --hip-path=D:/projects/TheRock/build/core/clr/dist --hip-device-lib-path=D:/projects/TheRock/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -L D:/projects/TheRock/build/third-party/sysdeps/windows/zlib/build/stage/lib/rocm_sysdeps/lib  -L D:/projects/TheRock/build/third-party/sysdeps/windows/zstd/build/stage/lib/rocm_sysdeps/lib  -L D:/projects/TheRock/build/compiler/amd-llvm/stage/lib/llvm/lib  -L D:/projects/TheRock/build/core/clr/stage/lib  -L D:/projects/TheRock/build/third-party/host-blas/stage/lib/host-math/lib  -L D:/projects/TheRock/build/third-party/sysdeps/windows/bzip2/build/stage/lib/rocm_sysdeps/lib  -L D:/projects/TheRock/build/third-party/sysdeps/windows/sqlite3/build/stage/lib/rocm_sysdeps/lib   -Xlinker /subsystem:console   -fuse-ld=lld-link @CMakeFiles\miopen_gtest.rsp -o bin\miopen_gtest.exe -Xlinker /MANIFEST:EMBED -Xlinker /implib:lib\miopen_gtest.lib -Xlinker /pdb:bin\miopen_gtest.pdb -Xlinker /version:0.0   && cd ."
[build] [MIOpen] lld-link: error: undefined symbol: public: virtual float __cdecl miopen::solver::conv::ConvHipImplicitGemm3DGroupFwdXdlops::GetWti(struct miopen::ExecutionContext const &, struct miopen::conv::ProblemDescription const &) const
[build] [MIOpen] >>> referenced by test/gtest/CMakeFiles/miopen_gtest.dir/group_conv3d_fwd.cpp.obj:(const miopen::solver::conv::ConvHipImplicitGemm3DGroupFwdXdlops::`vftable')
[build] [MIOpen] >>> referenced by test/gtest/CMakeFiles/miopen_gtest.dir/nonpack_conv3d_fwd.cpp.obj
[build] [MIOpen] clang++: error: linker command failed with exit code 1 (use -v to see invocation)

[build] [MIOpen] ninja: build stopped: subcommand failed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh since it got moved to a CPP it needs the symbol exported now.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill make a quick PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you.

I'm testing this now:

diff --git a/projects/miopen/src/include/miopen/conv/solvers.hpp b/projects/miopen/src/include/miopen/conv/solvers.hpp
index 513b5552f2..f7b66514fd 100644
--- a/projects/miopen/src/include/miopen/conv/solvers.hpp
+++ b/projects/miopen/src/include/miopen/conv/solvers.hpp
@@ -4556,7 +4556,7 @@ struct ConvHipImplicitGemm3DGroupFwdXdlops final
     GetSolution(const ExecutionContext&,
                 const miopen::conv::ProblemDescription&,
                 const PerformanceConfigHipImplicitGemm3DGroupFwdXdlops&) const override;
-    float GetWti(const ExecutionContext&, const miopen::conv::ProblemDescription&) const override;
+    MIOPEN_INTERNALS_EXPORT float GetWti(const ExecutionContext&, const miopen::conv::ProblemDescription&) const override;

     MIOPEN_INTERNALS_EXPORT size_t GetWorkspaceSize(
         const ExecutionContext&, const miopen::conv::ProblemDescription&) const override;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Success:

165.2	[200/201] Linking CXX executable bin\miopen_gtest.exe
END	1754599261.0221655	165.21049785614014

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants