[miopen] Enhance 3D convolution performance in immediate mode#877
Conversation
…evelop/ROCm_MIOpen/jzhou_3dconv
|
[AMD Official Use Only - AMD Internal Distribution Only]
Currently, there are no requirements from other directions, so we haven't considered it yet, but there likely will be in the future.
Thinks,
Jin
________________________________
From: BrianHarrisonAMD ***@***.***>
Sent: Tuesday, July 29, 2025 10:18:44 PM
To: ROCm/rocm-libraries ***@***.***>
Cc: Zhou, Jin ***@***.***>; Mention ***@***.***>
Subject: Re: [ROCm/rocm-libraries] [miopen] Enhance 3D convolution performance in immediate mode (PR #877)
@BrianHarrisonAMD commented on this pull request.
________________________________
In projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_fwd_xdlops.cpp<#877 (comment)>:
@@ -360,8 +361,56 @@ void PerformanceConfigHipImplicitGemm3DGroupFwdXdlops::Init(const ProblemDescrip
FillValidKernelsIDs<DeviceOpGFwdDefaultPtrs<DataType>, CKArgs<DataType>>(problem);
break;
}
- index = 0;
+ index = 0;
+
+ // for BF16 and FP16
+ index = env::value(CK_CONV3D_IDX);
This env should be updated to follow the other naming conventions for environment variables.
Something like MIOPEN_DEBUG_3D_CONV_IMPLICIT_GEMM_HIP_FWD_XDLOPS_IDX_OVERRIDE
Or at least make it specific to FWD direction since I assume we would want some similar behavior for the other directions eventually?
—
Reply to this email directly, view it on GitHub<#877 (review)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AE2QMHLTVYGYD5FTEWP5LVD3K57EJAVCNFSM6AAAAACCR5MRAGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTANRXG44TSNRVG4>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
|
[AMD Official Use Only - AMD Internal Distribution Only]
yes, right, if not, it will break search all mode
Thinks,
Jin
________________________________
From: BrianHarrisonAMD ***@***.***>
Sent: Tuesday, July 29, 2025 10:30:49 PM
To: ROCm/rocm-libraries ***@***.***>
Cc: Zhou, Jin ***@***.***>; Mention ***@***.***>
Subject: Re: [ROCm/rocm-libraries] [miopen] Enhance 3D convolution performance in immediate mode (PR #877)
@BrianHarrisonAMD commented on this pull request.
________________________________
In projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_fwd_xdlops.cpp<#877 (comment)>:
HeuristicInit(problem);
assert(!valid_kernels.empty());
+ if(index != 0)
+ {
+ index = 0;
+ kernel_id = valid_kernels[index];
+ }
Okay so the intent here is that we are skipping the heuristic selection when doing tuning?
—
Reply to this email directly, view it on GitHub<#877 (review)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AE2QMHLBHYJ3EZAGBGCU2JT3K6ARTAVCNFSM6AAAAACCR5MRAGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTANRXHA3TKOBWGE>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
BrianHarrisonAMD
left a comment
There was a problem hiding this comment.
Thanks for the updates!
Looks good to me.
|
LGTM assuming all CI passes. Also, do we need to test on MI300 CI? (This is not run by default but needs to be triggered manually.) |
[miopen] Enhance 3D convolution performance in immediate mode (#877) Enhance 3D convolution performance in immediate mode (required for dynamic shapes to avoid kernel search overhead). 1. Smarter kernel selection: Optimizes fp16/bf16 performance by choosing better CK kernels contextually, rather than using the first kernel by default. 2. Runtime control: Adds the MIOPEN_DEBUG_3D_CONV_IMPLICIT_GEMM_HIP_FWD_XDLOPS_IDX_OVERRIDE env variable for explicit kernel selection.
## Proposed changes This PR contains several MIOpen commits that need to be cherry-picked from the MIOpen develop branch to release/rocm-rel-7.0 for 7.0 RC3. These include: - #877 ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [x] I have added automated tests relevant to the introduced functionality - [x] I have sufficient test coverage for the changes, and code coverage hasn't decreased as a result of my PR - [x] I have ran the tests, and they are all passing locally - [ ] I have added relevant documentation for the changes - [ ] I have removed the stale documentation which is no longer relevant after this pull request - [x] I have ran `make format` & `make check_format` to ensure the changes have been formatted
| float ConvHipImplicitGemm3DGroupFwdXdlops::GetWti( | ||
| const ExecutionContext&, const miopen::conv::ProblemDescription& problem) const | ||
| { | ||
| decltype(auto) xDesc = problem.GetIn(); | ||
| decltype(auto) wDesc = problem.GetWeights(); |
There was a problem hiding this comment.
I'm observing build failures on Windows pointing to this code as part of updating the rocm-libraries version used/tested in TheRock at ROCm/TheRock#1195.
On Windows we build with CK disabled (-DMIOPEN_USE_COMPOSABLEKERNEL=OFF) as well as some other possibly relevant options: https://github.com/ROCm/TheRock/blob/9b873dd83d594a6abaa8db3890c09ab445651df0/ml-libs/CMakeLists.txt#L102-L115
Error logs:
[build] [1/5] Building sub-project MIOpen (in background)
[build] FAILED: ml-libs/MIOpen/stamp/build.stamp D:/projects/TheRock/build/ml-libs/MIOpen/stamp/build.stamp
[build] C:\Windows\system32\cmd.exe /C "cd /D D:\projects\TheRock\build\ml-libs\MIOpen\build && d:\projects\TheRock\.venv\Scripts\python D:/projects/TheRock/build_tools/teatime.py --log-timestamps --label MIOpen --interactive D:/projects/TheRock/build/logs/MIOpen_build.log -- "C:/Program Files/CMake/bin/cmake.exe" -E env --unset=ROCM_PATH --unset=ROCM_DIR --unset=HIP_PATH --unset=HIP_DIR -- "C:/Program Files/CMake/bin/cmake.exe" --build D:/projects/TheRock/build/ml-libs/MIOpen/build && "C:\Program Files\CMake\bin\cmake.exe" -E touch D:/projects/TheRock/build/ml-libs/MIOpen/stamp/build.stamp"
[build] [MIOpen] [0/2] Re-checking globbed directories...
[build] [MIOpen] [1/2] Linking CXX executable bin\miopen_gtest.exe
[build] [MIOpen] FAILED: bin/miopen_gtest.exe
[build] [MIOpen] C:\Windows\system32\cmd.exe /C "cd . && D:\projects\TheRock\build\core\clr\dist\lib\llvm\bin\clang++.exe -nostartfiles -nostdlib -DWIN32 -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS -DNOMINMAX -fms-extensions -fms-compatibility -D_ENABLE_EXTENDED_ALIGNED_STORAGE -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class -Wno-ignored-attributes -Wno-unknown-attributes -Wno-duplicate-decl-specifier --hip-path=D:/projects/TheRock/build/core/clr/dist --hip-device-lib-path=D:/projects/TheRock/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -L D:/projects/TheRock/build/third-party/sysdeps/windows/zlib/build/stage/lib/rocm_sysdeps/lib -L D:/projects/TheRock/build/third-party/sysdeps/windows/zstd/build/stage/lib/rocm_sysdeps/lib -L D:/projects/TheRock/build/compiler/amd-llvm/stage/lib/llvm/lib -L D:/projects/TheRock/build/core/clr/stage/lib -L D:/projects/TheRock/build/third-party/host-blas/stage/lib/host-math/lib -L D:/projects/TheRock/build/third-party/sysdeps/windows/bzip2/build/stage/lib/rocm_sysdeps/lib -L D:/projects/TheRock/build/third-party/sysdeps/windows/sqlite3/build/stage/lib/rocm_sysdeps/lib -Xlinker /subsystem:console -fuse-ld=lld-link @CMakeFiles\miopen_gtest.rsp -o bin\miopen_gtest.exe -Xlinker /MANIFEST:EMBED -Xlinker /implib:lib\miopen_gtest.lib -Xlinker /pdb:bin\miopen_gtest.pdb -Xlinker /version:0.0 && cd ."
[build] [MIOpen] lld-link: error: undefined symbol: public: virtual float __cdecl miopen::solver::conv::ConvHipImplicitGemm3DGroupFwdXdlops::GetWti(struct miopen::ExecutionContext const &, struct miopen::conv::ProblemDescription const &) const
[build] [MIOpen] >>> referenced by test/gtest/CMakeFiles/miopen_gtest.dir/group_conv3d_fwd.cpp.obj:(const miopen::solver::conv::ConvHipImplicitGemm3DGroupFwdXdlops::`vftable')
[build] [MIOpen] >>> referenced by test/gtest/CMakeFiles/miopen_gtest.dir/nonpack_conv3d_fwd.cpp.obj
[build] [MIOpen] clang++: error: linker command failed with exit code 1 (use -v to see invocation)
[build] [MIOpen] ninja: build stopped: subcommand failed.
There was a problem hiding this comment.
Oh since it got moved to a CPP it needs the symbol exported now.
There was a problem hiding this comment.
Ill make a quick PR.
There was a problem hiding this comment.
Nice, thank you.
I'm testing this now:
diff --git a/projects/miopen/src/include/miopen/conv/solvers.hpp b/projects/miopen/src/include/miopen/conv/solvers.hpp
index 513b5552f2..f7b66514fd 100644
--- a/projects/miopen/src/include/miopen/conv/solvers.hpp
+++ b/projects/miopen/src/include/miopen/conv/solvers.hpp
@@ -4556,7 +4556,7 @@ struct ConvHipImplicitGemm3DGroupFwdXdlops final
GetSolution(const ExecutionContext&,
const miopen::conv::ProblemDescription&,
const PerformanceConfigHipImplicitGemm3DGroupFwdXdlops&) const override;
- float GetWti(const ExecutionContext&, const miopen::conv::ProblemDescription&) const override;
+ MIOPEN_INTERNALS_EXPORT float GetWti(const ExecutionContext&, const miopen::conv::ProblemDescription&) const override;
MIOPEN_INTERNALS_EXPORT size_t GetWorkspaceSize(
const ExecutionContext&, const miopen::conv::ProblemDescription&) const override;There was a problem hiding this comment.
Success:
165.2 [200/201] Linking CXX executable bin\miopen_gtest.exe
END 1754599261.0221655 165.21049785614014
Proposed changes
Enhance 3D convolution performance in immediate mode (required for dynamic shapes to avoid kernel search overhead).
Smarter kernel selection: Optimizes fp16/bf16 performance by choosing better CK kernels contextually, rather than using the first kernel by default.
Runtime control: Adds the MIOPEN_DEBUG_3D_CONV_IMPLICIT_GEMM_HIP_FWD_XDLOPS_IDX_OVERRIDE env variable for explicit kernel selection.
Checklist
Please put an into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
🔁 Imported from ROCm/MIOpen#3919
🧑💻 Originally authored by @jfactory07