Skip to content

[MIOpen][GraphAPI] Dead code removal: GraphAPI#5603

Merged
vasily-vm merged 3 commits into
ROCm:developfrom
COrruDXC:users/COrruDXC/dead_code_removal_graphapi
Mar 26, 2026
Merged

[MIOpen][GraphAPI] Dead code removal: GraphAPI#5603
vasily-vm merged 3 commits into
ROCm:developfrom
COrruDXC:users/COrruDXC/dead_code_removal_graphapi

Conversation

@COrruDXC
Copy link
Copy Markdown
Contributor

Motivation

This PR removes the GraphAPI experimental feature and related code.

Technical Details

The GraphAPI related source code and gtests have been completely removed.

Test Plan

Verify that MIOpen builds successfully after GraphAPI removal.

Test Result

MIOpen builds successfully after GraphAPI removal.

Submission Checklist

@COrruDXC COrruDXC requested review from a team as code owners March 19, 2026 09:26
@COrruDXC COrruDXC force-pushed the users/COrruDXC/dead_code_removal_graphapi branch from 6b22a6b to e36e1f2 Compare March 19, 2026 09:32
@assistant-librarian assistant-librarian Bot added the external contribution Code contribution from users community.. label Mar 19, 2026
@COrruDXC COrruDXC changed the title [GraphAPI] Dead code removal: GraphAPI [MIOpen][GraphAPI] Dead code removal: GraphAPI Mar 19, 2026
@COrruDXC COrruDXC marked this pull request as draft March 19, 2026 16:01
@COrruDXC COrruDXC force-pushed the users/COrruDXC/dead_code_removal_graphapi branch 5 times, most recently from dd8b6a7 to 4a4a070 Compare March 24, 2026 15:04
@COrruDXC COrruDXC force-pushed the users/COrruDXC/dead_code_removal_graphapi branch from 4a4a070 to 02bb9d3 Compare March 25, 2026 07:22
@COrruDXC COrruDXC marked this pull request as ready for review March 25, 2026 15:40
@Vsevolod1983 Vsevolod1983 self-requested a review March 26, 2026 12:31
Copy link
Copy Markdown
Contributor

@Vsevolod1983 Vsevolod1983 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vasily-vm vasily-vm merged commit 7c4b86c into ROCm:develop Mar 26, 2026
55 of 57 checks passed
NolanHannaAMD added a commit that referenced this pull request May 8, 2026
…7008)

## Motivation

Several memory leaks were detected in MIOpen gtests using ASan. Some of
the tests were blacklisted and others were not. This change looks to fix
all of the low hanging fruit, which are the majority of the leaks found.
This includes all of the critical leaks (>100MB) that were reported.
Some other leaks were identified as needing a larger refactor to
resolve.

After fixing the w_supertensor.cpp leak, the supertensor tests hit
virtual memory area limit errors. These changes are to enable running
them with ASan, but only with a subset of the tests in order to not hit
limits. The test coverage being lost is fairly negligible and the full
tests are still run without ASan enabled.

## Technical Details

Here is a summary of the files looked at and the changes made:

| File | Status | Notes |
|------|--------|-------|
| `test/gtest/gpu_mha_forward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA forward
solution descriptor leaks (report #11, #13). |
| `test/gtest/gpu_mha_backward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA backward
solution descriptor leaks (report #9, #12). |
| `test/gtest/mha_find20.cpp` | Fixed | Added `miopenDestroySolution()`
loop in both `MhaForward` and `MhaBackward` tests. Fixes MHA Find2.0
solution leak (report #16). |
| `test/gtest/gtest_desc_guard.hpp` | Fixed | New shared header
introduced by the refactor. Provides a single `DescGuard<DescType,
CreateFn, DestroyFn>` template (with `TensorDescGuard`, `ConvDescGuard`,
`DropoutDescGuard`, `RNNDescGuard` aliases), a `HandleGuard` RAII
wrapper for `miopenHandle_t`, and the
`DestroyInternalRnnDropoutDesc(rnnDesc)` helper used by every
RNN/LSTM/GRU test to free the internal `DropoutDescriptor` that
`miopenCreateRNNDescriptor` allocates and `miopenSetRNNDescriptor*` then
leaks. Replaces the per-file ad-hoc guard structs from the initial
implementation. |
| `test/gtest/w_supertensor.cpp` | Fixed | Switched raw descriptors to
the shared `RNNDescGuard` / `TensorDescGuard` from
`gtest_desc_guard.hpp`. Added a class-local `DestroyDropoutDesc()`
(called from `TearDown` and before `miopenSetRNNDescriptor`) to prevent
the `miopenSetRNNDescriptor` overwrite leak. Reduced test parameter
space under ASan to avoid OOM. Removed unused `seqLen` parameter and
dead `param_dev_out`/`bias_dev_out` allocations. |
| `test/gtest/lstm.hpp` | Fixed | Switched `rnnDesc` → `RNNDescGuard`,
`DropoutDesc` → `DropoutDescGuard`, and `mio_handle` → `HandleGuard`
(which now owns the `miopenDestroy` call), all from the shared
`gtest_desc_guard.hpp`. Hoisted `dropout_state_buf` so it can be
`hipFree`d at the end of the dropout path. Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (in the non-dropout path only) at end of
`Run`, which frees the internal `DropoutDescriptor` that the Set call
would otherwise leak. Fixes ~615 MB dropout leaks and ~16.5 KB
non-dropout descriptor leaks (report #3, #4, #6, #17-#22). |
| `test/gtest/gru_test.cpp` | Fixed | In the GRU test class: switched
`rnnDesc` → `RNNDescGuard`, `DropoutDesc` → `DropoutDescGuard`,
`mio_handle` → `HandleGuard`, and added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of `Run`
plus `hipFree(dropout_state_buf)` for the dropout path. In the in-file
`GRUFwdCPUVerify` / `GRUBwdDataCPUVerify` helpers: converted the raw
`dropout_inputTensor` / `dropout_outputTensor` declarations to
`TensorDescGuard` (mirroring the `cpu_rnn.hpp` change for the LSTM/RNN
helpers). |
| `test/gtest/softmax_find20.cpp` | Fixed | Changed `Finalize()` to take
the `std::vector<miopenSolution_t>&` and destroy each solution via
`miopenDestroySolution()` before destroying the problem. Updated all 6
`TEST(...)` callers to pass the solutions vector. Fixes Find2.0 softmax
solution/kernel leaks (report #25-#27). |
| `test/gtest/rnn_seq_api.hpp` | Fixed | Hoisted `dropout_state_buf` so
the dropout path can `hipFree` it at the end. Added
`DestroyInternalRnnDropoutDesc(&rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run
to free the internal `DropoutDescriptor` allocations leaked by
`RNNDescriptor` copy-assignment. Same pattern as the LSTM/GRU fixes
(report #29-#30). |
| `test/cpu_rnn.hpp` | Fixed | Converted 6 raw `miopenTensorDescriptor_t
dropout_input/outputTensor` declarations across the LSTM/RNN CPU
verification helpers (`LSTMFwdCPUVerify`, `LSTMBwdDataCPUVerify`,
`RNNFwdTrainCPUVerify`, `RNNBwdDataCPUVerify`, `GRUFwdCPUVerify`,
`GRUBwdDataCPUVerify`) to the shared `TensorDescGuard` from
`gtest_desc_guard.hpp`. Removed the redundant
`miopenCreateTensorDescriptor` calls and updated 12 `miopen::deref(...)`
sites to `.get()`. (Note: the GRU helpers in this header are stale
duplicates; the live ones are inside `test/gtest/gru_test.cpp` and were
updated there too.) Fixes the LSTM/GRU CPU-verify tensor descriptor
leaks (report #1, #2, #14, #15, #32, #33). |
| `test/gtest/rnn_vanilla_common.hpp` | Fixed | Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` calls before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run.
The `RNNDescGuard` / `DropoutDescGuard` usage was already in place from
an earlier commit and now resolves to the shared definitions in
`gtest_desc_guard.hpp`. Same pattern as LSTM/GRU/rnn_seq_api fixes
(report #14, #15, #32, #33). |
| `test/gtest/graphapi_gtest_common.hpp` | Skipped | File no longer
exists in the codebase. The GraphApi test infrastructure has been
removed; only the leak report (a stale snapshot) still references it. No
fix possible against the current source tree. |
| `test/gtest/graphapi_execution_plan.cpp` | Skipped | File no longer
exists in the codebase (GraphApi removed). The leak it represented was
largely an external hipblaslt bug anyway; the remaining test-side
portion is not fixable against the current source. |
| `test/gtest/na_train.cpp` / `na_inference.cpp` / `na_*_find2.cpp` |
Skipped | Leaks come from the internal MIOpen solver/kernel cache living
on the global singleton handle, which is never destroyed. Not easily
fixable without redesigning global handle lifecycle. Should be
suppressed in the ASan suppression file. |
| hipblaslt / rocblaslt (external) | Skipped |
`SolutionCache::addKernel` and `preloadCustomKernels` leak via
`_rocblaslt_handle` constructor. Called from
`miopen::Handle::CreateHipblasLtHandle`. This is an upstream bug in
hipblaslt/rocblaslt, not fixable in MIOpen. Affects suites that
initialize a handle (report Category 2, Category 7). |
| CLR / HIP runtime (external) | Skipped | `amd::Context` and
`amd::roc::Device` global initialization leaks from
`rocclr/platform/context.cpp`. HIP runtime internals, not fixable in
MIOpen. |
| `src/hipoc/hipoc_program.cpp` | Skipped | `HIPOCProgramImpl` objects
leak during kernel compilation/caching (line ~178). This is an internal
MIOpen kernel cache lifecycle issue that requires deeper architectural
changes to fix. Contributes small amounts to MHA and Softmax Find2.0
leaks. |
| `test/gtest/conv_api.cpp` | Fixed | Already clean against current
source — `miopenDestroyConvolutionDescriptor(conv_desc)` call exists
(line 24) inside the test loop. ASAN run reports no leaks. The leak
report was based on a stale snapshot. The hipblaslt handle init portion
is tracked under the external-skipped row. |
| `test/gtest/log_test.cpp` (CPU_LOG_TEST_FUSION / CPU_LOG_TEST_NEG) |
Fixed | Already clean against current source — `Tensor`, `Conv`,
`CreateCBAFusionPlan`, `CreateBNormFusionPlan` all have proper
destructors that call the corresponding `miopenDestroy*` APIs in
`log.cpp`. ASAN run on `CPU_LOG_TEST_*` (11 tests across log_test.cpp +
log_test_neg.cpp) reports no leaks. The hipblaslt handle init portion is
tracked separately under the external-skipped row. |
| `test/gtest/fusion_test.cpp` (CPU_FusionCreateOpConvForward) | Fixed |
File renamed from `fusion.cpp` to `fusion_test.cpp`. Already clean
against current source — uses `TensorDescGuard`/`ConvDescGuard` for
tensor/conv descriptors and calls
`miopenDestroyFusionPlan(fusionPlanDesc)` on the fusion plan (line 195).
ASAN run on `CPU_FusionCreateOpConvForward_FP32.*` reports no leaks. |
| `test/gtest/deterministic_conv_api.cpp` | Fixed | Already clean
against current source — uses `ConvDescGuard` (line 66) for the conv
descriptor. ASAN run on `*CPU_DeterministicConvApi*` reports no leaks. |
| `test/gtest/fusion_aux.cpp` (GPU_FusionAux) | Fixed | Already clean
against current source — uses `ConvDescGuard` plus stack-allocated
internal C++ objects (`miopen::TensorDescriptor`,
`miopen::FusionPlanDescriptor`) which have proper destructors. The
`convoOp` handle is owned by the fusion plan. ASAN run on
`*GPU_FusionAux*` reports no leaks. |
| `test/gtest/backend_api.cpp` (CPU_BackendApi) | Skipped | File no
longer exists in the codebase. The backend API test infrastructure (part
of the removed GraphApi suite) was removed; no fix possible against the
current source tree. |


### High-level notes

New shared infrastructure (test/gtest/gtest_desc_guard.hpp)
- DescGuard<DescType, CreateFn, DestroyFn> — a single RAII template
parameterized on the descriptor type and its create/destroy entry
points. Aliases provide TensorDescGuard, ConvDescGuard,
DropoutDescGuard, and RNNDescGuard, replacing the four near-identical
guard structs
  that were copy-pasted across test files in the initial implementation.
- HandleGuard — separate RAII wrapper for miopenHandle_t (couldn't reuse
the template because miopenCreateWithStream takes an extra hipStream_t
argument). Supports lazy create(stream) so callers that only need a
handle in the dropout branch can default-construct one and
  populate it conditionally.
- DestroyInternalRnnDropoutDesc(rnnDesc) — frees the internal
DropoutDescriptor that miopenCreateRNNDescriptor allocates and that
miopenSetRNNDescriptor* then orphans. Replaces the equivalent inline
blocks that LSTM/GRU/RNN tests were each carrying. The header documents
the
two call-sites: before each Set* (always safe) and at end-of-run only on
the non-dropout path (the dropout path aliases the user-owned
descriptor, so freeing would double-free).

Recurring patterns enabled by the refactor
- The "leak from Set* overwriting the default-constructed internal
dropout descriptor" fix collapsed from per-file code to a one-line
helper call, applied uniformly across lstm.hpp, gru_test.cpp,
rnn_seq_api.hpp, and rnn_vanilla_common.hpp.
- mio_handle ownership in LSTM/GRU is now expressed via HandleGuard
rather than a manual miopenDestroy at the end of the dropout branch —
eliminates a class of forgotten-cleanup bugs.
- dropout_state_buf is consistently hoisted out of the dropout if block
so an end-of-run hipFree can release it; deletion of the buffer pairs
visibly with its allocation.

Notable non-RNN change
- softmax_find20.cpp was the only Find2.0 leak fix in this commit:
Finalize() now takes the solutions vector and calls
miopenDestroySolution() for each before destroying the problem. Same
shape applied to all 6 tests in the file.


## Test Plan

Run the tests beforehand to observe the ASan leak errors and then again
afterward to verify the fixes have resolved the problem.

## Test Result

List from ROCM-21512:

| # | Test Name | Status | Leak Status |

|---:|------------------------------------------------------|------------------------------------------------|-----------------|
| 1 | Smoke/GPU_RNNVanillaDropout_FP32 | PASSED (4 tests) | No leaks |
| 2 | Smoke/GPU_RNNVanillaDropout_FP16 | PASSED (4 tests) | No leaks |
| 3 | Full/GPU_LSTM_dropout_FP32 | PASSED (4 tests) | No leaks |
| 4 | Full/GPU_LSTM_dropout_FP16 | PASSED (4 tests) | No leaks |
| 5 | CPU_GraphApiExecutionPlanBuilder_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 6 | Full/GPU_LSTM_dropout_FP64 | REMOVED (PR #5750, 2026-03-26) | n/a
(deleted) |
| 7 | Unit/CPU_GraphApiPointwise_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 8 | Full/GPU_LstmMSRnn_FP32 | PASSED (1152 tests) | No leaks |
| 9 | Smoke/GPU_Bwd_Mha_FP32 | PASSED (12 tests) | No leaks |
| 10 | Full/GPU_LstmMSRnn_FP16 | PASSED (864 tests) | No leaks |
| 11 | Smoke/GPU_Fwd_Mha_FP32 | PASSED (15 tests) | No leaks |
| 12 | Full/GPU_Bwd_Mha_FP32 | PASSED (6 tests) | No leaks |
| 13 | Full/GPU_Fwd_Mha_FP32 | PASSED (7 tests) | No leaks |
| 14 | Full/GPU_RNNVanilla_FP32 | PASSED (96 tests) | No leaks |
| 15 | Full/GPU_RNNVanilla_FP16 | PASSED (96 tests) | No leaks |
| 16 | GPU_TestMhaFind20_FP32 | PASSED (2 tests) | No leaks |
| 17 | Full/GPU_LSTM_FP32 | PASSED (32 tests) | No leaks |
| 18 | Full/GPU_LSTM_FP16 | PASSED (32 tests) | No leaks |
| 19 | Full/GPU_LSTM_extra_FP32 | PASSED (30 tests) | No leaks |
| 20 | Full/GPU_LSTM_extra_FP16 | PASSED (30 tests) | No leaks |
| 21 | Full/GPU_DeepBench_LSTM_FP16 | PASSED (22 tests) | No leaks |
| 22 | Full/GPU_DeepBench_LSTM_FP32 | PASSED (22 tests) | No leaks |
| 23 | CPU_LOG_TEST_FUSION_NONE | PASSED (2 tests) | No leaks |
| 24 | CPU_LOG_TEST_NEG_NONE | PASSED (4 tests) | No leaks |
| 25 | GPU_SoftmaxFind20_BFP16 | PASSED (2 tests) | No leaks |
| 26 | GPU_SoftmaxFind20_FP16 | PASSED (2 tests) | No leaks |
| 27 | GPU_SoftmaxFind20_FP32 | PASSED (2 tests) | No leaks |
| 28 | CPU_ConvApi_NONE | PASSED (1 test) | No leaks |
| 29 | Full/GPU_RNNSeqApi_FP16 | PASSED (16 tests) | No leaks |
| 30 | Full/GPU_RNNSeqApi_FP32 | PASSED (16 tests) | No leaks |
| 31 | UnitVAN/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 32 | Smoke/GPU_RNNVanilla_FP16 | PASSED (4 tests) | No leaks |
| 33 | Smoke/GPU_RNNVanilla_FP32 | PASSED (4 tests) | No leaks |
| 34 | CPU_FusionCreateOpConvForward_FP32 | PASSED (1 test) | No leaks |
| 35 | CPU_GraphApiOperationReduction_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 36 | Unit2IV1/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 37 | Unit2IV1/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 38 | Unit2IV2/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 39 | Unit2IV2/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 40 | UnitVAB/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 41 | Smoke/GPU_RNNVanillaDropout_FP16 (duplicate of #2) | (see #2) |
(see #2) |
| 42 | CPU_GraphApiOperationGraphDescriptor_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 43 | UnitVA/CPU_GraphApiVariantPack_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 44 | UnitVAU/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 45 | CPU_GraphApiOperationReshape_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 46 | Smoke/CPU_DeterministicConvApi_NONE | PASSED (1 test) | No leaks
|
| 47 | Smoke/GPU_FusionAux_FP32 | PASSED (1 test) | No leaks |
| 48 | CPU_GraphApiEngineHeur_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 49 | Unit/CPU_GraphApiReduction_NONE | REMOVED (PR #5603, 2026-03-26)
| n/a (deleted) |
| 50 | CPU_GraphApiEngineCfg_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 51 | Unit/CPU_GraphApiMatMul_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 52 | CPU_BackendApi_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 53 | UnitIV/CPU_GraphApiOperationPointwiseOneInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 54 | Unit3IV/CPU_GraphApiOperationPointwiseThreeInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 55 | UnitVA/CPU_GraphApiOperationMatmul_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 56 | UnitVA/CPU_GraphApiOperationRng_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |


| Outcome                            | Count |
|------------------------------------|------:|
| Passed, no leaks                   |    39 |
| Passed, leaks detected             |     0 |
| Failed                             |     0 |
| Crashed / timed out                |     0 |
| Removed — GraphAPI purge (#5603)   |    15 |
| Removed — FP64 LSTM purge (#5750)  |     1 |
| Duplicate (not re-run)             |     1 |
| **Total rows**                     |  **56** |

## Risk Assessment
Low

---------

Co-authored-by: JonathanLichtnerAMD <195780826+JonathanLichtnerAMD@users.noreply.github.com>
aledudek pushed a commit that referenced this pull request May 20, 2026
…7008)

## Motivation

Several memory leaks were detected in MIOpen gtests using ASan. Some of
the tests were blacklisted and others were not. This change looks to fix
all of the low hanging fruit, which are the majority of the leaks found.
This includes all of the critical leaks (>100MB) that were reported.
Some other leaks were identified as needing a larger refactor to
resolve.

After fixing the w_supertensor.cpp leak, the supertensor tests hit
virtual memory area limit errors. These changes are to enable running
them with ASan, but only with a subset of the tests in order to not hit
limits. The test coverage being lost is fairly negligible and the full
tests are still run without ASan enabled.

## Technical Details

Here is a summary of the files looked at and the changes made:

| File | Status | Notes |
|------|--------|-------|
| `test/gtest/gpu_mha_forward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA forward
solution descriptor leaks (report #11, #13). |
| `test/gtest/gpu_mha_backward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA backward
solution descriptor leaks (report #9, #12). |
| `test/gtest/mha_find20.cpp` | Fixed | Added `miopenDestroySolution()`
loop in both `MhaForward` and `MhaBackward` tests. Fixes MHA Find2.0
solution leak (report #16). |
| `test/gtest/gtest_desc_guard.hpp` | Fixed | New shared header
introduced by the refactor. Provides a single `DescGuard<DescType,
CreateFn, DestroyFn>` template (with `TensorDescGuard`, `ConvDescGuard`,
`DropoutDescGuard`, `RNNDescGuard` aliases), a `HandleGuard` RAII
wrapper for `miopenHandle_t`, and the
`DestroyInternalRnnDropoutDesc(rnnDesc)` helper used by every
RNN/LSTM/GRU test to free the internal `DropoutDescriptor` that
`miopenCreateRNNDescriptor` allocates and `miopenSetRNNDescriptor*` then
leaks. Replaces the per-file ad-hoc guard structs from the initial
implementation. |
| `test/gtest/w_supertensor.cpp` | Fixed | Switched raw descriptors to
the shared `RNNDescGuard` / `TensorDescGuard` from
`gtest_desc_guard.hpp`. Added a class-local `DestroyDropoutDesc()`
(called from `TearDown` and before `miopenSetRNNDescriptor`) to prevent
the `miopenSetRNNDescriptor` overwrite leak. Reduced test parameter
space under ASan to avoid OOM. Removed unused `seqLen` parameter and
dead `param_dev_out`/`bias_dev_out` allocations. |
| `test/gtest/lstm.hpp` | Fixed | Switched `rnnDesc` → `RNNDescGuard`,
`DropoutDesc` → `DropoutDescGuard`, and `mio_handle` → `HandleGuard`
(which now owns the `miopenDestroy` call), all from the shared
`gtest_desc_guard.hpp`. Hoisted `dropout_state_buf` so it can be
`hipFree`d at the end of the dropout path. Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (in the non-dropout path only) at end of
`Run`, which frees the internal `DropoutDescriptor` that the Set call
would otherwise leak. Fixes ~615 MB dropout leaks and ~16.5 KB
non-dropout descriptor leaks (report #3, #4, #6, #17-#22). |
| `test/gtest/gru_test.cpp` | Fixed | In the GRU test class: switched
`rnnDesc` → `RNNDescGuard`, `DropoutDesc` → `DropoutDescGuard`,
`mio_handle` → `HandleGuard`, and added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of `Run`
plus `hipFree(dropout_state_buf)` for the dropout path. In the in-file
`GRUFwdCPUVerify` / `GRUBwdDataCPUVerify` helpers: converted the raw
`dropout_inputTensor` / `dropout_outputTensor` declarations to
`TensorDescGuard` (mirroring the `cpu_rnn.hpp` change for the LSTM/RNN
helpers). |
| `test/gtest/softmax_find20.cpp` | Fixed | Changed `Finalize()` to take
the `std::vector<miopenSolution_t>&` and destroy each solution via
`miopenDestroySolution()` before destroying the problem. Updated all 6
`TEST(...)` callers to pass the solutions vector. Fixes Find2.0 softmax
solution/kernel leaks (report #25-#27). |
| `test/gtest/rnn_seq_api.hpp` | Fixed | Hoisted `dropout_state_buf` so
the dropout path can `hipFree` it at the end. Added
`DestroyInternalRnnDropoutDesc(&rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run
to free the internal `DropoutDescriptor` allocations leaked by
`RNNDescriptor` copy-assignment. Same pattern as the LSTM/GRU fixes
(report #29-#30). |
| `test/cpu_rnn.hpp` | Fixed | Converted 6 raw `miopenTensorDescriptor_t
dropout_input/outputTensor` declarations across the LSTM/RNN CPU
verification helpers (`LSTMFwdCPUVerify`, `LSTMBwdDataCPUVerify`,
`RNNFwdTrainCPUVerify`, `RNNBwdDataCPUVerify`, `GRUFwdCPUVerify`,
`GRUBwdDataCPUVerify`) to the shared `TensorDescGuard` from
`gtest_desc_guard.hpp`. Removed the redundant
`miopenCreateTensorDescriptor` calls and updated 12 `miopen::deref(...)`
sites to `.get()`. (Note: the GRU helpers in this header are stale
duplicates; the live ones are inside `test/gtest/gru_test.cpp` and were
updated there too.) Fixes the LSTM/GRU CPU-verify tensor descriptor
leaks (report #1, #2, #14, #15, #32, #33). |
| `test/gtest/rnn_vanilla_common.hpp` | Fixed | Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` calls before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run.
The `RNNDescGuard` / `DropoutDescGuard` usage was already in place from
an earlier commit and now resolves to the shared definitions in
`gtest_desc_guard.hpp`. Same pattern as LSTM/GRU/rnn_seq_api fixes
(report #14, #15, #32, #33). |
| `test/gtest/graphapi_gtest_common.hpp` | Skipped | File no longer
exists in the codebase. The GraphApi test infrastructure has been
removed; only the leak report (a stale snapshot) still references it. No
fix possible against the current source tree. |
| `test/gtest/graphapi_execution_plan.cpp` | Skipped | File no longer
exists in the codebase (GraphApi removed). The leak it represented was
largely an external hipblaslt bug anyway; the remaining test-side
portion is not fixable against the current source. |
| `test/gtest/na_train.cpp` / `na_inference.cpp` / `na_*_find2.cpp` |
Skipped | Leaks come from the internal MIOpen solver/kernel cache living
on the global singleton handle, which is never destroyed. Not easily
fixable without redesigning global handle lifecycle. Should be
suppressed in the ASan suppression file. |
| hipblaslt / rocblaslt (external) | Skipped |
`SolutionCache::addKernel` and `preloadCustomKernels` leak via
`_rocblaslt_handle` constructor. Called from
`miopen::Handle::CreateHipblasLtHandle`. This is an upstream bug in
hipblaslt/rocblaslt, not fixable in MIOpen. Affects suites that
initialize a handle (report Category 2, Category 7). |
| CLR / HIP runtime (external) | Skipped | `amd::Context` and
`amd::roc::Device` global initialization leaks from
`rocclr/platform/context.cpp`. HIP runtime internals, not fixable in
MIOpen. |
| `src/hipoc/hipoc_program.cpp` | Skipped | `HIPOCProgramImpl` objects
leak during kernel compilation/caching (line ~178). This is an internal
MIOpen kernel cache lifecycle issue that requires deeper architectural
changes to fix. Contributes small amounts to MHA and Softmax Find2.0
leaks. |
| `test/gtest/conv_api.cpp` | Fixed | Already clean against current
source — `miopenDestroyConvolutionDescriptor(conv_desc)` call exists
(line 24) inside the test loop. ASAN run reports no leaks. The leak
report was based on a stale snapshot. The hipblaslt handle init portion
is tracked under the external-skipped row. |
| `test/gtest/log_test.cpp` (CPU_LOG_TEST_FUSION / CPU_LOG_TEST_NEG) |
Fixed | Already clean against current source — `Tensor`, `Conv`,
`CreateCBAFusionPlan`, `CreateBNormFusionPlan` all have proper
destructors that call the corresponding `miopenDestroy*` APIs in
`log.cpp`. ASAN run on `CPU_LOG_TEST_*` (11 tests across log_test.cpp +
log_test_neg.cpp) reports no leaks. The hipblaslt handle init portion is
tracked separately under the external-skipped row. |
| `test/gtest/fusion_test.cpp` (CPU_FusionCreateOpConvForward) | Fixed |
File renamed from `fusion.cpp` to `fusion_test.cpp`. Already clean
against current source — uses `TensorDescGuard`/`ConvDescGuard` for
tensor/conv descriptors and calls
`miopenDestroyFusionPlan(fusionPlanDesc)` on the fusion plan (line 195).
ASAN run on `CPU_FusionCreateOpConvForward_FP32.*` reports no leaks. |
| `test/gtest/deterministic_conv_api.cpp` | Fixed | Already clean
against current source — uses `ConvDescGuard` (line 66) for the conv
descriptor. ASAN run on `*CPU_DeterministicConvApi*` reports no leaks. |
| `test/gtest/fusion_aux.cpp` (GPU_FusionAux) | Fixed | Already clean
against current source — uses `ConvDescGuard` plus stack-allocated
internal C++ objects (`miopen::TensorDescriptor`,
`miopen::FusionPlanDescriptor`) which have proper destructors. The
`convoOp` handle is owned by the fusion plan. ASAN run on
`*GPU_FusionAux*` reports no leaks. |
| `test/gtest/backend_api.cpp` (CPU_BackendApi) | Skipped | File no
longer exists in the codebase. The backend API test infrastructure (part
of the removed GraphApi suite) was removed; no fix possible against the
current source tree. |


### High-level notes

New shared infrastructure (test/gtest/gtest_desc_guard.hpp)
- DescGuard<DescType, CreateFn, DestroyFn> — a single RAII template
parameterized on the descriptor type and its create/destroy entry
points. Aliases provide TensorDescGuard, ConvDescGuard,
DropoutDescGuard, and RNNDescGuard, replacing the four near-identical
guard structs
  that were copy-pasted across test files in the initial implementation.
- HandleGuard — separate RAII wrapper for miopenHandle_t (couldn't reuse
the template because miopenCreateWithStream takes an extra hipStream_t
argument). Supports lazy create(stream) so callers that only need a
handle in the dropout branch can default-construct one and
  populate it conditionally.
- DestroyInternalRnnDropoutDesc(rnnDesc) — frees the internal
DropoutDescriptor that miopenCreateRNNDescriptor allocates and that
miopenSetRNNDescriptor* then orphans. Replaces the equivalent inline
blocks that LSTM/GRU/RNN tests were each carrying. The header documents
the
two call-sites: before each Set* (always safe) and at end-of-run only on
the non-dropout path (the dropout path aliases the user-owned
descriptor, so freeing would double-free).

Recurring patterns enabled by the refactor
- The "leak from Set* overwriting the default-constructed internal
dropout descriptor" fix collapsed from per-file code to a one-line
helper call, applied uniformly across lstm.hpp, gru_test.cpp,
rnn_seq_api.hpp, and rnn_vanilla_common.hpp.
- mio_handle ownership in LSTM/GRU is now expressed via HandleGuard
rather than a manual miopenDestroy at the end of the dropout branch —
eliminates a class of forgotten-cleanup bugs.
- dropout_state_buf is consistently hoisted out of the dropout if block
so an end-of-run hipFree can release it; deletion of the buffer pairs
visibly with its allocation.

Notable non-RNN change
- softmax_find20.cpp was the only Find2.0 leak fix in this commit:
Finalize() now takes the solutions vector and calls
miopenDestroySolution() for each before destroying the problem. Same
shape applied to all 6 tests in the file.


## Test Plan

Run the tests beforehand to observe the ASan leak errors and then again
afterward to verify the fixes have resolved the problem.

## Test Result

List from ROCM-21512:

| # | Test Name | Status | Leak Status |

|---:|------------------------------------------------------|------------------------------------------------|-----------------|
| 1 | Smoke/GPU_RNNVanillaDropout_FP32 | PASSED (4 tests) | No leaks |
| 2 | Smoke/GPU_RNNVanillaDropout_FP16 | PASSED (4 tests) | No leaks |
| 3 | Full/GPU_LSTM_dropout_FP32 | PASSED (4 tests) | No leaks |
| 4 | Full/GPU_LSTM_dropout_FP16 | PASSED (4 tests) | No leaks |
| 5 | CPU_GraphApiExecutionPlanBuilder_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 6 | Full/GPU_LSTM_dropout_FP64 | REMOVED (PR #5750, 2026-03-26) | n/a
(deleted) |
| 7 | Unit/CPU_GraphApiPointwise_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 8 | Full/GPU_LstmMSRnn_FP32 | PASSED (1152 tests) | No leaks |
| 9 | Smoke/GPU_Bwd_Mha_FP32 | PASSED (12 tests) | No leaks |
| 10 | Full/GPU_LstmMSRnn_FP16 | PASSED (864 tests) | No leaks |
| 11 | Smoke/GPU_Fwd_Mha_FP32 | PASSED (15 tests) | No leaks |
| 12 | Full/GPU_Bwd_Mha_FP32 | PASSED (6 tests) | No leaks |
| 13 | Full/GPU_Fwd_Mha_FP32 | PASSED (7 tests) | No leaks |
| 14 | Full/GPU_RNNVanilla_FP32 | PASSED (96 tests) | No leaks |
| 15 | Full/GPU_RNNVanilla_FP16 | PASSED (96 tests) | No leaks |
| 16 | GPU_TestMhaFind20_FP32 | PASSED (2 tests) | No leaks |
| 17 | Full/GPU_LSTM_FP32 | PASSED (32 tests) | No leaks |
| 18 | Full/GPU_LSTM_FP16 | PASSED (32 tests) | No leaks |
| 19 | Full/GPU_LSTM_extra_FP32 | PASSED (30 tests) | No leaks |
| 20 | Full/GPU_LSTM_extra_FP16 | PASSED (30 tests) | No leaks |
| 21 | Full/GPU_DeepBench_LSTM_FP16 | PASSED (22 tests) | No leaks |
| 22 | Full/GPU_DeepBench_LSTM_FP32 | PASSED (22 tests) | No leaks |
| 23 | CPU_LOG_TEST_FUSION_NONE | PASSED (2 tests) | No leaks |
| 24 | CPU_LOG_TEST_NEG_NONE | PASSED (4 tests) | No leaks |
| 25 | GPU_SoftmaxFind20_BFP16 | PASSED (2 tests) | No leaks |
| 26 | GPU_SoftmaxFind20_FP16 | PASSED (2 tests) | No leaks |
| 27 | GPU_SoftmaxFind20_FP32 | PASSED (2 tests) | No leaks |
| 28 | CPU_ConvApi_NONE | PASSED (1 test) | No leaks |
| 29 | Full/GPU_RNNSeqApi_FP16 | PASSED (16 tests) | No leaks |
| 30 | Full/GPU_RNNSeqApi_FP32 | PASSED (16 tests) | No leaks |
| 31 | UnitVAN/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 32 | Smoke/GPU_RNNVanilla_FP16 | PASSED (4 tests) | No leaks |
| 33 | Smoke/GPU_RNNVanilla_FP32 | PASSED (4 tests) | No leaks |
| 34 | CPU_FusionCreateOpConvForward_FP32 | PASSED (1 test) | No leaks |
| 35 | CPU_GraphApiOperationReduction_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 36 | Unit2IV1/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 37 | Unit2IV1/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 38 | Unit2IV2/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 39 | Unit2IV2/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 40 | UnitVAB/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 41 | Smoke/GPU_RNNVanillaDropout_FP16 (duplicate of #2) | (see #2) |
(see #2) |
| 42 | CPU_GraphApiOperationGraphDescriptor_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 43 | UnitVA/CPU_GraphApiVariantPack_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 44 | UnitVAU/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 45 | CPU_GraphApiOperationReshape_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 46 | Smoke/CPU_DeterministicConvApi_NONE | PASSED (1 test) | No leaks
|
| 47 | Smoke/GPU_FusionAux_FP32 | PASSED (1 test) | No leaks |
| 48 | CPU_GraphApiEngineHeur_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 49 | Unit/CPU_GraphApiReduction_NONE | REMOVED (PR #5603, 2026-03-26)
| n/a (deleted) |
| 50 | CPU_GraphApiEngineCfg_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 51 | Unit/CPU_GraphApiMatMul_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 52 | CPU_BackendApi_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 53 | UnitIV/CPU_GraphApiOperationPointwiseOneInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 54 | Unit3IV/CPU_GraphApiOperationPointwiseThreeInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 55 | UnitVA/CPU_GraphApiOperationMatmul_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 56 | UnitVA/CPU_GraphApiOperationRng_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |


| Outcome                            | Count |
|------------------------------------|------:|
| Passed, no leaks                   |    39 |
| Passed, leaks detected             |     0 |
| Failed                             |     0 |
| Crashed / timed out                |     0 |
| Removed — GraphAPI purge (#5603)   |    15 |
| Removed — FP64 LSTM purge (#5750)  |     1 |
| Duplicate (not re-run)             |     1 |
| **Total rows**                     |  **56** |

## Risk Assessment
Low

---------

Co-authored-by: JonathanLichtnerAMD <195780826+JonathanLichtnerAMD@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation external contribution Code contribution from users community.. project: miopen

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants