Skip to content

[MIOpen] [ROCM-21512] [ALMIOPEN-1814] Fix MIOpen ASan memory leaks#7008

Merged
NolanHannaAMD merged 7 commits into
developfrom
users/nhanna/fix-miopen-asan-memory-leaks
May 8, 2026
Merged

[MIOpen] [ROCM-21512] [ALMIOPEN-1814] Fix MIOpen ASan memory leaks#7008
NolanHannaAMD merged 7 commits into
developfrom
users/nhanna/fix-miopen-asan-memory-leaks

Conversation

@NolanHannaAMD
Copy link
Copy Markdown
Contributor

@NolanHannaAMD NolanHannaAMD commented May 1, 2026

Motivation

Several memory leaks were detected in MIOpen gtests using ASan. Some of the tests were blacklisted and others were not. This change looks to fix all of the low hanging fruit, which are the majority of the leaks found. This includes all of the critical leaks (>100MB) that were reported. Some other leaks were identified as needing a larger refactor to resolve.

After fixing the w_supertensor.cpp leak, the supertensor tests hit virtual memory area limit errors. These changes are to enable running them with ASan, but only with a subset of the tests in order to not hit limits. The test coverage being lost is fairly negligible and the full tests are still run without ASan enabled.

Technical Details

Here is a summary of the files looked at and the changes made:

File Status Notes
test/gtest/gpu_mha_forward.cpp Fixed Added miopenDestroySolution() loop after using solutions. Fixes MHA forward solution descriptor leaks (report #11, #13).
test/gtest/gpu_mha_backward.cpp Fixed Added miopenDestroySolution() loop after using solutions. Fixes MHA backward solution descriptor leaks (report #9, #12).
test/gtest/mha_find20.cpp Fixed Added miopenDestroySolution() loop in both MhaForward and MhaBackward tests. Fixes MHA Find2.0 solution leak (report #16).
test/gtest/gtest_desc_guard.hpp Fixed New shared header introduced by the refactor. Provides a single DescGuard<DescType, CreateFn, DestroyFn> template (with TensorDescGuard, ConvDescGuard, DropoutDescGuard, RNNDescGuard aliases), a HandleGuard RAII wrapper for miopenHandle_t, and the DestroyInternalRnnDropoutDesc(rnnDesc) helper used by every RNN/LSTM/GRU test to free the internal DropoutDescriptor that miopenCreateRNNDescriptor allocates and miopenSetRNNDescriptor* then leaks. Replaces the per-file ad-hoc guard structs from the initial implementation.
test/gtest/w_supertensor.cpp Fixed Switched raw descriptors to the shared RNNDescGuard / TensorDescGuard from gtest_desc_guard.hpp. Added a class-local DestroyDropoutDesc() (called from TearDown and before miopenSetRNNDescriptor) to prevent the miopenSetRNNDescriptor overwrite leak. Reduced test parameter space under ASan to avoid OOM. Removed unused seqLen parameter and dead param_dev_out/bias_dev_out allocations.
test/gtest/lstm.hpp Fixed Switched rnnDescRNNDescGuard, DropoutDescDropoutDescGuard, and mio_handleHandleGuard (which now owns the miopenDestroy call), all from the shared gtest_desc_guard.hpp. Hoisted dropout_state_buf so it can be hipFreed at the end of the dropout path. Added DestroyInternalRnnDropoutDesc(rnnDesc) before miopenSetRNNDescriptor* and (in the non-dropout path only) at end of Run, which frees the internal DropoutDescriptor that the Set call would otherwise leak. Fixes ~615 MB dropout leaks and ~16.5 KB non-dropout descriptor leaks (report #3, #4, #6, #17-#22).
test/gtest/gru_test.cpp Fixed In the GRU test class: switched rnnDescRNNDescGuard, DropoutDescDropoutDescGuard, mio_handleHandleGuard, and added DestroyInternalRnnDropoutDesc(rnnDesc) before miopenSetRNNDescriptor* and (non-dropout path only) at end of Run plus hipFree(dropout_state_buf) for the dropout path. In the in-file GRUFwdCPUVerify / GRUBwdDataCPUVerify helpers: converted the raw dropout_inputTensor / dropout_outputTensor declarations to TensorDescGuard (mirroring the cpu_rnn.hpp change for the LSTM/RNN helpers).
test/gtest/softmax_find20.cpp Fixed Changed Finalize() to take the std::vector<miopenSolution_t>& and destroy each solution via miopenDestroySolution() before destroying the problem. Updated all 6 TEST(...) callers to pass the solutions vector. Fixes Find2.0 softmax solution/kernel leaks (report #25-#27).
test/gtest/rnn_seq_api.hpp Fixed Hoisted dropout_state_buf so the dropout path can hipFree it at the end. Added DestroyInternalRnnDropoutDesc(&rnnDesc) before miopenSetRNNDescriptor* and (non-dropout path only) at end of the run to free the internal DropoutDescriptor allocations leaked by RNNDescriptor copy-assignment. Same pattern as the LSTM/GRU fixes (report #29-#30).
test/cpu_rnn.hpp Fixed Converted 6 raw miopenTensorDescriptor_t dropout_input/outputTensor declarations across the LSTM/RNN CPU verification helpers (LSTMFwdCPUVerify, LSTMBwdDataCPUVerify, RNNFwdTrainCPUVerify, RNNBwdDataCPUVerify, GRUFwdCPUVerify, GRUBwdDataCPUVerify) to the shared TensorDescGuard from gtest_desc_guard.hpp. Removed the redundant miopenCreateTensorDescriptor calls and updated 12 miopen::deref(...) sites to .get(). (Note: the GRU helpers in this header are stale duplicates; the live ones are inside test/gtest/gru_test.cpp and were updated there too.) Fixes the LSTM/GRU CPU-verify tensor descriptor leaks (report #1, #2, #14, #15, #32, #33).
test/gtest/rnn_vanilla_common.hpp Fixed Added DestroyInternalRnnDropoutDesc(rnnDesc) calls before miopenSetRNNDescriptor* and (non-dropout path only) at end of the run. The RNNDescGuard / DropoutDescGuard usage was already in place from an earlier commit and now resolves to the shared definitions in gtest_desc_guard.hpp. Same pattern as LSTM/GRU/rnn_seq_api fixes (report #14, #15, #32, #33).
test/gtest/graphapi_gtest_common.hpp Skipped File no longer exists in the codebase. The GraphApi test infrastructure has been removed; only the leak report (a stale snapshot) still references it. No fix possible against the current source tree.
test/gtest/graphapi_execution_plan.cpp Skipped File no longer exists in the codebase (GraphApi removed). The leak it represented was largely an external hipblaslt bug anyway; the remaining test-side portion is not fixable against the current source.
test/gtest/na_train.cpp / na_inference.cpp / na_*_find2.cpp Skipped Leaks come from the internal MIOpen solver/kernel cache living on the global singleton handle, which is never destroyed. Not easily fixable without redesigning global handle lifecycle. Should be suppressed in the ASan suppression file.
hipblaslt / rocblaslt (external) Skipped SolutionCache::addKernel and preloadCustomKernels leak via _rocblaslt_handle constructor. Called from miopen::Handle::CreateHipblasLtHandle. This is an upstream bug in hipblaslt/rocblaslt, not fixable in MIOpen. Affects suites that initialize a handle (report Category 2, Category 7).
CLR / HIP runtime (external) Skipped amd::Context and amd::roc::Device global initialization leaks from rocclr/platform/context.cpp. HIP runtime internals, not fixable in MIOpen.
src/hipoc/hipoc_program.cpp Skipped HIPOCProgramImpl objects leak during kernel compilation/caching (line ~178). This is an internal MIOpen kernel cache lifecycle issue that requires deeper architectural changes to fix. Contributes small amounts to MHA and Softmax Find2.0 leaks.
test/gtest/conv_api.cpp Fixed Already clean against current source — miopenDestroyConvolutionDescriptor(conv_desc) call exists (line 24) inside the test loop. ASAN run reports no leaks. The leak report was based on a stale snapshot. The hipblaslt handle init portion is tracked under the external-skipped row.
test/gtest/log_test.cpp (CPU_LOG_TEST_FUSION / CPU_LOG_TEST_NEG) Fixed Already clean against current source — Tensor, Conv, CreateCBAFusionPlan, CreateBNormFusionPlan all have proper destructors that call the corresponding miopenDestroy* APIs in log.cpp. ASAN run on CPU_LOG_TEST_* (11 tests across log_test.cpp + log_test_neg.cpp) reports no leaks. The hipblaslt handle init portion is tracked separately under the external-skipped row.
test/gtest/fusion_test.cpp (CPU_FusionCreateOpConvForward) Fixed File renamed from fusion.cpp to fusion_test.cpp. Already clean against current source — uses TensorDescGuard/ConvDescGuard for tensor/conv descriptors and calls miopenDestroyFusionPlan(fusionPlanDesc) on the fusion plan (line 195). ASAN run on CPU_FusionCreateOpConvForward_FP32.* reports no leaks.
test/gtest/deterministic_conv_api.cpp Fixed Already clean against current source — uses ConvDescGuard (line 66) for the conv descriptor. ASAN run on *CPU_DeterministicConvApi* reports no leaks.
test/gtest/fusion_aux.cpp (GPU_FusionAux) Fixed Already clean against current source — uses ConvDescGuard plus stack-allocated internal C++ objects (miopen::TensorDescriptor, miopen::FusionPlanDescriptor) which have proper destructors. The convoOp handle is owned by the fusion plan. ASAN run on *GPU_FusionAux* reports no leaks.
test/gtest/backend_api.cpp (CPU_BackendApi) Skipped File no longer exists in the codebase. The backend API test infrastructure (part of the removed GraphApi suite) was removed; no fix possible against the current source tree.

High-level notes

New shared infrastructure (test/gtest/gtest_desc_guard.hpp)

  • DescGuard<DescType, CreateFn, DestroyFn> — a single RAII template parameterized on the descriptor type and its create/destroy entry points. Aliases provide TensorDescGuard, ConvDescGuard, DropoutDescGuard, and RNNDescGuard, replacing the four near-identical guard structs
    that were copy-pasted across test files in the initial implementation.
  • HandleGuard — separate RAII wrapper for miopenHandle_t (couldn't reuse the template because miopenCreateWithStream takes an extra hipStream_t argument). Supports lazy create(stream) so callers that only need a handle in the dropout branch can default-construct one and
    populate it conditionally.
  • DestroyInternalRnnDropoutDesc(rnnDesc) — frees the internal DropoutDescriptor that miopenCreateRNNDescriptor allocates and that miopenSetRNNDescriptor* then orphans. Replaces the equivalent inline blocks that LSTM/GRU/RNN tests were each carrying. The header documents the
    two call-sites: before each Set* (always safe) and at end-of-run only on the non-dropout path (the dropout path aliases the user-owned descriptor, so freeing would double-free).

Recurring patterns enabled by the refactor

  • The "leak from Set* overwriting the default-constructed internal dropout descriptor" fix collapsed from per-file code to a one-line helper call, applied uniformly across lstm.hpp, gru_test.cpp, rnn_seq_api.hpp, and rnn_vanilla_common.hpp.
  • mio_handle ownership in LSTM/GRU is now expressed via HandleGuard rather than a manual miopenDestroy at the end of the dropout branch — eliminates a class of forgotten-cleanup bugs.
  • dropout_state_buf is consistently hoisted out of the dropout if block so an end-of-run hipFree can release it; deletion of the buffer pairs visibly with its allocation.

Notable non-RNN change

  • softmax_find20.cpp was the only Find2.0 leak fix in this commit: Finalize() now takes the solutions vector and calls miopenDestroySolution() for each before destroying the problem. Same shape applied to all 6 tests in the file.

Test Plan

Run the tests beforehand to observe the ASan leak errors and then again afterward to verify the fixes have resolved the problem.

Test Result

List from ROCM-21512:

# Test Name Status Leak Status
1 Smoke/GPU_RNNVanillaDropout_FP32 PASSED (4 tests) No leaks
2 Smoke/GPU_RNNVanillaDropout_FP16 PASSED (4 tests) No leaks
3 Full/GPU_LSTM_dropout_FP32 PASSED (4 tests) No leaks
4 Full/GPU_LSTM_dropout_FP16 PASSED (4 tests) No leaks
5 CPU_GraphApiExecutionPlanBuilder_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
6 Full/GPU_LSTM_dropout_FP64 REMOVED (PR #5750, 2026-03-26) n/a (deleted)
7 Unit/CPU_GraphApiPointwise_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
8 Full/GPU_LstmMSRnn_FP32 PASSED (1152 tests) No leaks
9 Smoke/GPU_Bwd_Mha_FP32 PASSED (12 tests) No leaks
10 Full/GPU_LstmMSRnn_FP16 PASSED (864 tests) No leaks
11 Smoke/GPU_Fwd_Mha_FP32 PASSED (15 tests) No leaks
12 Full/GPU_Bwd_Mha_FP32 PASSED (6 tests) No leaks
13 Full/GPU_Fwd_Mha_FP32 PASSED (7 tests) No leaks
14 Full/GPU_RNNVanilla_FP32 PASSED (96 tests) No leaks
15 Full/GPU_RNNVanilla_FP16 PASSED (96 tests) No leaks
16 GPU_TestMhaFind20_FP32 PASSED (2 tests) No leaks
17 Full/GPU_LSTM_FP32 PASSED (32 tests) No leaks
18 Full/GPU_LSTM_FP16 PASSED (32 tests) No leaks
19 Full/GPU_LSTM_extra_FP32 PASSED (30 tests) No leaks
20 Full/GPU_LSTM_extra_FP16 PASSED (30 tests) No leaks
21 Full/GPU_DeepBench_LSTM_FP16 PASSED (22 tests) No leaks
22 Full/GPU_DeepBench_LSTM_FP32 PASSED (22 tests) No leaks
23 CPU_LOG_TEST_FUSION_NONE PASSED (2 tests) No leaks
24 CPU_LOG_TEST_NEG_NONE PASSED (4 tests) No leaks
25 GPU_SoftmaxFind20_BFP16 PASSED (2 tests) No leaks
26 GPU_SoftmaxFind20_FP16 PASSED (2 tests) No leaks
27 GPU_SoftmaxFind20_FP32 PASSED (2 tests) No leaks
28 CPU_ConvApi_NONE PASSED (1 test) No leaks
29 Full/GPU_RNNSeqApi_FP16 PASSED (16 tests) No leaks
30 Full/GPU_RNNSeqApi_FP32 PASSED (16 tests) No leaks
31 UnitVAN/CPU_GraphApiRng_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
32 Smoke/GPU_RNNVanilla_FP16 PASSED (4 tests) No leaks
33 Smoke/GPU_RNNVanilla_FP32 PASSED (4 tests) No leaks
34 CPU_FusionCreateOpConvForward_FP32 PASSED (1 test) No leaks
35 CPU_GraphApiOperationReduction_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
36 Unit2IV1/CPU_GraphApiOperationPointwiseBwd_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
37 Unit2IV1/CPU_GraphApiOperationPointwiseTwoInput REMOVED (PR #5603, 2026-03-26) n/a (deleted)
38 Unit2IV2/CPU_GraphApiOperationPointwiseBwd_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
39 Unit2IV2/CPU_GraphApiOperationPointwiseTwoInput REMOVED (PR #5603, 2026-03-26) n/a (deleted)
40 UnitVAB/CPU_GraphApiRng_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
41 Smoke/GPU_RNNVanillaDropout_FP16 (duplicate of #2) (see #2) (see #2)
42 CPU_GraphApiOperationGraphDescriptor_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
43 UnitVA/CPU_GraphApiVariantPack_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
44 UnitVAU/CPU_GraphApiRng_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
45 CPU_GraphApiOperationReshape_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
46 Smoke/CPU_DeterministicConvApi_NONE PASSED (1 test) No leaks
47 Smoke/GPU_FusionAux_FP32 PASSED (1 test) No leaks
48 CPU_GraphApiEngineHeur_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
49 Unit/CPU_GraphApiReduction_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
50 CPU_GraphApiEngineCfg_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
51 Unit/CPU_GraphApiMatMul_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
52 CPU_BackendApi_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
53 UnitIV/CPU_GraphApiOperationPointwiseOneInput REMOVED (PR #5603, 2026-03-26) n/a (deleted)
54 Unit3IV/CPU_GraphApiOperationPointwiseThreeInput REMOVED (PR #5603, 2026-03-26) n/a (deleted)
55 UnitVA/CPU_GraphApiOperationMatmul_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
56 UnitVA/CPU_GraphApiOperationRng_NONE REMOVED (PR #5603, 2026-03-26) n/a (deleted)
Outcome Count
Passed, no leaks 39
Passed, leaks detected 0
Failed 0
Crashed / timed out 0
Removed — GraphAPI purge (#5603) 15
Removed — FP64 LSTM purge (#5750) 1
Duplicate (not re-run) 1
Total rows 56

Risk Assessment

Low

…rom the blacklist. Exception being the supertensor tests, where the leak was fixed, but it hits an out of memory error due to ASan using significantly more memory. It currently remains disabled.
…hat they can be run with Address Sanitizer enabled.
Comment thread projects/miopen/test/gtest/gtest_desc_guard.hpp Outdated
@NolanHannaAMD NolanHannaAMD changed the title [MIOpen] Fix MIOpen ASan memory leaks [MIOpen] [ROCM-21512] [ALMIOPEN-1814] Fix MIOpen ASan memory leaks May 8, 2026
@NolanHannaAMD NolanHannaAMD merged commit 12a59bb into develop May 8, 2026
37 checks passed
@NolanHannaAMD NolanHannaAMD deleted the users/nhanna/fix-miopen-asan-memory-leaks branch May 8, 2026 20:15
aledudek pushed a commit that referenced this pull request May 20, 2026
…7008)

## Motivation

Several memory leaks were detected in MIOpen gtests using ASan. Some of
the tests were blacklisted and others were not. This change looks to fix
all of the low hanging fruit, which are the majority of the leaks found.
This includes all of the critical leaks (>100MB) that were reported.
Some other leaks were identified as needing a larger refactor to
resolve.

After fixing the w_supertensor.cpp leak, the supertensor tests hit
virtual memory area limit errors. These changes are to enable running
them with ASan, but only with a subset of the tests in order to not hit
limits. The test coverage being lost is fairly negligible and the full
tests are still run without ASan enabled.

## Technical Details

Here is a summary of the files looked at and the changes made:

| File | Status | Notes |
|------|--------|-------|
| `test/gtest/gpu_mha_forward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA forward
solution descriptor leaks (report #11, #13). |
| `test/gtest/gpu_mha_backward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA backward
solution descriptor leaks (report #9, #12). |
| `test/gtest/mha_find20.cpp` | Fixed | Added `miopenDestroySolution()`
loop in both `MhaForward` and `MhaBackward` tests. Fixes MHA Find2.0
solution leak (report #16). |
| `test/gtest/gtest_desc_guard.hpp` | Fixed | New shared header
introduced by the refactor. Provides a single `DescGuard<DescType,
CreateFn, DestroyFn>` template (with `TensorDescGuard`, `ConvDescGuard`,
`DropoutDescGuard`, `RNNDescGuard` aliases), a `HandleGuard` RAII
wrapper for `miopenHandle_t`, and the
`DestroyInternalRnnDropoutDesc(rnnDesc)` helper used by every
RNN/LSTM/GRU test to free the internal `DropoutDescriptor` that
`miopenCreateRNNDescriptor` allocates and `miopenSetRNNDescriptor*` then
leaks. Replaces the per-file ad-hoc guard structs from the initial
implementation. |
| `test/gtest/w_supertensor.cpp` | Fixed | Switched raw descriptors to
the shared `RNNDescGuard` / `TensorDescGuard` from
`gtest_desc_guard.hpp`. Added a class-local `DestroyDropoutDesc()`
(called from `TearDown` and before `miopenSetRNNDescriptor`) to prevent
the `miopenSetRNNDescriptor` overwrite leak. Reduced test parameter
space under ASan to avoid OOM. Removed unused `seqLen` parameter and
dead `param_dev_out`/`bias_dev_out` allocations. |
| `test/gtest/lstm.hpp` | Fixed | Switched `rnnDesc` → `RNNDescGuard`,
`DropoutDesc` → `DropoutDescGuard`, and `mio_handle` → `HandleGuard`
(which now owns the `miopenDestroy` call), all from the shared
`gtest_desc_guard.hpp`. Hoisted `dropout_state_buf` so it can be
`hipFree`d at the end of the dropout path. Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (in the non-dropout path only) at end of
`Run`, which frees the internal `DropoutDescriptor` that the Set call
would otherwise leak. Fixes ~615 MB dropout leaks and ~16.5 KB
non-dropout descriptor leaks (report #3, #4, #6, #17-#22). |
| `test/gtest/gru_test.cpp` | Fixed | In the GRU test class: switched
`rnnDesc` → `RNNDescGuard`, `DropoutDesc` → `DropoutDescGuard`,
`mio_handle` → `HandleGuard`, and added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of `Run`
plus `hipFree(dropout_state_buf)` for the dropout path. In the in-file
`GRUFwdCPUVerify` / `GRUBwdDataCPUVerify` helpers: converted the raw
`dropout_inputTensor` / `dropout_outputTensor` declarations to
`TensorDescGuard` (mirroring the `cpu_rnn.hpp` change for the LSTM/RNN
helpers). |
| `test/gtest/softmax_find20.cpp` | Fixed | Changed `Finalize()` to take
the `std::vector<miopenSolution_t>&` and destroy each solution via
`miopenDestroySolution()` before destroying the problem. Updated all 6
`TEST(...)` callers to pass the solutions vector. Fixes Find2.0 softmax
solution/kernel leaks (report #25-#27). |
| `test/gtest/rnn_seq_api.hpp` | Fixed | Hoisted `dropout_state_buf` so
the dropout path can `hipFree` it at the end. Added
`DestroyInternalRnnDropoutDesc(&rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run
to free the internal `DropoutDescriptor` allocations leaked by
`RNNDescriptor` copy-assignment. Same pattern as the LSTM/GRU fixes
(report #29-#30). |
| `test/cpu_rnn.hpp` | Fixed | Converted 6 raw `miopenTensorDescriptor_t
dropout_input/outputTensor` declarations across the LSTM/RNN CPU
verification helpers (`LSTMFwdCPUVerify`, `LSTMBwdDataCPUVerify`,
`RNNFwdTrainCPUVerify`, `RNNBwdDataCPUVerify`, `GRUFwdCPUVerify`,
`GRUBwdDataCPUVerify`) to the shared `TensorDescGuard` from
`gtest_desc_guard.hpp`. Removed the redundant
`miopenCreateTensorDescriptor` calls and updated 12 `miopen::deref(...)`
sites to `.get()`. (Note: the GRU helpers in this header are stale
duplicates; the live ones are inside `test/gtest/gru_test.cpp` and were
updated there too.) Fixes the LSTM/GRU CPU-verify tensor descriptor
leaks (report #1, #2, #14, #15, #32, #33). |
| `test/gtest/rnn_vanilla_common.hpp` | Fixed | Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` calls before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run.
The `RNNDescGuard` / `DropoutDescGuard` usage was already in place from
an earlier commit and now resolves to the shared definitions in
`gtest_desc_guard.hpp`. Same pattern as LSTM/GRU/rnn_seq_api fixes
(report #14, #15, #32, #33). |
| `test/gtest/graphapi_gtest_common.hpp` | Skipped | File no longer
exists in the codebase. The GraphApi test infrastructure has been
removed; only the leak report (a stale snapshot) still references it. No
fix possible against the current source tree. |
| `test/gtest/graphapi_execution_plan.cpp` | Skipped | File no longer
exists in the codebase (GraphApi removed). The leak it represented was
largely an external hipblaslt bug anyway; the remaining test-side
portion is not fixable against the current source. |
| `test/gtest/na_train.cpp` / `na_inference.cpp` / `na_*_find2.cpp` |
Skipped | Leaks come from the internal MIOpen solver/kernel cache living
on the global singleton handle, which is never destroyed. Not easily
fixable without redesigning global handle lifecycle. Should be
suppressed in the ASan suppression file. |
| hipblaslt / rocblaslt (external) | Skipped |
`SolutionCache::addKernel` and `preloadCustomKernels` leak via
`_rocblaslt_handle` constructor. Called from
`miopen::Handle::CreateHipblasLtHandle`. This is an upstream bug in
hipblaslt/rocblaslt, not fixable in MIOpen. Affects suites that
initialize a handle (report Category 2, Category 7). |
| CLR / HIP runtime (external) | Skipped | `amd::Context` and
`amd::roc::Device` global initialization leaks from
`rocclr/platform/context.cpp`. HIP runtime internals, not fixable in
MIOpen. |
| `src/hipoc/hipoc_program.cpp` | Skipped | `HIPOCProgramImpl` objects
leak during kernel compilation/caching (line ~178). This is an internal
MIOpen kernel cache lifecycle issue that requires deeper architectural
changes to fix. Contributes small amounts to MHA and Softmax Find2.0
leaks. |
| `test/gtest/conv_api.cpp` | Fixed | Already clean against current
source — `miopenDestroyConvolutionDescriptor(conv_desc)` call exists
(line 24) inside the test loop. ASAN run reports no leaks. The leak
report was based on a stale snapshot. The hipblaslt handle init portion
is tracked under the external-skipped row. |
| `test/gtest/log_test.cpp` (CPU_LOG_TEST_FUSION / CPU_LOG_TEST_NEG) |
Fixed | Already clean against current source — `Tensor`, `Conv`,
`CreateCBAFusionPlan`, `CreateBNormFusionPlan` all have proper
destructors that call the corresponding `miopenDestroy*` APIs in
`log.cpp`. ASAN run on `CPU_LOG_TEST_*` (11 tests across log_test.cpp +
log_test_neg.cpp) reports no leaks. The hipblaslt handle init portion is
tracked separately under the external-skipped row. |
| `test/gtest/fusion_test.cpp` (CPU_FusionCreateOpConvForward) | Fixed |
File renamed from `fusion.cpp` to `fusion_test.cpp`. Already clean
against current source — uses `TensorDescGuard`/`ConvDescGuard` for
tensor/conv descriptors and calls
`miopenDestroyFusionPlan(fusionPlanDesc)` on the fusion plan (line 195).
ASAN run on `CPU_FusionCreateOpConvForward_FP32.*` reports no leaks. |
| `test/gtest/deterministic_conv_api.cpp` | Fixed | Already clean
against current source — uses `ConvDescGuard` (line 66) for the conv
descriptor. ASAN run on `*CPU_DeterministicConvApi*` reports no leaks. |
| `test/gtest/fusion_aux.cpp` (GPU_FusionAux) | Fixed | Already clean
against current source — uses `ConvDescGuard` plus stack-allocated
internal C++ objects (`miopen::TensorDescriptor`,
`miopen::FusionPlanDescriptor`) which have proper destructors. The
`convoOp` handle is owned by the fusion plan. ASAN run on
`*GPU_FusionAux*` reports no leaks. |
| `test/gtest/backend_api.cpp` (CPU_BackendApi) | Skipped | File no
longer exists in the codebase. The backend API test infrastructure (part
of the removed GraphApi suite) was removed; no fix possible against the
current source tree. |


### High-level notes

New shared infrastructure (test/gtest/gtest_desc_guard.hpp)
- DescGuard<DescType, CreateFn, DestroyFn> — a single RAII template
parameterized on the descriptor type and its create/destroy entry
points. Aliases provide TensorDescGuard, ConvDescGuard,
DropoutDescGuard, and RNNDescGuard, replacing the four near-identical
guard structs
  that were copy-pasted across test files in the initial implementation.
- HandleGuard — separate RAII wrapper for miopenHandle_t (couldn't reuse
the template because miopenCreateWithStream takes an extra hipStream_t
argument). Supports lazy create(stream) so callers that only need a
handle in the dropout branch can default-construct one and
  populate it conditionally.
- DestroyInternalRnnDropoutDesc(rnnDesc) — frees the internal
DropoutDescriptor that miopenCreateRNNDescriptor allocates and that
miopenSetRNNDescriptor* then orphans. Replaces the equivalent inline
blocks that LSTM/GRU/RNN tests were each carrying. The header documents
the
two call-sites: before each Set* (always safe) and at end-of-run only on
the non-dropout path (the dropout path aliases the user-owned
descriptor, so freeing would double-free).

Recurring patterns enabled by the refactor
- The "leak from Set* overwriting the default-constructed internal
dropout descriptor" fix collapsed from per-file code to a one-line
helper call, applied uniformly across lstm.hpp, gru_test.cpp,
rnn_seq_api.hpp, and rnn_vanilla_common.hpp.
- mio_handle ownership in LSTM/GRU is now expressed via HandleGuard
rather than a manual miopenDestroy at the end of the dropout branch —
eliminates a class of forgotten-cleanup bugs.
- dropout_state_buf is consistently hoisted out of the dropout if block
so an end-of-run hipFree can release it; deletion of the buffer pairs
visibly with its allocation.

Notable non-RNN change
- softmax_find20.cpp was the only Find2.0 leak fix in this commit:
Finalize() now takes the solutions vector and calls
miopenDestroySolution() for each before destroying the problem. Same
shape applied to all 6 tests in the file.


## Test Plan

Run the tests beforehand to observe the ASan leak errors and then again
afterward to verify the fixes have resolved the problem.

## Test Result

List from ROCM-21512:

| # | Test Name | Status | Leak Status |

|---:|------------------------------------------------------|------------------------------------------------|-----------------|
| 1 | Smoke/GPU_RNNVanillaDropout_FP32 | PASSED (4 tests) | No leaks |
| 2 | Smoke/GPU_RNNVanillaDropout_FP16 | PASSED (4 tests) | No leaks |
| 3 | Full/GPU_LSTM_dropout_FP32 | PASSED (4 tests) | No leaks |
| 4 | Full/GPU_LSTM_dropout_FP16 | PASSED (4 tests) | No leaks |
| 5 | CPU_GraphApiExecutionPlanBuilder_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 6 | Full/GPU_LSTM_dropout_FP64 | REMOVED (PR #5750, 2026-03-26) | n/a
(deleted) |
| 7 | Unit/CPU_GraphApiPointwise_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 8 | Full/GPU_LstmMSRnn_FP32 | PASSED (1152 tests) | No leaks |
| 9 | Smoke/GPU_Bwd_Mha_FP32 | PASSED (12 tests) | No leaks |
| 10 | Full/GPU_LstmMSRnn_FP16 | PASSED (864 tests) | No leaks |
| 11 | Smoke/GPU_Fwd_Mha_FP32 | PASSED (15 tests) | No leaks |
| 12 | Full/GPU_Bwd_Mha_FP32 | PASSED (6 tests) | No leaks |
| 13 | Full/GPU_Fwd_Mha_FP32 | PASSED (7 tests) | No leaks |
| 14 | Full/GPU_RNNVanilla_FP32 | PASSED (96 tests) | No leaks |
| 15 | Full/GPU_RNNVanilla_FP16 | PASSED (96 tests) | No leaks |
| 16 | GPU_TestMhaFind20_FP32 | PASSED (2 tests) | No leaks |
| 17 | Full/GPU_LSTM_FP32 | PASSED (32 tests) | No leaks |
| 18 | Full/GPU_LSTM_FP16 | PASSED (32 tests) | No leaks |
| 19 | Full/GPU_LSTM_extra_FP32 | PASSED (30 tests) | No leaks |
| 20 | Full/GPU_LSTM_extra_FP16 | PASSED (30 tests) | No leaks |
| 21 | Full/GPU_DeepBench_LSTM_FP16 | PASSED (22 tests) | No leaks |
| 22 | Full/GPU_DeepBench_LSTM_FP32 | PASSED (22 tests) | No leaks |
| 23 | CPU_LOG_TEST_FUSION_NONE | PASSED (2 tests) | No leaks |
| 24 | CPU_LOG_TEST_NEG_NONE | PASSED (4 tests) | No leaks |
| 25 | GPU_SoftmaxFind20_BFP16 | PASSED (2 tests) | No leaks |
| 26 | GPU_SoftmaxFind20_FP16 | PASSED (2 tests) | No leaks |
| 27 | GPU_SoftmaxFind20_FP32 | PASSED (2 tests) | No leaks |
| 28 | CPU_ConvApi_NONE | PASSED (1 test) | No leaks |
| 29 | Full/GPU_RNNSeqApi_FP16 | PASSED (16 tests) | No leaks |
| 30 | Full/GPU_RNNSeqApi_FP32 | PASSED (16 tests) | No leaks |
| 31 | UnitVAN/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 32 | Smoke/GPU_RNNVanilla_FP16 | PASSED (4 tests) | No leaks |
| 33 | Smoke/GPU_RNNVanilla_FP32 | PASSED (4 tests) | No leaks |
| 34 | CPU_FusionCreateOpConvForward_FP32 | PASSED (1 test) | No leaks |
| 35 | CPU_GraphApiOperationReduction_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 36 | Unit2IV1/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 37 | Unit2IV1/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 38 | Unit2IV2/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 39 | Unit2IV2/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 40 | UnitVAB/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 41 | Smoke/GPU_RNNVanillaDropout_FP16 (duplicate of #2) | (see #2) |
(see #2) |
| 42 | CPU_GraphApiOperationGraphDescriptor_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 43 | UnitVA/CPU_GraphApiVariantPack_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 44 | UnitVAU/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 45 | CPU_GraphApiOperationReshape_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 46 | Smoke/CPU_DeterministicConvApi_NONE | PASSED (1 test) | No leaks
|
| 47 | Smoke/GPU_FusionAux_FP32 | PASSED (1 test) | No leaks |
| 48 | CPU_GraphApiEngineHeur_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 49 | Unit/CPU_GraphApiReduction_NONE | REMOVED (PR #5603, 2026-03-26)
| n/a (deleted) |
| 50 | CPU_GraphApiEngineCfg_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 51 | Unit/CPU_GraphApiMatMul_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 52 | CPU_BackendApi_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 53 | UnitIV/CPU_GraphApiOperationPointwiseOneInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 54 | Unit3IV/CPU_GraphApiOperationPointwiseThreeInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 55 | UnitVA/CPU_GraphApiOperationMatmul_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 56 | UnitVA/CPU_GraphApiOperationRng_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |


| Outcome                            | Count |
|------------------------------------|------:|
| Passed, no leaks                   |    39 |
| Passed, leaks detected             |     0 |
| Failed                             |     0 |
| Crashed / timed out                |     0 |
| Removed — GraphAPI purge (#5603)   |    15 |
| Removed — FP64 LSTM purge (#5750)  |     1 |
| Duplicate (not re-run)             |     1 |
| **Total rows**                     |  **56** |

## Risk Assessment
Low

---------

Co-authored-by: JonathanLichtnerAMD <195780826+JonathanLichtnerAMD@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants