Skip to content

[MIOpen] Remove fp64 lstm tests as not supported#5750

Merged
Vsevolod1983 merged 3 commits into
developfrom
users/Vsevolod1983/RemoveFP64ForLstm
Mar 26, 2026
Merged

[MIOpen] Remove fp64 lstm tests as not supported#5750
Vsevolod1983 merged 3 commits into
developfrom
users/Vsevolod1983/RemoveFP64ForLstm

Conversation

@Vsevolod1983
Copy link
Copy Markdown
Contributor

We have failures on CI with FP64 LSTM tests and this is correct because RNNDescriptor supports only float and half types.

@Vsevolod1983 Vsevolod1983 enabled auto-merge (squash) March 26, 2026 10:01
@Vsevolod1983 Vsevolod1983 merged commit eb9f494 into develop Mar 26, 2026
39 checks passed
@Vsevolod1983 Vsevolod1983 deleted the users/Vsevolod1983/RemoveFP64ForLstm branch March 26, 2026 12:28
@randyspauldingamd randyspauldingamd mentioned this pull request Mar 31, 2026
1 task
randyspauldingamd added a commit that referenced this pull request Apr 1, 2026
## Motivation

RNN does not support fp64. This one was missed in the first PR:
#5750

## Technical Details

There is no requirement to support this. I was told the 64-bit tests
were added by mistake.

## Test Plan

Pass CI.

## Test Result

Pending.

## Submission Checklist

- [X] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
vidyasagar-amd pushed a commit that referenced this pull request Apr 9, 2026
## Motivation

RNN does not support fp64. This one was missed in the first PR:
#5750

## Technical Details

There is no requirement to support this. I was told the 64-bit tests
were added by mistake.

## Test Plan

Pass CI.

## Test Result

Pending.

## Submission Checklist

- [X] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
NolanHannaAMD added a commit that referenced this pull request May 8, 2026
…7008)

## Motivation

Several memory leaks were detected in MIOpen gtests using ASan. Some of
the tests were blacklisted and others were not. This change looks to fix
all of the low hanging fruit, which are the majority of the leaks found.
This includes all of the critical leaks (>100MB) that were reported.
Some other leaks were identified as needing a larger refactor to
resolve.

After fixing the w_supertensor.cpp leak, the supertensor tests hit
virtual memory area limit errors. These changes are to enable running
them with ASan, but only with a subset of the tests in order to not hit
limits. The test coverage being lost is fairly negligible and the full
tests are still run without ASan enabled.

## Technical Details

Here is a summary of the files looked at and the changes made:

| File | Status | Notes |
|------|--------|-------|
| `test/gtest/gpu_mha_forward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA forward
solution descriptor leaks (report #11, #13). |
| `test/gtest/gpu_mha_backward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA backward
solution descriptor leaks (report #9, #12). |
| `test/gtest/mha_find20.cpp` | Fixed | Added `miopenDestroySolution()`
loop in both `MhaForward` and `MhaBackward` tests. Fixes MHA Find2.0
solution leak (report #16). |
| `test/gtest/gtest_desc_guard.hpp` | Fixed | New shared header
introduced by the refactor. Provides a single `DescGuard<DescType,
CreateFn, DestroyFn>` template (with `TensorDescGuard`, `ConvDescGuard`,
`DropoutDescGuard`, `RNNDescGuard` aliases), a `HandleGuard` RAII
wrapper for `miopenHandle_t`, and the
`DestroyInternalRnnDropoutDesc(rnnDesc)` helper used by every
RNN/LSTM/GRU test to free the internal `DropoutDescriptor` that
`miopenCreateRNNDescriptor` allocates and `miopenSetRNNDescriptor*` then
leaks. Replaces the per-file ad-hoc guard structs from the initial
implementation. |
| `test/gtest/w_supertensor.cpp` | Fixed | Switched raw descriptors to
the shared `RNNDescGuard` / `TensorDescGuard` from
`gtest_desc_guard.hpp`. Added a class-local `DestroyDropoutDesc()`
(called from `TearDown` and before `miopenSetRNNDescriptor`) to prevent
the `miopenSetRNNDescriptor` overwrite leak. Reduced test parameter
space under ASan to avoid OOM. Removed unused `seqLen` parameter and
dead `param_dev_out`/`bias_dev_out` allocations. |
| `test/gtest/lstm.hpp` | Fixed | Switched `rnnDesc` → `RNNDescGuard`,
`DropoutDesc` → `DropoutDescGuard`, and `mio_handle` → `HandleGuard`
(which now owns the `miopenDestroy` call), all from the shared
`gtest_desc_guard.hpp`. Hoisted `dropout_state_buf` so it can be
`hipFree`d at the end of the dropout path. Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (in the non-dropout path only) at end of
`Run`, which frees the internal `DropoutDescriptor` that the Set call
would otherwise leak. Fixes ~615 MB dropout leaks and ~16.5 KB
non-dropout descriptor leaks (report #3, #4, #6, #17-#22). |
| `test/gtest/gru_test.cpp` | Fixed | In the GRU test class: switched
`rnnDesc` → `RNNDescGuard`, `DropoutDesc` → `DropoutDescGuard`,
`mio_handle` → `HandleGuard`, and added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of `Run`
plus `hipFree(dropout_state_buf)` for the dropout path. In the in-file
`GRUFwdCPUVerify` / `GRUBwdDataCPUVerify` helpers: converted the raw
`dropout_inputTensor` / `dropout_outputTensor` declarations to
`TensorDescGuard` (mirroring the `cpu_rnn.hpp` change for the LSTM/RNN
helpers). |
| `test/gtest/softmax_find20.cpp` | Fixed | Changed `Finalize()` to take
the `std::vector<miopenSolution_t>&` and destroy each solution via
`miopenDestroySolution()` before destroying the problem. Updated all 6
`TEST(...)` callers to pass the solutions vector. Fixes Find2.0 softmax
solution/kernel leaks (report #25-#27). |
| `test/gtest/rnn_seq_api.hpp` | Fixed | Hoisted `dropout_state_buf` so
the dropout path can `hipFree` it at the end. Added
`DestroyInternalRnnDropoutDesc(&rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run
to free the internal `DropoutDescriptor` allocations leaked by
`RNNDescriptor` copy-assignment. Same pattern as the LSTM/GRU fixes
(report #29-#30). |
| `test/cpu_rnn.hpp` | Fixed | Converted 6 raw `miopenTensorDescriptor_t
dropout_input/outputTensor` declarations across the LSTM/RNN CPU
verification helpers (`LSTMFwdCPUVerify`, `LSTMBwdDataCPUVerify`,
`RNNFwdTrainCPUVerify`, `RNNBwdDataCPUVerify`, `GRUFwdCPUVerify`,
`GRUBwdDataCPUVerify`) to the shared `TensorDescGuard` from
`gtest_desc_guard.hpp`. Removed the redundant
`miopenCreateTensorDescriptor` calls and updated 12 `miopen::deref(...)`
sites to `.get()`. (Note: the GRU helpers in this header are stale
duplicates; the live ones are inside `test/gtest/gru_test.cpp` and were
updated there too.) Fixes the LSTM/GRU CPU-verify tensor descriptor
leaks (report #1, #2, #14, #15, #32, #33). |
| `test/gtest/rnn_vanilla_common.hpp` | Fixed | Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` calls before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run.
The `RNNDescGuard` / `DropoutDescGuard` usage was already in place from
an earlier commit and now resolves to the shared definitions in
`gtest_desc_guard.hpp`. Same pattern as LSTM/GRU/rnn_seq_api fixes
(report #14, #15, #32, #33). |
| `test/gtest/graphapi_gtest_common.hpp` | Skipped | File no longer
exists in the codebase. The GraphApi test infrastructure has been
removed; only the leak report (a stale snapshot) still references it. No
fix possible against the current source tree. |
| `test/gtest/graphapi_execution_plan.cpp` | Skipped | File no longer
exists in the codebase (GraphApi removed). The leak it represented was
largely an external hipblaslt bug anyway; the remaining test-side
portion is not fixable against the current source. |
| `test/gtest/na_train.cpp` / `na_inference.cpp` / `na_*_find2.cpp` |
Skipped | Leaks come from the internal MIOpen solver/kernel cache living
on the global singleton handle, which is never destroyed. Not easily
fixable without redesigning global handle lifecycle. Should be
suppressed in the ASan suppression file. |
| hipblaslt / rocblaslt (external) | Skipped |
`SolutionCache::addKernel` and `preloadCustomKernels` leak via
`_rocblaslt_handle` constructor. Called from
`miopen::Handle::CreateHipblasLtHandle`. This is an upstream bug in
hipblaslt/rocblaslt, not fixable in MIOpen. Affects suites that
initialize a handle (report Category 2, Category 7). |
| CLR / HIP runtime (external) | Skipped | `amd::Context` and
`amd::roc::Device` global initialization leaks from
`rocclr/platform/context.cpp`. HIP runtime internals, not fixable in
MIOpen. |
| `src/hipoc/hipoc_program.cpp` | Skipped | `HIPOCProgramImpl` objects
leak during kernel compilation/caching (line ~178). This is an internal
MIOpen kernel cache lifecycle issue that requires deeper architectural
changes to fix. Contributes small amounts to MHA and Softmax Find2.0
leaks. |
| `test/gtest/conv_api.cpp` | Fixed | Already clean against current
source — `miopenDestroyConvolutionDescriptor(conv_desc)` call exists
(line 24) inside the test loop. ASAN run reports no leaks. The leak
report was based on a stale snapshot. The hipblaslt handle init portion
is tracked under the external-skipped row. |
| `test/gtest/log_test.cpp` (CPU_LOG_TEST_FUSION / CPU_LOG_TEST_NEG) |
Fixed | Already clean against current source — `Tensor`, `Conv`,
`CreateCBAFusionPlan`, `CreateBNormFusionPlan` all have proper
destructors that call the corresponding `miopenDestroy*` APIs in
`log.cpp`. ASAN run on `CPU_LOG_TEST_*` (11 tests across log_test.cpp +
log_test_neg.cpp) reports no leaks. The hipblaslt handle init portion is
tracked separately under the external-skipped row. |
| `test/gtest/fusion_test.cpp` (CPU_FusionCreateOpConvForward) | Fixed |
File renamed from `fusion.cpp` to `fusion_test.cpp`. Already clean
against current source — uses `TensorDescGuard`/`ConvDescGuard` for
tensor/conv descriptors and calls
`miopenDestroyFusionPlan(fusionPlanDesc)` on the fusion plan (line 195).
ASAN run on `CPU_FusionCreateOpConvForward_FP32.*` reports no leaks. |
| `test/gtest/deterministic_conv_api.cpp` | Fixed | Already clean
against current source — uses `ConvDescGuard` (line 66) for the conv
descriptor. ASAN run on `*CPU_DeterministicConvApi*` reports no leaks. |
| `test/gtest/fusion_aux.cpp` (GPU_FusionAux) | Fixed | Already clean
against current source — uses `ConvDescGuard` plus stack-allocated
internal C++ objects (`miopen::TensorDescriptor`,
`miopen::FusionPlanDescriptor`) which have proper destructors. The
`convoOp` handle is owned by the fusion plan. ASAN run on
`*GPU_FusionAux*` reports no leaks. |
| `test/gtest/backend_api.cpp` (CPU_BackendApi) | Skipped | File no
longer exists in the codebase. The backend API test infrastructure (part
of the removed GraphApi suite) was removed; no fix possible against the
current source tree. |


### High-level notes

New shared infrastructure (test/gtest/gtest_desc_guard.hpp)
- DescGuard<DescType, CreateFn, DestroyFn> — a single RAII template
parameterized on the descriptor type and its create/destroy entry
points. Aliases provide TensorDescGuard, ConvDescGuard,
DropoutDescGuard, and RNNDescGuard, replacing the four near-identical
guard structs
  that were copy-pasted across test files in the initial implementation.
- HandleGuard — separate RAII wrapper for miopenHandle_t (couldn't reuse
the template because miopenCreateWithStream takes an extra hipStream_t
argument). Supports lazy create(stream) so callers that only need a
handle in the dropout branch can default-construct one and
  populate it conditionally.
- DestroyInternalRnnDropoutDesc(rnnDesc) — frees the internal
DropoutDescriptor that miopenCreateRNNDescriptor allocates and that
miopenSetRNNDescriptor* then orphans. Replaces the equivalent inline
blocks that LSTM/GRU/RNN tests were each carrying. The header documents
the
two call-sites: before each Set* (always safe) and at end-of-run only on
the non-dropout path (the dropout path aliases the user-owned
descriptor, so freeing would double-free).

Recurring patterns enabled by the refactor
- The "leak from Set* overwriting the default-constructed internal
dropout descriptor" fix collapsed from per-file code to a one-line
helper call, applied uniformly across lstm.hpp, gru_test.cpp,
rnn_seq_api.hpp, and rnn_vanilla_common.hpp.
- mio_handle ownership in LSTM/GRU is now expressed via HandleGuard
rather than a manual miopenDestroy at the end of the dropout branch —
eliminates a class of forgotten-cleanup bugs.
- dropout_state_buf is consistently hoisted out of the dropout if block
so an end-of-run hipFree can release it; deletion of the buffer pairs
visibly with its allocation.

Notable non-RNN change
- softmax_find20.cpp was the only Find2.0 leak fix in this commit:
Finalize() now takes the solutions vector and calls
miopenDestroySolution() for each before destroying the problem. Same
shape applied to all 6 tests in the file.


## Test Plan

Run the tests beforehand to observe the ASan leak errors and then again
afterward to verify the fixes have resolved the problem.

## Test Result

List from ROCM-21512:

| # | Test Name | Status | Leak Status |

|---:|------------------------------------------------------|------------------------------------------------|-----------------|
| 1 | Smoke/GPU_RNNVanillaDropout_FP32 | PASSED (4 tests) | No leaks |
| 2 | Smoke/GPU_RNNVanillaDropout_FP16 | PASSED (4 tests) | No leaks |
| 3 | Full/GPU_LSTM_dropout_FP32 | PASSED (4 tests) | No leaks |
| 4 | Full/GPU_LSTM_dropout_FP16 | PASSED (4 tests) | No leaks |
| 5 | CPU_GraphApiExecutionPlanBuilder_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 6 | Full/GPU_LSTM_dropout_FP64 | REMOVED (PR #5750, 2026-03-26) | n/a
(deleted) |
| 7 | Unit/CPU_GraphApiPointwise_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 8 | Full/GPU_LstmMSRnn_FP32 | PASSED (1152 tests) | No leaks |
| 9 | Smoke/GPU_Bwd_Mha_FP32 | PASSED (12 tests) | No leaks |
| 10 | Full/GPU_LstmMSRnn_FP16 | PASSED (864 tests) | No leaks |
| 11 | Smoke/GPU_Fwd_Mha_FP32 | PASSED (15 tests) | No leaks |
| 12 | Full/GPU_Bwd_Mha_FP32 | PASSED (6 tests) | No leaks |
| 13 | Full/GPU_Fwd_Mha_FP32 | PASSED (7 tests) | No leaks |
| 14 | Full/GPU_RNNVanilla_FP32 | PASSED (96 tests) | No leaks |
| 15 | Full/GPU_RNNVanilla_FP16 | PASSED (96 tests) | No leaks |
| 16 | GPU_TestMhaFind20_FP32 | PASSED (2 tests) | No leaks |
| 17 | Full/GPU_LSTM_FP32 | PASSED (32 tests) | No leaks |
| 18 | Full/GPU_LSTM_FP16 | PASSED (32 tests) | No leaks |
| 19 | Full/GPU_LSTM_extra_FP32 | PASSED (30 tests) | No leaks |
| 20 | Full/GPU_LSTM_extra_FP16 | PASSED (30 tests) | No leaks |
| 21 | Full/GPU_DeepBench_LSTM_FP16 | PASSED (22 tests) | No leaks |
| 22 | Full/GPU_DeepBench_LSTM_FP32 | PASSED (22 tests) | No leaks |
| 23 | CPU_LOG_TEST_FUSION_NONE | PASSED (2 tests) | No leaks |
| 24 | CPU_LOG_TEST_NEG_NONE | PASSED (4 tests) | No leaks |
| 25 | GPU_SoftmaxFind20_BFP16 | PASSED (2 tests) | No leaks |
| 26 | GPU_SoftmaxFind20_FP16 | PASSED (2 tests) | No leaks |
| 27 | GPU_SoftmaxFind20_FP32 | PASSED (2 tests) | No leaks |
| 28 | CPU_ConvApi_NONE | PASSED (1 test) | No leaks |
| 29 | Full/GPU_RNNSeqApi_FP16 | PASSED (16 tests) | No leaks |
| 30 | Full/GPU_RNNSeqApi_FP32 | PASSED (16 tests) | No leaks |
| 31 | UnitVAN/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 32 | Smoke/GPU_RNNVanilla_FP16 | PASSED (4 tests) | No leaks |
| 33 | Smoke/GPU_RNNVanilla_FP32 | PASSED (4 tests) | No leaks |
| 34 | CPU_FusionCreateOpConvForward_FP32 | PASSED (1 test) | No leaks |
| 35 | CPU_GraphApiOperationReduction_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 36 | Unit2IV1/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 37 | Unit2IV1/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 38 | Unit2IV2/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 39 | Unit2IV2/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 40 | UnitVAB/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 41 | Smoke/GPU_RNNVanillaDropout_FP16 (duplicate of #2) | (see #2) |
(see #2) |
| 42 | CPU_GraphApiOperationGraphDescriptor_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 43 | UnitVA/CPU_GraphApiVariantPack_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 44 | UnitVAU/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 45 | CPU_GraphApiOperationReshape_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 46 | Smoke/CPU_DeterministicConvApi_NONE | PASSED (1 test) | No leaks
|
| 47 | Smoke/GPU_FusionAux_FP32 | PASSED (1 test) | No leaks |
| 48 | CPU_GraphApiEngineHeur_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 49 | Unit/CPU_GraphApiReduction_NONE | REMOVED (PR #5603, 2026-03-26)
| n/a (deleted) |
| 50 | CPU_GraphApiEngineCfg_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 51 | Unit/CPU_GraphApiMatMul_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 52 | CPU_BackendApi_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 53 | UnitIV/CPU_GraphApiOperationPointwiseOneInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 54 | Unit3IV/CPU_GraphApiOperationPointwiseThreeInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 55 | UnitVA/CPU_GraphApiOperationMatmul_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 56 | UnitVA/CPU_GraphApiOperationRng_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |


| Outcome                            | Count |
|------------------------------------|------:|
| Passed, no leaks                   |    39 |
| Passed, leaks detected             |     0 |
| Failed                             |     0 |
| Crashed / timed out                |     0 |
| Removed — GraphAPI purge (#5603)   |    15 |
| Removed — FP64 LSTM purge (#5750)  |     1 |
| Duplicate (not re-run)             |     1 |
| **Total rows**                     |  **56** |

## Risk Assessment
Low

---------

Co-authored-by: JonathanLichtnerAMD <195780826+JonathanLichtnerAMD@users.noreply.github.com>
aledudek pushed a commit that referenced this pull request May 20, 2026
…7008)

## Motivation

Several memory leaks were detected in MIOpen gtests using ASan. Some of
the tests were blacklisted and others were not. This change looks to fix
all of the low hanging fruit, which are the majority of the leaks found.
This includes all of the critical leaks (>100MB) that were reported.
Some other leaks were identified as needing a larger refactor to
resolve.

After fixing the w_supertensor.cpp leak, the supertensor tests hit
virtual memory area limit errors. These changes are to enable running
them with ASan, but only with a subset of the tests in order to not hit
limits. The test coverage being lost is fairly negligible and the full
tests are still run without ASan enabled.

## Technical Details

Here is a summary of the files looked at and the changes made:

| File | Status | Notes |
|------|--------|-------|
| `test/gtest/gpu_mha_forward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA forward
solution descriptor leaks (report #11, #13). |
| `test/gtest/gpu_mha_backward.cpp` | Fixed | Added
`miopenDestroySolution()` loop after using solutions. Fixes MHA backward
solution descriptor leaks (report #9, #12). |
| `test/gtest/mha_find20.cpp` | Fixed | Added `miopenDestroySolution()`
loop in both `MhaForward` and `MhaBackward` tests. Fixes MHA Find2.0
solution leak (report #16). |
| `test/gtest/gtest_desc_guard.hpp` | Fixed | New shared header
introduced by the refactor. Provides a single `DescGuard<DescType,
CreateFn, DestroyFn>` template (with `TensorDescGuard`, `ConvDescGuard`,
`DropoutDescGuard`, `RNNDescGuard` aliases), a `HandleGuard` RAII
wrapper for `miopenHandle_t`, and the
`DestroyInternalRnnDropoutDesc(rnnDesc)` helper used by every
RNN/LSTM/GRU test to free the internal `DropoutDescriptor` that
`miopenCreateRNNDescriptor` allocates and `miopenSetRNNDescriptor*` then
leaks. Replaces the per-file ad-hoc guard structs from the initial
implementation. |
| `test/gtest/w_supertensor.cpp` | Fixed | Switched raw descriptors to
the shared `RNNDescGuard` / `TensorDescGuard` from
`gtest_desc_guard.hpp`. Added a class-local `DestroyDropoutDesc()`
(called from `TearDown` and before `miopenSetRNNDescriptor`) to prevent
the `miopenSetRNNDescriptor` overwrite leak. Reduced test parameter
space under ASan to avoid OOM. Removed unused `seqLen` parameter and
dead `param_dev_out`/`bias_dev_out` allocations. |
| `test/gtest/lstm.hpp` | Fixed | Switched `rnnDesc` → `RNNDescGuard`,
`DropoutDesc` → `DropoutDescGuard`, and `mio_handle` → `HandleGuard`
(which now owns the `miopenDestroy` call), all from the shared
`gtest_desc_guard.hpp`. Hoisted `dropout_state_buf` so it can be
`hipFree`d at the end of the dropout path. Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (in the non-dropout path only) at end of
`Run`, which frees the internal `DropoutDescriptor` that the Set call
would otherwise leak. Fixes ~615 MB dropout leaks and ~16.5 KB
non-dropout descriptor leaks (report #3, #4, #6, #17-#22). |
| `test/gtest/gru_test.cpp` | Fixed | In the GRU test class: switched
`rnnDesc` → `RNNDescGuard`, `DropoutDesc` → `DropoutDescGuard`,
`mio_handle` → `HandleGuard`, and added
`DestroyInternalRnnDropoutDesc(rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of `Run`
plus `hipFree(dropout_state_buf)` for the dropout path. In the in-file
`GRUFwdCPUVerify` / `GRUBwdDataCPUVerify` helpers: converted the raw
`dropout_inputTensor` / `dropout_outputTensor` declarations to
`TensorDescGuard` (mirroring the `cpu_rnn.hpp` change for the LSTM/RNN
helpers). |
| `test/gtest/softmax_find20.cpp` | Fixed | Changed `Finalize()` to take
the `std::vector<miopenSolution_t>&` and destroy each solution via
`miopenDestroySolution()` before destroying the problem. Updated all 6
`TEST(...)` callers to pass the solutions vector. Fixes Find2.0 softmax
solution/kernel leaks (report #25-#27). |
| `test/gtest/rnn_seq_api.hpp` | Fixed | Hoisted `dropout_state_buf` so
the dropout path can `hipFree` it at the end. Added
`DestroyInternalRnnDropoutDesc(&rnnDesc)` before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run
to free the internal `DropoutDescriptor` allocations leaked by
`RNNDescriptor` copy-assignment. Same pattern as the LSTM/GRU fixes
(report #29-#30). |
| `test/cpu_rnn.hpp` | Fixed | Converted 6 raw `miopenTensorDescriptor_t
dropout_input/outputTensor` declarations across the LSTM/RNN CPU
verification helpers (`LSTMFwdCPUVerify`, `LSTMBwdDataCPUVerify`,
`RNNFwdTrainCPUVerify`, `RNNBwdDataCPUVerify`, `GRUFwdCPUVerify`,
`GRUBwdDataCPUVerify`) to the shared `TensorDescGuard` from
`gtest_desc_guard.hpp`. Removed the redundant
`miopenCreateTensorDescriptor` calls and updated 12 `miopen::deref(...)`
sites to `.get()`. (Note: the GRU helpers in this header are stale
duplicates; the live ones are inside `test/gtest/gru_test.cpp` and were
updated there too.) Fixes the LSTM/GRU CPU-verify tensor descriptor
leaks (report #1, #2, #14, #15, #32, #33). |
| `test/gtest/rnn_vanilla_common.hpp` | Fixed | Added
`DestroyInternalRnnDropoutDesc(rnnDesc)` calls before
`miopenSetRNNDescriptor*` and (non-dropout path only) at end of the run.
The `RNNDescGuard` / `DropoutDescGuard` usage was already in place from
an earlier commit and now resolves to the shared definitions in
`gtest_desc_guard.hpp`. Same pattern as LSTM/GRU/rnn_seq_api fixes
(report #14, #15, #32, #33). |
| `test/gtest/graphapi_gtest_common.hpp` | Skipped | File no longer
exists in the codebase. The GraphApi test infrastructure has been
removed; only the leak report (a stale snapshot) still references it. No
fix possible against the current source tree. |
| `test/gtest/graphapi_execution_plan.cpp` | Skipped | File no longer
exists in the codebase (GraphApi removed). The leak it represented was
largely an external hipblaslt bug anyway; the remaining test-side
portion is not fixable against the current source. |
| `test/gtest/na_train.cpp` / `na_inference.cpp` / `na_*_find2.cpp` |
Skipped | Leaks come from the internal MIOpen solver/kernel cache living
on the global singleton handle, which is never destroyed. Not easily
fixable without redesigning global handle lifecycle. Should be
suppressed in the ASan suppression file. |
| hipblaslt / rocblaslt (external) | Skipped |
`SolutionCache::addKernel` and `preloadCustomKernels` leak via
`_rocblaslt_handle` constructor. Called from
`miopen::Handle::CreateHipblasLtHandle`. This is an upstream bug in
hipblaslt/rocblaslt, not fixable in MIOpen. Affects suites that
initialize a handle (report Category 2, Category 7). |
| CLR / HIP runtime (external) | Skipped | `amd::Context` and
`amd::roc::Device` global initialization leaks from
`rocclr/platform/context.cpp`. HIP runtime internals, not fixable in
MIOpen. |
| `src/hipoc/hipoc_program.cpp` | Skipped | `HIPOCProgramImpl` objects
leak during kernel compilation/caching (line ~178). This is an internal
MIOpen kernel cache lifecycle issue that requires deeper architectural
changes to fix. Contributes small amounts to MHA and Softmax Find2.0
leaks. |
| `test/gtest/conv_api.cpp` | Fixed | Already clean against current
source — `miopenDestroyConvolutionDescriptor(conv_desc)` call exists
(line 24) inside the test loop. ASAN run reports no leaks. The leak
report was based on a stale snapshot. The hipblaslt handle init portion
is tracked under the external-skipped row. |
| `test/gtest/log_test.cpp` (CPU_LOG_TEST_FUSION / CPU_LOG_TEST_NEG) |
Fixed | Already clean against current source — `Tensor`, `Conv`,
`CreateCBAFusionPlan`, `CreateBNormFusionPlan` all have proper
destructors that call the corresponding `miopenDestroy*` APIs in
`log.cpp`. ASAN run on `CPU_LOG_TEST_*` (11 tests across log_test.cpp +
log_test_neg.cpp) reports no leaks. The hipblaslt handle init portion is
tracked separately under the external-skipped row. |
| `test/gtest/fusion_test.cpp` (CPU_FusionCreateOpConvForward) | Fixed |
File renamed from `fusion.cpp` to `fusion_test.cpp`. Already clean
against current source — uses `TensorDescGuard`/`ConvDescGuard` for
tensor/conv descriptors and calls
`miopenDestroyFusionPlan(fusionPlanDesc)` on the fusion plan (line 195).
ASAN run on `CPU_FusionCreateOpConvForward_FP32.*` reports no leaks. |
| `test/gtest/deterministic_conv_api.cpp` | Fixed | Already clean
against current source — uses `ConvDescGuard` (line 66) for the conv
descriptor. ASAN run on `*CPU_DeterministicConvApi*` reports no leaks. |
| `test/gtest/fusion_aux.cpp` (GPU_FusionAux) | Fixed | Already clean
against current source — uses `ConvDescGuard` plus stack-allocated
internal C++ objects (`miopen::TensorDescriptor`,
`miopen::FusionPlanDescriptor`) which have proper destructors. The
`convoOp` handle is owned by the fusion plan. ASAN run on
`*GPU_FusionAux*` reports no leaks. |
| `test/gtest/backend_api.cpp` (CPU_BackendApi) | Skipped | File no
longer exists in the codebase. The backend API test infrastructure (part
of the removed GraphApi suite) was removed; no fix possible against the
current source tree. |


### High-level notes

New shared infrastructure (test/gtest/gtest_desc_guard.hpp)
- DescGuard<DescType, CreateFn, DestroyFn> — a single RAII template
parameterized on the descriptor type and its create/destroy entry
points. Aliases provide TensorDescGuard, ConvDescGuard,
DropoutDescGuard, and RNNDescGuard, replacing the four near-identical
guard structs
  that were copy-pasted across test files in the initial implementation.
- HandleGuard — separate RAII wrapper for miopenHandle_t (couldn't reuse
the template because miopenCreateWithStream takes an extra hipStream_t
argument). Supports lazy create(stream) so callers that only need a
handle in the dropout branch can default-construct one and
  populate it conditionally.
- DestroyInternalRnnDropoutDesc(rnnDesc) — frees the internal
DropoutDescriptor that miopenCreateRNNDescriptor allocates and that
miopenSetRNNDescriptor* then orphans. Replaces the equivalent inline
blocks that LSTM/GRU/RNN tests were each carrying. The header documents
the
two call-sites: before each Set* (always safe) and at end-of-run only on
the non-dropout path (the dropout path aliases the user-owned
descriptor, so freeing would double-free).

Recurring patterns enabled by the refactor
- The "leak from Set* overwriting the default-constructed internal
dropout descriptor" fix collapsed from per-file code to a one-line
helper call, applied uniformly across lstm.hpp, gru_test.cpp,
rnn_seq_api.hpp, and rnn_vanilla_common.hpp.
- mio_handle ownership in LSTM/GRU is now expressed via HandleGuard
rather than a manual miopenDestroy at the end of the dropout branch —
eliminates a class of forgotten-cleanup bugs.
- dropout_state_buf is consistently hoisted out of the dropout if block
so an end-of-run hipFree can release it; deletion of the buffer pairs
visibly with its allocation.

Notable non-RNN change
- softmax_find20.cpp was the only Find2.0 leak fix in this commit:
Finalize() now takes the solutions vector and calls
miopenDestroySolution() for each before destroying the problem. Same
shape applied to all 6 tests in the file.


## Test Plan

Run the tests beforehand to observe the ASan leak errors and then again
afterward to verify the fixes have resolved the problem.

## Test Result

List from ROCM-21512:

| # | Test Name | Status | Leak Status |

|---:|------------------------------------------------------|------------------------------------------------|-----------------|
| 1 | Smoke/GPU_RNNVanillaDropout_FP32 | PASSED (4 tests) | No leaks |
| 2 | Smoke/GPU_RNNVanillaDropout_FP16 | PASSED (4 tests) | No leaks |
| 3 | Full/GPU_LSTM_dropout_FP32 | PASSED (4 tests) | No leaks |
| 4 | Full/GPU_LSTM_dropout_FP16 | PASSED (4 tests) | No leaks |
| 5 | CPU_GraphApiExecutionPlanBuilder_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 6 | Full/GPU_LSTM_dropout_FP64 | REMOVED (PR #5750, 2026-03-26) | n/a
(deleted) |
| 7 | Unit/CPU_GraphApiPointwise_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 8 | Full/GPU_LstmMSRnn_FP32 | PASSED (1152 tests) | No leaks |
| 9 | Smoke/GPU_Bwd_Mha_FP32 | PASSED (12 tests) | No leaks |
| 10 | Full/GPU_LstmMSRnn_FP16 | PASSED (864 tests) | No leaks |
| 11 | Smoke/GPU_Fwd_Mha_FP32 | PASSED (15 tests) | No leaks |
| 12 | Full/GPU_Bwd_Mha_FP32 | PASSED (6 tests) | No leaks |
| 13 | Full/GPU_Fwd_Mha_FP32 | PASSED (7 tests) | No leaks |
| 14 | Full/GPU_RNNVanilla_FP32 | PASSED (96 tests) | No leaks |
| 15 | Full/GPU_RNNVanilla_FP16 | PASSED (96 tests) | No leaks |
| 16 | GPU_TestMhaFind20_FP32 | PASSED (2 tests) | No leaks |
| 17 | Full/GPU_LSTM_FP32 | PASSED (32 tests) | No leaks |
| 18 | Full/GPU_LSTM_FP16 | PASSED (32 tests) | No leaks |
| 19 | Full/GPU_LSTM_extra_FP32 | PASSED (30 tests) | No leaks |
| 20 | Full/GPU_LSTM_extra_FP16 | PASSED (30 tests) | No leaks |
| 21 | Full/GPU_DeepBench_LSTM_FP16 | PASSED (22 tests) | No leaks |
| 22 | Full/GPU_DeepBench_LSTM_FP32 | PASSED (22 tests) | No leaks |
| 23 | CPU_LOG_TEST_FUSION_NONE | PASSED (2 tests) | No leaks |
| 24 | CPU_LOG_TEST_NEG_NONE | PASSED (4 tests) | No leaks |
| 25 | GPU_SoftmaxFind20_BFP16 | PASSED (2 tests) | No leaks |
| 26 | GPU_SoftmaxFind20_FP16 | PASSED (2 tests) | No leaks |
| 27 | GPU_SoftmaxFind20_FP32 | PASSED (2 tests) | No leaks |
| 28 | CPU_ConvApi_NONE | PASSED (1 test) | No leaks |
| 29 | Full/GPU_RNNSeqApi_FP16 | PASSED (16 tests) | No leaks |
| 30 | Full/GPU_RNNSeqApi_FP32 | PASSED (16 tests) | No leaks |
| 31 | UnitVAN/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 32 | Smoke/GPU_RNNVanilla_FP16 | PASSED (4 tests) | No leaks |
| 33 | Smoke/GPU_RNNVanilla_FP32 | PASSED (4 tests) | No leaks |
| 34 | CPU_FusionCreateOpConvForward_FP32 | PASSED (1 test) | No leaks |
| 35 | CPU_GraphApiOperationReduction_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 36 | Unit2IV1/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 37 | Unit2IV1/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 38 | Unit2IV2/CPU_GraphApiOperationPointwiseBwd_NONE | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 39 | Unit2IV2/CPU_GraphApiOperationPointwiseTwoInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 40 | UnitVAB/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 41 | Smoke/GPU_RNNVanillaDropout_FP16 (duplicate of #2) | (see #2) |
(see #2) |
| 42 | CPU_GraphApiOperationGraphDescriptor_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 43 | UnitVA/CPU_GraphApiVariantPack_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 44 | UnitVAU/CPU_GraphApiRng_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 45 | CPU_GraphApiOperationReshape_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 46 | Smoke/CPU_DeterministicConvApi_NONE | PASSED (1 test) | No leaks
|
| 47 | Smoke/GPU_FusionAux_FP32 | PASSED (1 test) | No leaks |
| 48 | CPU_GraphApiEngineHeur_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 49 | Unit/CPU_GraphApiReduction_NONE | REMOVED (PR #5603, 2026-03-26)
| n/a (deleted) |
| 50 | CPU_GraphApiEngineCfg_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 51 | Unit/CPU_GraphApiMatMul_NONE | REMOVED (PR #5603, 2026-03-26) |
n/a (deleted) |
| 52 | CPU_BackendApi_NONE | REMOVED (PR #5603, 2026-03-26) | n/a
(deleted) |
| 53 | UnitIV/CPU_GraphApiOperationPointwiseOneInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 54 | Unit3IV/CPU_GraphApiOperationPointwiseThreeInput | REMOVED (PR
#5603, 2026-03-26) | n/a (deleted) |
| 55 | UnitVA/CPU_GraphApiOperationMatmul_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |
| 56 | UnitVA/CPU_GraphApiOperationRng_NONE | REMOVED (PR #5603,
2026-03-26) | n/a (deleted) |


| Outcome                            | Count |
|------------------------------------|------:|
| Passed, no leaks                   |    39 |
| Passed, leaks detected             |     0 |
| Failed                             |     0 |
| Crashed / timed out                |     0 |
| Removed — GraphAPI purge (#5603)   |    15 |
| Removed — FP64 LSTM purge (#5750)  |     1 |
| Duplicate (not re-run)             |     1 |
| **Total rows**                     |  **56** |

## Risk Assessment
Low

---------

Co-authored-by: JonathanLichtnerAMD <195780826+JonathanLichtnerAMD@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants