Reset internal hip error for tests that run out of memory #90

Merged
umfranzw merged 1 commit into ROCm:release-staging/rocm-rel-7.0 from umfranzw:fix_out_of_mem_tests-rel-staging
May 29, 2025

Conversation

@umfranzw
Contributor

The behaviour of hipGetLastError is changing in HIP 7.0. Previously, the recorded error was cleared on each HIP API call, so hipGetLastError reported only an error that occurred during the most recent HIP API call.

Moving forward, the recorded error will only be cleared by a call to hipGetLastError. This means that hipGetLastError will report any error that has occurred since the last call to hipGetLastError.

Some of our tests rely on observing a return value of hipErrorOutOfMemory from hipMalloc when an allocation is too large for a given GPU architecture's memory system. Under the new behaviour, this failed hipMalloc sets the internal HIP error, which is not cleared before subsequent tests call hipGetLastError, causing those tests to fail.

This change adds extra calls to hipGetLastError to clear the error (for future tests) in cases where tests run out of memory.
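
To make the difference concrete, here is a minimal sketch of the two behaviours as a toy model in plain C++. It does not call the HIP runtime: `FakeRuntime`, `malloc_like`, `get_last_error`, `Semantics` and `error_seen_by_next_test` are hypothetical names invented for illustration, and the model is a simplification of what is described above rather than HIP's actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: none of these names are part of the HIP API.
enum class Err { Success, OutOfMemory };

enum class Semantics {
    ClearedOnEveryCall, // assumed model of the pre-7.0 behaviour described above
    ClearedOnlyOnGet    // assumed model of the HIP 7.0 behaviour described above
};

class FakeRuntime {
public:
    FakeRuntime(Semantics semantics, std::size_t capacity_bytes)
        : semantics_(semantics), capacity_(capacity_bytes) {}

    // Stand-in for an allocation call: fails when the request exceeds capacity.
    Err malloc_like(std::size_t bytes) {
        const Err result = (bytes > capacity_) ? Err::OutOfMemory : Err::Success;
        record(result);
        return result;
    }

    // Stand-in for "get last error": returns the recorded error and resets it.
    Err get_last_error() {
        const Err recorded = last_;
        last_ = Err::Success;
        return recorded;
    }

private:
    void record(Err result) {
        if (semantics_ == Semantics::ClearedOnEveryCall) {
            last_ = result; // every call overwrites the recorded error
        } else if (result != Err::Success) {
            last_ = result; // only failures are recorded; successes do not clear
        }
    }

    Semantics semantics_;
    std::size_t capacity_;
    Err last_ = Err::Success;
};

// "Test A" deliberately over-allocates, then "test B" allocates successfully
// and checks the last error. Returns what test B observes.
inline Err error_seen_by_next_test(Semantics semantics, bool clear_after_oom) {
    FakeRuntime rt(semantics, 1024);

    const Err a = rt.malloc_like(4096); // test A: expected to run out of memory
    assert(a == Err::OutOfMemory);
    if (clear_after_oom) {
        (void)rt.get_last_error(); // the extra call: drain the recorded error
    }

    (void)rt.malloc_like(16);    // test B: small allocation that succeeds
    return rt.get_last_error();  // test B: expects Success
}
```

In this model, `error_seen_by_next_test(Semantics::ClearedOnlyOnGet, false)` returns `Err::OutOfMemory`, which is the stale error that makes later tests fail, while passing `true` for `clear_after_oom` returns `Err::Success`. With `Semantics::ClearedOnEveryCall` the result is `Err::Success` either way, which is why the tests did not need the extra call before.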

@umfranzw umfranzw merged commit f51f563 into ROCm:release-staging/rocm-rel-7.0 May 29, 2025
19 of 23 checks passed
assistant-librarian Bot pushed a commit to ROCm/rocPRIM that referenced this pull request Jun 10, 2025
Reset internal hip error for tests that run out of memory (#90)

jayhawk-commits pushed a commit that referenced this pull request Jun 17, 2025
jayhawk-commits added a commit that referenced this pull request Jun 18, 2025
### Includes the following PRs:
- #76 
- #77 
- #78 
- #90 
- #135 
- #150 
- #192

---------

Co-authored-by: Nick Breed <78807921+NB4444@users.noreply.github.com>
Co-authored-by: Sander Bos <sander@streamhpc.com>
Co-authored-by: Nara Prasetya <nara@streamhpc.com>
Co-authored-by: Michael Kuron <1748330+mkuron@users.noreply.github.com>
Co-authored-by: Wayne Franz <wayfranz@amd.com>
ammallya pushed a commit that referenced this pull request Sep 24, 2025
* Adding ASAN builds to github pr template

* Added more docs

[ROCm/hipDNN commit: 26e15da]
evetsso pushed a commit to evetsso/rocm-libraries that referenced this pull request Dec 31, 2025
…2d_build

[ck_tile] Fix tile_example_layernorm2d_fwd build error on clang20+