[rocPRIM] Reset internal hip error for tests that run out of memory #75
Merged
The behaviour of hipGetLastError is changing in HIP 7.0. Previously, the stored error was cleared on each HIP API call, so hipGetLastError reported only an error that occurred during the most recent HIP API call.

Moving forward, the stored error will be cleared only by a call to hipGetLastError. This means that hipGetLastError will report any error that has occurred since the last call to hipGetLastError.

Some of our tests rely on observing a return value of hipErrorOutOfMemory from hipMalloc when an allocation is too large for a given GPU architecture's memory system. This sets the internal HIP error, which is not cleared before subsequent tests call hipGetLastError, causing them to fail.

This change adds extra calls to hipGetLastError to clear the error (for future tests) in cases where tests run out of memory.
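To illustrate why the extra call is needed, here is a minimal sketch in plain C++ that models only the error-reporting semantics described above. It does not call HIP: `MockRuntime`, `MockError`, `malloc_bytes`, `other_call`, `get_last_error` and `error_seen_by_next_test` are hypothetical names invented for this illustration, and the 1024-byte capacity is an arbitrary assumption. The actual tests use hipMalloc and hipGetLastError.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for HIP error codes; not the real hipError_t values.
enum class MockError { Success, OutOfMemory };

// Mock runtime that models only the error-reporting behaviour described above.
class MockRuntime {
public:
    explicit MockRuntime(bool sticky_errors) : sticky_(sticky_errors) {}

    // Stand-in for hipMalloc: fails when the request exceeds the mock capacity.
    MockError malloc_bytes(std::size_t bytes) {
        return record(bytes > capacity_ ? MockError::OutOfMemory
                                        : MockError::Success);
    }

    // Stand-in for any other API call that succeeds.
    MockError other_call() { return record(MockError::Success); }

    // Stand-in for hipGetLastError: returns the stored error and resets it.
    MockError get_last_error() {
        MockError e = last_;
        last_ = MockError::Success;
        return e;
    }

private:
    MockError record(MockError result) {
        if (sticky_) {
            // New behaviour: a successful call does not clear an earlier error.
            if (result != MockError::Success) last_ = result;
        } else {
            // Old behaviour: every call overwrites the stored error.
            last_ = result;
        }
        return result;
    }

    bool sticky_;
    std::size_t capacity_ = 1024;  // arbitrary capacity assumed for the sketch
    MockError last_ = MockError::Success;
};

// Runs an "out of memory" test followed by an unrelated successful call, and
// returns what a later test would observe from get_last_error().
MockError error_seen_by_next_test(bool sticky_errors, bool clear_after_oom) {
    MockRuntime rt(sticky_errors);
    if (rt.malloc_bytes(4096) == MockError::OutOfMemory && clear_after_oom) {
        (void)rt.get_last_error();  // the kind of extra call this change adds
    }
    rt.other_call();
    return rt.get_last_error();
}
```

In this model, `error_seen_by_next_test(true, false)` yields `MockError::OutOfMemory` (the stale error leaks into the next test), while `error_seen_by_next_test(true, true)` yields `MockError::Success`.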
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
stanleytsang-amd approved these changes May 28, 2025
assistant-librarian Bot pushed a commit to ROCm/rocPRIM that referenced this pull request Jun 2, 2025
[rocPRIM] Reset internal hip error for tests that run out of memory (#75)
ammallya pushed a commit that referenced this pull request Sep 24, 2025
…75)
* Initial test setup and implementation of first instance of sending the graph to the backend library from the frontend.
* add comment for weak ptr
* implement almost everything up until execute. todo, proper validate, setting up variant pack
* implement happy path tests for the graph setup functions
* Add packing of variant pack and calling execute.
* fix most code review concerns
* fix the backend_execute_api test
* Add another test logging initializer that uses default spdlog functionality rather than the callback system. Callback system seems to print all logs at the end of the test rather than as they happened.
* fix bad error logging
* add test for formatters
* add converter to go from frontend to backend heur mode
ammallya pushed a commit that referenced this pull request Sep 24, 2025
…75)
* Initial test setup and implementation of first instance of sending the graph to the backend library from the frontend.
* add comment for weak ptr
* implement almost everything up until execute. todo, proper validate, setting up variant pack
* implement happy path tests for the graph setup functions
* Add packing of variant pack and calling execute.
* fix most code review concerns
* fix the backend_execute_api test
* Add another test logging initializer that uses default spdlog functionality rather than the callback system. Callback system seems to print all logs at the end of the test rather than as they happened.
* fix bad error logging
* add test for formatters
* add converter to go from frontend to backend heur mode
[ROCm/hipDNN commit: 494969d]
evetsso pushed a commit to evetsso/rocm-libraries that referenced this pull request Dec 31, 2025
* [gfx1250] Fix example issues for gfx125x
  1. Refine KGroup in GridwiseMoeGemm, dequant pipeline doesn't support KGroup for now
  2. Enable example example_moe_gemm1_xdl_pk_i4, example_moe_gemm2_xdl_pk_i4 and example_grouped_gemm_lower_triangle_scale_softmax_gemm_permute_xdl_fp16 for gfx125x
* [gfx1250] Workaround hipOccupancyMaxActiveBlocksPerMultiprocessor return value
  hipOccupancyMaxActiveBlocksPerMultiprocessor return 0 on gfx125x, and it causes all streamk example crash
  workaround: set the min value to 1.
---------
Co-authored-by: Qun Lin <qlin@amd.com>