Add hipGetLastError calls to clear existing errors #47
Closed
assistant-librarian[bot] wants to merge 6 commits into
Future HIP versions will change the behaviour of hipGetLastError slightly. Currently, the function returns any error that occurred in the last HIP API call in the current host thread. In other words, the error it reports is cleared with each HIP API call.

In the future, the function will return any error that occurred in any HIP API call in the current host thread since the last time hipGetLastError was called. In other words, the error it reports will be cleared only by a call to hipGetLastError.

A number of rocPRIM tests and benchmarks currently rely on the old behaviour of hipGetLastError. To make sure they continue to work after the change, we need to call hipGetLastError before the test/benchmark code runs, so that any earlier errors (e.g. a hipMalloc call that failed due to insufficient memory, which happens on some architectures for large test input sizes) are cleared before the test/benchmark calls hipGetLastError.

This change:
- modifies the HIP_CHECK macro so that it clears hipGetLastError before and after the HIP API call it wraps. It now checks for two types of errors: the error returned from the wrapped function call, and errors reported by hipGetLastError after the wrapped call completes.
- adds a HIP_CHECK_LAUNCH macro that can be used to wrap kernel calls. It clears any internally recorded HIP error before and after the kernel is invoked. Tests fail if the hipGetLastError call made after the kernel launch returns an error code.
- modifies the HIP_CHECK_MEMORY macro to clear an existing error before the memory allocation call it wraps. If the memory allocation call returns hipErrorOutOfMemory, then hipGetLastError is called again (afterwards) to clear the error.
- modifies a few test files so that they use hipLaunchKernelGGL instead of the triple chevron syntax for launching kernels. The triple chevron syntax cannot be wrapped in a call to the HIP_CHECK_LAUNCH macro.
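The difference between the two clearing rules can be illustrated with a small model in plain C++. This is only a sketch of the semantics described above, not HIP code: `LastErrorModel`, `Policy`, `Err` and `error_seen_by_test` are hypothetical names, and the model assumes that a query always resets the recorded error.

```cpp
#include <cassert>

// Hypothetical stand-in for hipError_t with only the values used here.
enum class Err { Success, OutOfMemory };

// The two clearing rules described above.
enum class Policy { ClearedByEveryApiCall, ClearedOnlyByQuery };

// Toy model of the per-thread error slot. This is NOT the HIP runtime.
class LastErrorModel {
public:
    explicit LastErrorModel(Policy policy) : policy_(policy) {}

    // Models any API call that finishes with `result`.
    Err api_call(Err result) {
        if (result != Err::Success) {
            last_ = result;  // a failure is recorded under both rules
        } else if (policy_ == Policy::ClearedByEveryApiCall) {
            last_ = Err::Success;  // current rule: a later success overwrites it
        }
        return result;
    }

    // Models hipGetLastError: report the recorded error, then reset the slot.
    Err get_last_error() {
        const Err e = last_;
        last_ = Err::Success;
        return e;
    }

private:
    Policy policy_;
    Err last_ = Err::Success;
};

// A failed allocation, then a successful call, then the query a test would make.
Err error_seen_by_test(Policy policy, bool clear_first) {
    LastErrorModel rt(policy);
    rt.api_call(Err::OutOfMemory);  // e.g. an oversized allocation
    if (clear_first) {
        (void)rt.get_last_error();  // discard the stale error
    }
    rt.api_call(Err::Success);      // the code under test
    return rt.get_last_error();
}

// Usage: the three cases discussed in the description.
void demo() {
    assert(error_seen_by_test(Policy::ClearedByEveryApiCall, false) == Err::Success);
    assert(error_seen_by_test(Policy::ClearedOnlyByQuery, false) == Err::OutOfMemory);
    assert(error_seen_by_test(Policy::ClearedOnlyByQuery, true) == Err::Success);
}
```

Under the future rule the stale out-of-memory error reaches the test unless it is cleared first, which is the situation the extra hipGetLastError calls are meant to avoid.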
The behaviour of hipGetLastError will change in the future. With the changes, the error it records will only be cleared on each call to hipGetLastError. Call hipGetLastError at the beginning of public device API functions, since they may call hipGetLastError internally, and we don't want that call to report an error that happened before the function was invoked.
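As a rough illustration of why the entry-point clear matters, the sketch below models a library function that checks the error slot after its own work. It is plain C++ under stated assumptions: `MockRuntime`, `library_algorithm` and `call_after_user_error` are hypothetical, and the slot follows the future behaviour (only a query clears it).

```cpp
#include <cassert>

// Hypothetical stand-in for hipError_t.
enum class Err { Success, InvalidValue };

// Toy sticky error slot: only get_last_error() resets it.
struct MockRuntime {
    Err last = Err::Success;

    void record(Err e) {
        if (e != Err::Success) last = e;
    }
    Err get_last_error() {
        const Err e = last;
        last = Err::Success;
        return e;
    }
};

// Hypothetical public entry point that checks the slot after its own work.
Err library_algorithm(MockRuntime& rt, bool clear_on_entry) {
    if (clear_on_entry) {
        (void)rt.get_last_error();  // forget errors raised before this call
    }
    rt.record(Err::Success);        // internal work; it succeeds in this sketch
    return rt.get_last_error();     // internal check
}

// User code fails first, then calls into the library.
Err call_after_user_error(bool clear_on_entry) {
    MockRuntime rt;
    rt.record(Err::InvalidValue);   // unrelated earlier failure in user code
    return library_algorithm(rt, clear_on_entry);
}

void demo() {
    assert(call_after_user_error(false) == Err::InvalidValue);  // misattributed
    assert(call_after_user_error(true) == Err::Success);        // isolated
}
```

Without the clear, the library reports an error it did not cause; with it, the internal check only sees errors raised inside the function.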
Modify the HIP_CHECK and HIP_CHECK_LAUNCH macros so they more clearly capture returned, pre-launch and post-launch errors. These changes are based on the HIP documentation and example at https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_runtime_api/error_handling.html. For HIP_CHECK, clear any pre-existing error, then capture any error returned by the expression being checked, and any HIP error returned by hipGetLastError. For HIP_CHECK_LAUNCH, clear any pre-existing error, then launch the kernel. Then capture any error returned by hipGetLastError (this will capture pre-launch issues like kernel argument problems), and capture any error returned by hipStreamSynchronize (this will capture in-kernel errors).
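The HIP_CHECK sequence described in this commit message (clear, run, capture the returned error, capture the recorded error) might look roughly like the macro below. It is a sketch against a mock, not the macro from this PR: `MOCK_CHECK`, `mock_get_last_error`, `mock_api_ok` and `mock_api_fail` are hypothetical, and throwing an exception stands in for whatever failure mechanism the real test macro uses.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for hipError_t.
enum class Err { Success, InvalidValue };

// Toy sticky error slot: only a query clears it (the future behaviour).
static Err g_last_error = Err::Success;

Err mock_get_last_error() {
    const Err e = g_last_error;
    g_last_error = Err::Success;
    return e;
}

// Two hypothetical API calls: one succeeds, one fails and records its error.
Err mock_api_ok() { return Err::Success; }
Err mock_api_fail() {
    g_last_error = Err::InvalidValue;
    return Err::InvalidValue;
}

#define MOCK_CHECK(expr)                                                    \
    do {                                                                    \
        (void)mock_get_last_error();                 /* drop stale error */ \
        const Err returned_ = (expr);                /* returned error */   \
        const Err recorded_ = mock_get_last_error(); /* recorded error */   \
        if (returned_ != Err::Success || recorded_ != Err::Success) {       \
            throw std::runtime_error("MOCK_CHECK failed: " #expr);          \
        }                                                                   \
    } while (0)

// Returns true when the checked call passes, false when MOCK_CHECK throws.
bool check_passes(Err (*call)()) {
    try {
        MOCK_CHECK(call());
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

void demo() {
    g_last_error = Err::InvalidValue;               // stale error from earlier code
    assert(check_passes(mock_api_ok));              // stale error does not fail the check
    assert(!check_passes(mock_api_fail));           // a real failure is still caught
    assert(mock_get_last_error() == Err::Success);  // slot is left clean
}
```

Querying the slot after the call serves two purposes in this sketch: it catches errors that were recorded but not returned, and it leaves the slot clean for the next check.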
Some tests cannot synchronize immediately after a kernel call. This change splits HIP_CHECK_LAUNCH into two macros:
- HIP_CHECK_LAUNCH - does not call hipDeviceSynchronize (leaving detection of in-kernel errors up to the caller)
- HIP_CHECK_LAUNCH_SYNC - does call hipDeviceSynchronize (catches in-kernel errors)

It also adds a few hipGetLastError calls to clear the internally tracked HIP error for new algorithms that have been added.
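The trade-off between the two launch macros can be sketched with a toy device in plain C++. Everything here is an assumption made for illustration: `MockDevice`, `check_launch` and `check_launch_sync` are hypothetical, launch-time problems are recorded immediately, and a fault inside the kernel only becomes visible at the next synchronize.

```cpp
#include <cassert>

// Hypothetical stand-in for hipError_t.
enum class Err { Success, InvalidConfiguration, LaunchFailure };

// Toy device, not the HIP runtime.
struct MockDevice {
    Err last = Err::Success;     // sticky error slot
    Err pending = Err::Success;  // in-kernel fault waiting for a synchronize

    void launch(int block_size, bool kernel_faults) {
        if (block_size <= 0) {
            last = Err::InvalidConfiguration;  // rejected before the kernel runs
            return;
        }
        if (kernel_faults) pending = Err::LaunchFailure;
    }
    Err synchronize() {
        const Err e = pending;
        pending = Err::Success;
        if (e != Err::Success) last = e;
        return e;
    }
    Err get_last_error() {
        const Err e = last;
        last = Err::Success;
        return e;
    }
};

// Launch check without synchronization: sees launch-time errors only.
Err check_launch(MockDevice& dev, int block_size, bool kernel_faults) {
    (void)dev.get_last_error();  // clear any pre-existing error
    dev.launch(block_size, kernel_faults);
    return dev.get_last_error();
}

// Launch check with synchronization: also sees in-kernel errors.
Err check_launch_sync(MockDevice& dev, int block_size, bool kernel_faults) {
    const Err launch_err = check_launch(dev, block_size, kernel_faults);
    if (launch_err != Err::Success) return launch_err;
    const Err sync_err = dev.synchronize();
    (void)dev.get_last_error();  // do not leave the same error in the slot
    return sync_err;
}

void demo() {
    MockDevice a;
    assert(check_launch(a, 0, false) == Err::InvalidConfiguration);
    MockDevice b;
    assert(check_launch(b, 64, true) == Err::Success);  // fault not visible yet
    assert(b.synchronize() == Err::LaunchFailure);      // caller finds it later
    MockDevice c;
    assert(check_launch_sync(c, 64, true) == Err::LaunchFailure);
}
```

The non-synchronizing variant suits tests that must keep work in flight, at the cost of leaving in-kernel failures for the caller to detect at a later synchronization point.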
Remove duplicate calls to clear hipGetLastError.
…et_last_error_update
Naraenda
approved these changes
May 22, 2025
Member
Naraenda
left a comment
Nothing to add, see review on pre-migrated PR.
Contributor
Closing this PR in favour of #1227.
ammallya
pushed a commit
that referenced
this pull request
Sep 24, 2025
* Initial swap to custom exceptions throughout * Remove converts and dynamic_casts * Swap to bad param * swap back to classes * clean up
ammallya
pushed a commit
that referenced
this pull request
Sep 24, 2025
* Initial swap to custom exceptions throughout * Remove converts and dynamic_casts * Swap to bad param * swap back to classes * clean up [ROCm/hipDNN commit: 9bdda2b]
stanleytsang-amd
pushed a commit
that referenced
this pull request
Dec 12, 2025
## Motivation
Enable gfx1152 and gfx1153.
## Technical Details
1. combine arrays into tables and use local macros to reduce repetition
(for maintainability)
2. monkey-see-monkey-do wherever `gfx11...` was found
## Test Plan
Build existing ctests for, and run them on, gfx1152 and gfx1153.
## Test Result
<details>
<summary>gfx1152 passed (click to see log)</summary>
```
INFO:root:++ Exec [/tmp/eble]$ ctest --test-dir /tmp/eble/rocm/bin/rocprim --output-on-failure --parallel 2 --exclude-regex 'rocprim.lookback_reproducibility|rocprim.linking|rocprim.device_merge_inplace|rocprim.device_merge_sort|rocprim.device_partition|rocprim.device_radix_sort|rocprim.device_scan|rocprim.device_select|rocprim.device_find_first_of|rocprim.device_reduce_by_key' --timeout 60
Test project /tmp/eble/rocm/bin/rocprim
Start 1: hip.device_api
Start 2: hip.async_copy
1/73 Test #2: hip.async_copy .............................. Passed 0.01 sec
Start 3: hip.ordered_block_id
2/73 Test #1: hip.device_api .............................. Passed 0.02 sec
Start 4: rocprim.internal_merge_path
3/73 Test #4: rocprim.internal_merge_path ................. Passed 0.01 sec
Start 5: rocprim.basic_test
4/73 Test #3: hip.ordered_block_id ........................ Passed 0.01 sec
Start 6: rocprim.arg_index_iterator
5/73 Test #5: rocprim.basic_test .......................... Passed 0.01 sec
Start 7: rocprim.temporary_storage_partitioning
6/73 Test #6: rocprim.arg_index_iterator .................. Passed 0.01 sec
Start 8: rocprim.block_adjacent_difference
7/73 Test #7: rocprim.temporary_storage_partitioning ...... Passed 0.01 sec
Start 9: rocprim.block_discontinuity
8/73 Test #8: rocprim.block_adjacent_difference ........... Passed 2.34 sec
Start 10: rocprim.bit_cast
9/73 Test #10: rocprim.bit_cast ............................ Passed 0.02 sec
Start 11: rocprim.block_exchange
10/73 Test #11: rocprim.block_exchange ...................... Passed 0.73 sec
Start 12: rocprim.block_histogram
11/73 Test #12: rocprim.block_histogram ..................... Passed 0.54 sec
Start 13: rocprim.block_load_store
12/73 Test #13: rocprim.block_load_store .................... Passed 0.44 sec
Start 14: rocprim.block_sort_merge
13/73 Test #14: rocprim.block_sort_merge .................... Passed 0.02 sec
Start 15: rocprim.block_sort_merge_stable
14/73 Test #15: rocprim.block_sort_merge_stable ............. Passed 0.02 sec
Start 16: rocprim.block_radix_rank
15/73 Test #16: rocprim.block_radix_rank .................... Passed 0.03 sec
Start 17: rocprim.block_radix_sort
16/73 Test #17: rocprim.block_radix_sort .................... Passed 4.79 sec
Start 18: rocprim.block_reduce
17/73 Test #18: rocprim.block_reduce ........................ Passed 0.26 sec
Start 19: rocprim.block_run_length_decode
18/73 Test #19: rocprim.block_run_length_decode ............. Passed 0.54 sec
Start 20: rocprim.block_scan
19/73 Test #20: rocprim.block_scan .......................... Passed 0.04 sec
Start 21: rocprim.block_shuffle
20/73 Test #21: rocprim.block_shuffle ....................... Passed 2.70 sec
Start 22: rocprim.block_sort_bitonic
21/73 Test #9: rocprim.block_discontinuity ................. Passed 17.36 sec
Start 23: rocprim.config_dispatch
22/73 Test #23: rocprim.config_dispatch ..................... Passed 0.09 sec
Start 24: rocprim.constant_iterator
23/73 Test #24: rocprim.constant_iterator ................... Passed 0.07 sec
Start 25: rocprim.counting_iterator
24/73 Test #25: rocprim.counting_iterator ................... Passed 0.07 sec
Start 26: rocprim.device_batch_memcpy
25/73 Test #26: rocprim.device_batch_memcpy ................. Passed 1.20 sec
Start 27: rocprim.device_binary_search
26/73 Test #27: rocprim.device_binary_search ................ Passed 0.02 sec
Start 28: rocprim.device_adjacent_difference
27/73 Test #28: rocprim.device_adjacent_difference .......... Passed 0.01 sec
Start 29: rocprim.device_adjacent_find
28/73 Test #29: rocprim.device_adjacent_find ................ Passed 0.01 sec
Start 30: rocprim.device_find_end
29/73 Test #30: rocprim.device_find_end ..................... Passed 0.01 sec
Start 31: rocprim.device_histogram
30/73 Test #22: rocprim.block_sort_bitonic .................. Passed 13.23 sec
Start 32: rocprim.device_merge
31/73 Test #31: rocprim.device_histogram .................... Passed 7.85 sec
Start 33: rocprim.nth_element
32/73 Test #33: rocprim.nth_element ......................... Passed 0.03 sec
Start 34: rocprim.device_partial_sort
33/73 Test #34: rocprim.device_partial_sort ................. Passed 0.02 sec
Start 35: rocprim.device_reduce
34/73 Test #35: rocprim.device_reduce ....................... Passed 8.94 sec
Start 36: rocprim.device_run_length_encode
35/73 Test #32: rocprim.device_merge ........................ Passed 14.05 sec
Start 37: rocprim.device_search
36/73 Test #37: rocprim.device_search ....................... Passed 0.02 sec
Start 38: rocprim.device_segmented_radix_sort
37/73 Test #36: rocprim.device_run_length_encode ............ Passed 13.92 sec
Start 39: rocprim.device_search_n
38/73 Test #39: rocprim.device_search_n ..................... Passed 0.02 sec
Start 40: rocprim.device_segmented_reduce
39/73 Test #40: rocprim.device_segmented_reduce ............. Passed 5.21 sec
Start 41: rocprim.device_segmented_scan
40/73 Test #41: rocprim.device_segmented_scan ............... Passed 0.02 sec
Start 42: rocprim.device_transform
41/73 Test #42: rocprim.device_transform .................... Passed 13.90 sec
Start 43: rocprim.discard_iterator
42/73 Test #43: rocprim.discard_iterator .................... Passed 0.07 sec
Start 44: rocprim.radix_key_codec
43/73 Test #44: rocprim.radix_key_codec ..................... Passed
Start 45: rocprim.predicate_iterator
44/73 Test #45: rocprim.predicate_iterator .................. Passed 0.07 sec
Start 46: rocprim.reverse_iterator
45/73 Test #46: rocprim.reverse_iterator .................... Passed 0.09 sec
Start 47: rocprim.rocprim_tuple
46/73 Test #47: rocprim.rocprim_tuple ....................... Passed 0.01 sec
Start 48: rocprim.rocprim_types
47/73 Test #48: rocprim.rocprim_types ....................... Passed 0.01 sec
Start 49: rocprim.texture_cache_iterator
48/73 Test #49: rocprim.texture_cache_iterator .............. Passed 0.01 sec
Start 50: rocprim.thread
49/73 Test #50: rocprim.thread .............................. Passed 0.07 sec
Start 51: rocprim.thread_algos
50/73 Test #51: rocprim.thread_algos ........................ Passed 0.35 sec
Start 52: rocprim.tuple
51/73 Test #52: rocprim.tuple ............................... Passed 0.02 sec
Start 53: rocprim.utils_sort_checker
52/73 Test #53: rocprim.utils_sort_checker .................. Passed 0.01 sec
Start 54: rocprim.transform_iterator
53/73 Test #54: rocprim.transform_iterator .................. Passed 0.11 sec
Start 55: rocprim.type_traits_interface_cpp17
54/73 Test #55: rocprim.type_traits_interface_cpp17 ......... Passed 0.01 sec
Start 56: rocprim.type_traits_interface_gnupp17
55/73 Test #56: rocprim.type_traits_interface_gnupp17 ....... Passed 0.01 sec
Start 57: rocprim.type_traits_interface_cpp20
56/73 Test #57: rocprim.type_traits_interface_cpp20 ......... Passed 0.01 sec
Start 58: rocprim.type_traits_interface_gnupp20
57/73 Test #58: rocprim.type_traits_interface_gnupp20 ....... Passed 0.01 sec
Start 59: rocprim.no_half_operators
58/73 Test #59: rocprim.no_half_operators ................... Passed 0.01 sec
Start 60: rocprim.intrinsics
59/73 Test #60: rocprim.intrinsics .......................... Passed 0.21 sec
Start 61: rocprim.intrinsics_atomic
60/73 Test #61: rocprim.intrinsics_atomic ................... Passed 0.02 sec
Start 62: rocprim.invoke_result
61/73 Test #62: rocprim.invoke_result ....................... Passed 0.01 sec
Start 63: rocprim.warp_exchange
62/73 Test #63: rocprim.warp_exchange ....................... Passed 0.08 sec
Start 64: rocprim.warp_load
63/73 Test #64: rocprim.warp_load ........................... Passed 0.08 sec
Start 65: rocprim.warp_reduce
64/73 Test #65: rocprim.warp_reduce ......................... Passed 0.14 sec
Start 66: rocprim.warp_scan
65/73 Test #66: rocprim.warp_scan ........................... Passed 0.20 sec
Start 67: rocprim.warp_scan_disable_dpp_disable_dpp
66/73 Test #67: rocprim.warp_scan_disable_dpp_disable_dpp ... Passed 0.21 sec
Start 68: rocprim.warp_sort
67/73 Test #68: rocprim.warp_sort ........................... Passed 0.09 sec
Start 69: rocprim.warp_store
68/73 Test #69: rocprim.warp_store .......................... Passed 0.02 sec
Start 70: rocprim.zip_iterator
69/73 Test #70: rocprim.zip_iterator ........................ Passed 0.02 sec
Start 71: rocprim.accumulator_t
70/73 Test #71: rocprim.accumulator_t ....................... Passed 0.02 sec
Start 72: hipgraph.basic
71/73 Test #72: hipgraph.basic .............................. Passed 0.02 sec
Start 73: hipgraph.algs
72/73 Test #73: hipgraph.algs ............................... Passed 0.01 sec
73/73 Test #38: rocprim.device_segmented_radix_sort ......... Passed 31.97 sec
100% tests passed, 0 tests failed out of 73
Total Test time (real) = 71.80 sec
✅ test_rocprim.py PASSED
```
</details>
<details>
<summary>gfx1153 passed (click to see log)</summary>
```
INFO:root:++ Exec [/tmp/eble]$ ctest --test-dir /tmp/eble/rocm/bin/rocprim --output-on-failure --parallel 1 --exclude-regex 'rocprim.lookback_reproducibility|rocprim.linking|rocprim.device_merge_inplace|rocprim.device_merge_sort|rocprim.device_partition|rocprim.device_radix_sort|rocprim.device_scan|rocprim.device_select|rocprim.device_find_first_of|rocprim.device_reduce_by_key' --timeout 60
Test project /tmp/eble/rocm/bin/rocprim
Start 1: hip.device_api
1/73 Test #1: hip.device_api .............................. Passed 0.01 sec
Start 2: hip.async_copy
2/73 Test #2: hip.async_copy .............................. Passed 0.01 sec
Start 3: hip.ordered_block_id
3/73 Test #3: hip.ordered_block_id ........................ Passed 0.01 sec
Start 4: rocprim.internal_merge_path
4/73 Test #4: rocprim.internal_merge_path ................. Passed 0.01 sec
Start 5: rocprim.basic_test
5/73 Test #5: rocprim.basic_test .......................... Passed 0.01 sec
Start 6: rocprim.arg_index_iterator
6/73 Test #6: rocprim.arg_index_iterator .................. Passed 0.01 sec
Start 7: rocprim.temporary_storage_partitioning
7/73 Test #7: rocprim.temporary_storage_partitioning ...... Passed 0.01 sec
Start 8: rocprim.block_adjacent_difference
8/73 Test #8: rocprim.block_adjacent_difference ........... Passed 2.94 sec
Start 9: rocprim.block_discontinuity
9/73 Test #9: rocprim.block_discontinuity ................. Passed 21.10 sec
Start 10: rocprim.bit_cast
10/73 Test #10: rocprim.bit_cast ............................ Passed 0.01 sec
Start 11: rocprim.block_exchange
11/73 Test #11: rocprim.block_exchange ...................... Passed 2.20 sec
Start 12: rocprim.block_histogram
12/73 Test #12: rocprim.block_histogram ..................... Passed 0.71 sec
Start 13: rocprim.block_load_store
13/73 Test #13: rocprim.block_load_store .................... Passed 0.48 sec
Start 14: rocprim.block_sort_merge
14/73 Test #14: rocprim.block_sort_merge .................... Passed 0.02 sec
Start 15: rocprim.block_sort_merge_stable
15/73 Test #15: rocprim.block_sort_merge_stable ............. Passed 0.02 sec
Start 16: rocprim.block_radix_rank
16/73 Test #16: rocprim.block_radix_rank .................... Passed 0.02 sec
Start 17: rocprim.block_radix_sort
17/73 Test #17: rocprim.block_radix_sort .................... Passed 6.12 sec
Start 18: rocprim.block_reduce
18/73 Test #18: rocprim.block_reduce ........................ Passed 0.31 sec
Start 19: rocprim.block_run_length_decode
19/73 Test #19: rocprim.block_run_length_decode ............. Passed 0.68 sec
Start 20: rocprim.block_scan
20/73 Test #20: rocprim.block_scan .......................... Passed 0.03 sec
Start 21: rocprim.block_shuffle
21/73 Test #21: rocprim.block_shuffle ....................... Passed 3.63 sec
Start 22: rocprim.block_sort_bitonic
22/73 Test #22: rocprim.block_sort_bitonic .................. Passed 19.34 sec
Start 23: rocprim.config_dispatch
23/73 Test #23: rocprim.config_dispatch ..................... Passed 0.10 sec
Start 24: rocprim.constant_iterator
24/73 Test #24: rocprim.constant_iterator ................... Passed 0.09 sec
Start 25: rocprim.counting_iterator
25/73 Test #25: rocprim.counting_iterator ................... Passed 0.09 sec
Start 26: rocprim.device_batch_memcpy
26/73 Test #26: rocprim.device_batch_memcpy ................. Passed 1.42 sec
Start 27: rocprim.device_binary_search
27/73 Test #27: rocprim.device_binary_search ................ Passed 0.01 sec
Start 28: rocprim.device_adjacent_difference
28/73 Test #28: rocprim.device_adjacent_difference .......... Passed 0.01 sec
Start 29: rocprim.device_adjacent_find
29/73 Test #29: rocprim.device_adjacent_find ................ Passed 0.01 sec
Start 30: rocprim.device_find_end
30/73 Test #30: rocprim.device_find_end ..................... Passed 0.01 sec
Start 31: rocprim.device_histogram
31/73 Test #31: rocprim.device_histogram .................... Passed 8.77 sec
Start 32: rocprim.device_merge
32/73 Test #32: rocprim.device_merge ........................ Passed 16.23 sec
Start 33: rocprim.nth_element
33/73 Test #33: rocprim.nth_element ......................... Passed 0.01 sec
Start 34: rocprim.device_partial_sort
34/73 Test #34: rocprim.device_partial_sort ................. Passed 0.02 sec
Start 35: rocprim.device_reduce
35/73 Test #35: rocprim.device_reduce ....................... Passed 10.92 sec
Start 36: rocprim.device_run_length_encode
36/73 Test #36: rocprim.device_run_length_encode ............ Passed 14.34 sec
Start 37: rocprim.device_search
37/73 Test #37: rocprim.device_search ....................... Passed 0.01 sec
Start 38: rocprim.device_segmented_radix_sort
38/73 Test #38: rocprim.device_segmented_radix_sort ......... Passed 38.28 sec
Start 39: rocprim.device_search_n
39/73 Test #39: rocprim.device_search_n ..................... Passed 0.02 sec
Start 40: rocprim.device_segmented_reduce
40/73 Test #40: rocprim.device_segmented_reduce ............. Passed 7.19 sec
Start 41: rocprim.device_segmented_scan
41/73 Test #41: rocprim.device_segmented_scan ............... Passed 0.02 sec
Start 42: rocprim.device_transform
42/73 Test #42: rocprim.device_transform .................... Passed 17.64 sec
Start 43: rocprim.discard_iterator
43/73 Test #43: rocprim.discard_iterator .................... Passed 0.12 sec
Start 44: rocprim.radix_key_codec
44/73 Test #44: rocprim.radix_key_codec ..................... Passed 0.01 sec
Start 45: rocprim.predicate_iterator
45/73 Test #45: rocprim.predicate_iterator .................. Passed 0.08 sec
Start 46: rocprim.reverse_iterator
46/73 Test #46: rocprim.reverse_iterator .................... Passed
Start 47: rocprim.rocprim_tuple
47/73 Test #47: rocprim.rocprim_tuple ....................... Passed 0.01 sec
Start 48: rocprim.rocprim_types
48/73 Test #48: rocprim.rocprim_types ....................... Passed 0.01 sec
Start 49: rocprim.texture_cache_iterator
49/73 Test #49: rocprim.texture_cache_iterator .............. Passed 0.01 sec
Start 50: rocprim.thread
50/73 Test #50: rocprim.thread .............................. Passed 0.08 sec
Start 51: rocprim.thread_algos
51/73 Test #51: rocprim.thread_algos ........................ Passed 0.43 sec
Start 52: rocprim.tuple
52/73 Test #52: rocprim.tuple ............................... Passed 0.01 sec
Start 53: rocprim.utils_sort_checker
53/73 Test #53: rocprim.utils_sort_checker .................. Passed 0.01 sec
Start 54: rocprim.transform_iterator
54/73 Test #54: rocprim.transform_iterator .................. Passed 0.12 sec
Start 55: rocprim.type_traits_interface_cpp17
55/73 Test #55: rocprim.type_traits_interface_cpp17 ......... Passed 0.01 sec
Start 56: rocprim.type_traits_interface_gnupp17
56/73 Test #56: rocprim.type_traits_interface_gnupp17 ....... Passed 0.01 sec
Start 57: rocprim.type_traits_interface_cpp20
57/73 Test #57: rocprim.type_traits_interface_cpp20 ......... Passed 0.01 sec
Start 58: rocprim.type_traits_interface_gnupp20
58/73 Test #58: rocprim.type_traits_interface_gnupp20 ....... Passed 0.01 sec
Start 59: rocprim.no_half_operators
59/73 Test #59: rocprim.no_half_operators ................... Passed 0.01 sec
Start 60: rocprim.intrinsics
60/73 Test #60: rocprim.intrinsics .......................... Passed 0.29 sec
Start 61: rocprim.intrinsics_atomic
61/73 Test #61: rocprim.intrinsics_atomic ................... Passed
Start 62: rocprim.invoke_result
62/73 Test #62: rocprim.invoke_result ....................... Passed 0.01 sec
Start 63: rocprim.warp_exchange
63/73 Test #63: rocprim.warp_exchange ....................... Passed 0.09 sec
Start 64: rocprim.warp_load
64/73 Test #64: rocprim.warp_load ........................... Passed 0.09 sec
Start 65: rocprim.warp_reduce
65/73 Test #65: rocprim.warp_reduce ......................... Passed 0.17 sec
Start 66: rocprim.warp_scan
66/73 Test #66: rocprim.warp_scan ........................... Passed 0.26 sec
Start 67: rocprim.warp_scan_disable_dpp_disable_dpp
67/73 Test #67: rocprim.warp_scan_disable_dpp_disable_dpp ... Passed 0.26 sec
Start 68: rocprim.warp_sort
68/73 Test #68: rocprim.warp_sort ........................... Passed 0.10 sec
Start 69: rocprim.warp_store
69/73 Test #69: rocprim.warp_store .......................... Passed 0.01 sec
Start 70: rocprim.zip_iterator
70/73 Test #70: rocprim.zip_iterator ........................ Passed 0.01 sec
Start 71: rocprim.accumulator_t
71/73 Test #71: rocprim.accumulator_t ....................... Passed 0.01 sec
Start 72: hipgraph.basic
72/73 Test #72: hipgraph.basic .............................. Passed 0.01 sec
Start 73: hipgraph.algs
73/73 Test #73: hipgraph.algs ............................... Passed 0.01 sec
100% tests passed, 0 tests failed out of 73
Total Test time (real) = 175.34 sec
✅ test_rocprim.py PASSED
```
</details>
## Submission Checklist
- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
KKyang
pushed a commit
that referenced
this pull request
Apr 18, 2026
* [stinkytofu] Add pseudo LABEL instruction to represent a block. * update descriptions
KKyang
pushed a commit
that referenced
this pull request
Apr 20, 2026
* [stinkytofu] Add pseudo LABEL instruction to represent a block. * update descriptions
aledudek
pushed a commit
that referenced
this pull request
May 20, 2026
* [stinkytofu] Add pseudo LABEL instruction to represent a block. * update descriptions
The behaviour of hipGetLastError will be changing in an upcoming HIP release. Currently, the error that's reported is cleared on each HIP API call. This means that hipGetLastError reports any error that occurred during the last HIP API call (in the current host thread).
Moving forward, the error that's reported will only be cleared on each call to hipGetLastError. This means that hipGetLastError will report any error that has occurred since the last call to hipGetLastError (in the current host thread).
Some of our tests rely on observing a return value of hipErrorOutOfMemory from hipMalloc when an allocation is too large for a given GPU architecture's memory system. While this will still work with the future behaviour, it will cause subsequent calls to hipGetLastError to also report this error.
This change fixes these tests by calling hipGetLastError before sections of code we want to detect errors in, so that any previously recorded error is cleared. This ensures that when we call hipGetLastError again after the code sections of interest complete, it only reports errors from within the sections of interest.
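For the out-of-memory case in particular, the clear-allocate-clear sequence can be sketched as follows. This is a plain C++ model under stated assumptions, not the HIP_CHECK_MEMORY macro itself: `MockRuntime`, `checked_alloc` and `AllocOutcome` are hypothetical, and mapping an out-of-memory result to "skip" is a guess at what a caller might do with it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for hipError_t.
enum class Err { Success, OutOfMemory };

// What a test might do after the allocation attempt (an assumption).
enum class AllocOutcome { Ok, SkipTest };

// Toy runtime with a fixed memory capacity and a sticky error slot.
struct MockRuntime {
    std::size_t capacity;
    Err last = Err::Success;

    explicit MockRuntime(std::size_t bytes) : capacity(bytes) {}

    Err allocate(std::size_t bytes) {
        if (bytes > capacity) {
            last = Err::OutOfMemory;  // recorded as well as returned
            return Err::OutOfMemory;
        }
        return Err::Success;
    }
    Err get_last_error() {
        const Err e = last;
        last = Err::Success;
        return e;
    }
};

// Clear, allocate, and clear again if the allocation ran out of memory.
AllocOutcome checked_alloc(MockRuntime& rt, std::size_t bytes) {
    (void)rt.get_last_error();      // start from a clean slot
    if (rt.allocate(bytes) == Err::OutOfMemory) {
        (void)rt.get_last_error();  // expected failure: do not let it linger
        return AllocOutcome::SkipTest;
    }
    return AllocOutcome::Ok;
}

void demo() {
    MockRuntime rt(1024);
    assert(checked_alloc(rt, 4096) == AllocOutcome::SkipTest);
    assert(rt.get_last_error() == Err::Success);  // nothing left for later checks
    assert(checked_alloc(rt, 512) == AllocOutcome::Ok);
}
```

The second clear is what keeps an expected allocation failure from being reported by a later, unrelated hipGetLastError call.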
This change also adds error-clearing hipGetLastError calls to other locations (besides the tests mentioned above) where hipGetLastError is called. This is mainly to guard against user code that makes HIP API calls which set an error between rocPRIM calls.
More specifically, this change adds code to clear errors before:
🔁 Imported from ROCm/rocPRIM#686
🧑💻 Originally authored by @umfranzw