Fix build issue with device_run_length_encode #31

Merged
jszuppe merged 2 commits into ROCm:develop from aaronenyeshi:hip-clang-run-length-encode
Aug 28, 2018

@aaronenyeshi
Contributor

This fixes the build errors on HIP-Clang; however, 3 of the 26 tests in rocprim.device_run_length_encode still fail.
@aaronenyeshi aaronenyeshi requested a review from ex-rzr August 24, 2018 21:04
@aaronenyeshi
Contributor Author

aaronenyeshi commented Aug 24, 2018

This fixes the build issues caused by the missing copy assignment operator with the __device__ attribute and by initialization not being supported for __shared__ variables on CUDA.
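
To illustrate the copy-operator half of that fix, here is a minimal sketch, not the code from this PR. `SKETCH_HOST_DEVICE` is a stand-in for `ROCPRIM_HOST_DEVICE`, assumed here to expand to `__host__ __device__` under a HIP or CUDA compiler and to nothing otherwise; the struct name, its members, and the helper function are hypothetical.

```cpp
// Stand-in for ROCPRIM_HOST_DEVICE (assumption: the real macro expands to
// __host__ __device__ when compiling with HIP or CUDA).
#if defined(__HIPCC__) || defined(__CUDACC__)
    #define SKETCH_HOST_DEVICE __host__ __device__
#else
    #define SKETCH_HOST_DEVICE
#endif

// Hypothetical pair type; the member names are illustrative only.
struct pair_sketch
{
    unsigned int key;
    int          value;

    // Hand-written copy assignment that carries the host/device attributes.
    // It takes the right-hand side by value, like the signature under review.
    SKETCH_HOST_DEVICE inline
    pair_sketch& operator=(pair_sketch rhs)
    {
        key   = rhs.key;
        value = rhs.value;
        return *this;
    }
};

// Hypothetical helper that may be compiled for both host and device; the
// assignment inside it resolves to the annotated operator above.
SKETCH_HOST_DEVICE inline
pair_sketch assign_sketch(pair_sketch src)
{
    pair_sketch dst{0u, 0};
    dst = src;
    return dst;
}
```

The drawback of a hand-written operator is that it has to be updated whenever the members change.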

Running RocprimDeviceRunLengthEncode still produces some failures. The current results are:

[----------] Global test environment tear-down
[==========] 26 tests from 13 test cases ran. (2628 ms total)
[  PASSED  ] 23 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] RocprimDeviceRunLengthEncode/1.NonTrivialRuns, where TypeParam = params<double, int, 3u, 5u, false>
[  FAILED  ] RocprimDeviceRunLengthEncode/2.NonTrivialRuns, where TypeParam = params<float, int, 1u, 10u, false>
[  FAILED  ] RocprimDeviceRunLengthEncode/3.NonTrivialRuns, where TypeParam = params<unsigned long long, unsigned long, 1u, 30u, false>

 3 FAILED TESTS

Please refer to Issue #22 for more details.

@aaronenyeshi aaronenyeshi requested review from jszuppe and removed request for ex-rzr August 24, 2018 21:05
@aaronenyeshi
Contributor Author

Full test output:

Running main() from /root/rocPRIM/build/googletest-src/googletest/src/gtest_main.cc
[==========] Running 26 tests from 13 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from RocprimDeviceRunLengthEncode/0, where TypeParam = params<int, int, 1u, 1u, true>
[ RUN      ] RocprimDeviceRunLengthEncode/0.Encode
[       OK ] RocprimDeviceRunLengthEncode/0.Encode (223 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/0.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/0.NonTrivialRuns (396 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/0 (619 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/1, where TypeParam = params<double, int, 3u, 5u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/1.Encode
[       OK ] RocprimDeviceRunLengthEncode/1.Encode (150 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/1.NonTrivialRuns
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:405: Failure
Expected equality of these values:
  offsets_output[i]
    Which is: 893988769
  offsets_expected[i]
    Which is: 18518
Google Test trace:
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:267: with size = 64316
[  FAILED  ] RocprimDeviceRunLengthEncode/1.NonTrivialRuns, where TypeParam = params<double, int, 3u, 5u, false> (15 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/1 (165 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/2, where TypeParam = params<float, int, 1u, 10u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/2.Encode
[       OK ] RocprimDeviceRunLengthEncode/2.Encode (118 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/2.NonTrivialRuns
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:405: Failure
Expected equality of these values:
  offsets_output[i]
    Which is: 380994
  offsets_expected[i]
    Which is: 84640
Google Test trace:
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:267: with size = 2020609
[  FAILED  ] RocprimDeviceRunLengthEncode/2.NonTrivialRuns, where TypeParam = params<float, int, 1u, 10u, false> (146 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/2 (264 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/3, where TypeParam = params<unsigned long long, unsigned long, 1u, 30u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/3.Encode
[       OK ] RocprimDeviceRunLengthEncode/3.Encode (82 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/3.NonTrivialRuns
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:405: Failure
Expected equality of these values:
  offsets_output[i]
    Which is: 42517
  offsets_expected[i]
    Which is: 21344
Google Test trace:
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:267: with size = 64316
[  FAILED  ] RocprimDeviceRunLengthEncode/3.NonTrivialRuns, where TypeParam = params<unsigned long long, unsigned long, 1u, 30u, false> (11 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/3 (93 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/4, where TypeParam = params<test_utils::custom_test_type<int>, unsigned int, 20u, 100u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/4.Encode
[       OK ] RocprimDeviceRunLengthEncode/4.Encode (60 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/4.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/4.NonTrivialRuns (64 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/4 (124 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/5, where TypeParam = params<float, unsigned long long, 100u, 400u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/5.Encode
[       OK ] RocprimDeviceRunLengthEncode/5.Encode (62 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/5.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/5.NonTrivialRuns (62 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/5 (124 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/6, where TypeParam = params<unsigned int, unsigned int, 200u, 600u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/6.Encode
[       OK ] RocprimDeviceRunLengthEncode/6.Encode (58 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/6.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/6.NonTrivialRuns (57 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/6 (115 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/7, where TypeParam = params<double, int, 100u, 2000u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/7.Encode
[       OK ] RocprimDeviceRunLengthEncode/7.Encode (59 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/7.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/7.NonTrivialRuns (60 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/7 (119 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/8, where TypeParam = params<test_utils::custom_test_type<double>, test_utils::custom_test_type<int>, 10u, 30000u, true>
[ RUN      ] RocprimDeviceRunLengthEncode/8.Encode
[       OK ] RocprimDeviceRunLengthEncode/8.Encode (76 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/8.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/8.NonTrivialRuns (74 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/8 (150 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/9, where TypeParam = params<int, unsigned int, 1000u, 5000u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/9.Encode
[       OK ] RocprimDeviceRunLengthEncode/9.Encode (56 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/9.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/9.NonTrivialRuns (59 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/9 (115 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/10, where TypeParam = params<unsigned int, unsigned long, 2048u, 2048u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/10.Encode
[       OK ] RocprimDeviceRunLengthEncode/10.Encode (62 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/10.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/10.NonTrivialRuns (57 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/10 (120 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/11, where TypeParam = params<unsigned int, unsigned int, 1000u, 50000u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/11.Encode
[       OK ] RocprimDeviceRunLengthEncode/11.Encode (57 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/11.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/11.NonTrivialRuns (60 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/11 (117 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/12, where TypeParam = params<unsigned long long, test_utils::custom_test_type<double>, 100000u, 100000u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/12.Encode
[       OK ] RocprimDeviceRunLengthEncode/12.Encode (252 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/12.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/12.NonTrivialRuns (251 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/12 (503 ms total)

[----------] Global test environment tear-down
[==========] 26 tests from 13 test cases ran. (2628 ms total)
[  PASSED  ] 23 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] RocprimDeviceRunLengthEncode/1.NonTrivialRuns, where TypeParam = params<double, int, 3u, 5u, false>
[  FAILED  ] RocprimDeviceRunLengthEncode/2.NonTrivialRuns, where TypeParam = params<float, int, 1u, 10u, false>
[  FAILED  ] RocprimDeviceRunLengthEncode/3.NonTrivialRuns, where TypeParam = params<unsigned long long, unsigned long, 1u, 30u, false>

 3 FAILED TESTS

struct carry_out
{
ROCPRIM_HOST_DEVICE inline
carry_out& operator=(carry_out rhs)
Contributor

Why is that necessary?

Contributor Author

It seems that without this copy assignment operator, the compiler complains that the implicitly generated one has the __host__ attribute but is called from a __device__ function, so I added an explicit one marked ROCPRIM_HOST_DEVICE.

Contributor

Shouldn't it be generated by the compiler?

Contributor Author

I think it might be a CUDA spec limitation where __device__ functions can only call explicitly defined __device__ functions. I will discuss it with my colleague.

Contributor Author

@aaronenyeshi aaronenyeshi Aug 27, 2018

The spec suggests that a function without these attributes is compiled for host code only:

The __host__ qualifier declares a function that is:
- Executed on the host,
- Callable from the host only.
It is equivalent to declare a function with only the __host__ qualifier or to declare
it without any of the __host__, __device__, or __global__ qualifier; in either
case the function is compiled for the host only.

Contributor

Hmm, ok, I guess I worked with hcc too much and assumed that implicitly generated special member functions, like the copy assignment operator, get the required attributes as needed.

Anyway, instead of implementing them it's better to write:

ROCPRIM_HOST_DEVICE inline
scan_by_key_pair& operator=(const scan_by_key_pair& rhs) = default;

That way there's no need for modification if member variables are changed.

Contributor Author

I tried to do that at first, but it seems that the default is only defined with __host__ :(

In file included from /root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:36:
In file included from /root/rocPRIM/rocprim/include/rocprim/rocprim.hpp:74:
In file included from /root/rocPRIM/rocprim/include/rocprim/device/device_reduce_by_key_hip.hpp:34:
/root/rocPRIM/rocprim/include/rocprim/device/detail/device_reduce_by_key.hpp:67:70: error: reference to
      __host__ function 'operator=' in __host__ __device__ function
    scan_by_key_pair& operator=(const scan_by_key_pair& rhs) = default;
                                                                     ^

Contributor

Can you try defaulting it separately for host and device? Or better: only for device?

Contributor Author

Yes that works better, thanks Jakub
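
The thread does not show the resulting code, so the following is only a sketch of what "defaulting it only for device" might look like. `SKETCH_DEVICE` is a hypothetical stand-in for a device-only attribute macro (assumed to expand to `__device__` under a HIP or CUDA compiler and to nothing otherwise), and the struct, its members, and the helper are illustrative.

```cpp
#include <type_traits>

// Hypothetical stand-in for a device-only attribute macro.
#if defined(__HIPCC__) || defined(__CUDACC__)
    #define SKETCH_DEVICE __device__
#else
    #define SKETCH_DEVICE
#endif

// Hypothetical pair type; the member names are illustrative only.
struct scan_pair_sketch
{
    unsigned int key;
    int          value;

    // Defaulted, so it follows the members automatically if they change;
    // annotated for device only, as suggested in the review.
    SKETCH_DEVICE inline
    scan_pair_sketch& operator=(const scan_pair_sketch& rhs) = default;
};

// Defaulting on the first declaration keeps the operator non-user-provided,
// so in standard C++ the type stays trivially copy-assignable.
static_assert(std::is_trivially_copy_assignable<scan_pair_sketch>::value,
              "defaulted copy assignment should remain trivial");

// Hypothetical helper with the same annotation as the operator it calls.
SKETCH_DEVICE inline
scan_pair_sketch copy_via_assignment(const scan_pair_sketch& src)
{
    scan_pair_sketch dst{0u, 0};
    dst = src;
    return dst;
}
```

With a plain C++ compiler the macro is empty and this is an ordinary defaulted operator; whether host code can still assign such objects under a device compiler is not something this sketch settles.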

struct scan_by_key_pair
{
ROCPRIM_HOST_DEVICE inline
scan_by_key_pair& operator=(scan_by_key_pair rhs)
Contributor

@jszuppe jszuppe Aug 27, 2018

And this?

@jszuppe
Contributor

jszuppe commented Aug 27, 2018

> This will fix the build errors on HIP-Clang, however, there is still failing 3 of 26 tests in rocprim.device_run_length_encode.

Since rocprim.hip.device_select, which is used in the tests that fail for rocprim.device_run_length_encode, is itself failing, we can't say whether anything is wrong in the run_length_encode_non_trivial_runs() function. Note that the tests that run the run_length_encode() function pass.
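
For readers unfamiliar with the two operations, the sketch below is a plain host-side reference for what they compute. It is not rocPRIM's implementation or its test code, and both function names are hypothetical. It assumes the usual meaning of "non-trivial runs": runs longer than one element, reported as (offset, count) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Collapse consecutive equal values into (value, count) pairs.
template<class T>
std::vector<std::pair<T, std::size_t>> rle_reference(const std::vector<T>& input)
{
    std::vector<std::pair<T, std::size_t>> runs;
    for(const T& x : input)
    {
        if(!runs.empty() && runs.back().first == x)
            ++runs.back().second;
        else
            runs.emplace_back(x, std::size_t{1});
    }
    return runs;
}

// Report (offset, count) for every run longer than one element.
template<class T>
std::vector<std::pair<std::size_t, std::size_t>>
    non_trivial_runs_reference(const std::vector<T>& input)
{
    std::vector<std::pair<std::size_t, std::size_t>> result;
    std::size_t start = 0;
    for(std::size_t i = 1; i <= input.size(); ++i)
    {
        // A run ends at the end of the input or where the value changes.
        if(i == input.size() || !(input[i] == input[start]))
        {
            if(i - start > 1)
                result.emplace_back(start, i - start);
            start = i;
        }
    }
    return result;
}

// Usage: {1, 1, 2, 3, 3, 3} has runs (1 x2), (2 x1), (3 x3); only the first
// and last are non-trivial, starting at offsets 0 and 3.
inline void run_length_usage_sketch()
{
    const std::vector<int> in{1, 1, 2, 3, 3, 3};
    assert(rle_reference(in).size() == 3);
    assert(non_trivial_runs_reference(in).size() == 2);
}
```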

@jszuppe jszuppe requested a review from ex-rzr August 27, 2018 21:05
@jszuppe jszuppe merged commit 4704c84 into ROCm:develop Aug 28, 2018
ammallya pushed a commit that referenced this pull request Oct 28, 2025
Add device_scan_* funcs

Closes #31

See merge request amd/rocPRIM!40

[ROCm/rocPRIM commit: edd8737]
ammallya pushed a commit that referenced this pull request Oct 28, 2025
Fix build issue with device_run_length_encode

[ROCm/rocPRIM commit: 4704c84]
assistant-librarian Bot pushed a commit that referenced this pull request Dec 12, 2025
[rocPRIM] Add gfx1152 and gfx1153

## Motivation

Enable gfx1152 and gfx1153.

## Technical Details

1. combine arrays into tables and use local macros to reduce repetition
(for maintainability)
2. monkey-see-monkey-do wherever `gfx11...` was found
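
The commit message does not include the code, so the following is only a generic sketch of the "table plus local macros" idea from item 1, using the X-macro technique. Every name here (`SKETCH_TARGET_TABLE`, `target_sketch`, `parse_target_sketch`) is hypothetical and not taken from rocPRIM.

```cpp
#include <cstring>

// Hypothetical table: one row per target, giving an enumerator name and the
// architecture string. Adding a target means adding one row here.
#define SKETCH_TARGET_TABLE(X) \
    X(gfx1152, "gfx1152")      \
    X(gfx1153, "gfx1153")

// The enumerators are generated from the table by a local macro.
enum class target_sketch
{
    unknown,
#define SKETCH_ENUM_ROW(name, str) name,
    SKETCH_TARGET_TABLE(SKETCH_ENUM_ROW)
#undef SKETCH_ENUM_ROW
};

// The string-to-enumerator lookup is generated from the same table, so the
// two cannot drift apart.
inline target_sketch parse_target_sketch(const char* arch)
{
#define SKETCH_PARSE_ROW(name, str) \
    if(std::strcmp(arch, str) == 0) return target_sketch::name;
    SKETCH_TARGET_TABLE(SKETCH_PARSE_ROW)
#undef SKETCH_PARSE_ROW
    return target_sketch::unknown;
}
```

Each local macro is defined just before use and undefined right after, which keeps it from leaking into other headers.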

## Test Plan

Build existing ctests for, and run them on, gfx1152 and gfx1153.

## Test Result

<details>
<summary>gfx1152 passed (click to see log)</summary>

```
INFO:root:++ Exec [/tmp/eble]$ ctest --test-dir /tmp/eble/rocm/bin/rocprim --output-on-failure --parallel 2 --exclude-regex 'rocprim.lookback_reproducibility|rocprim.linking|rocprim.device_merge_inplace|rocprim.device_merge_sort|rocprim.device_partition|rocprim.device_radix_sort|rocprim.device_scan|rocprim.device_select|rocprim.device_find_first_of|rocprim.device_reduce_by_key' --timeout 60
Test project /tmp/eble/rocm/bin/rocprim
      Start  1: hip.device_api
      Start  2: hip.async_copy
 1/73 Test  #2: hip.async_copy ..............................   Passed    0.01 sec
      Start  3: hip.ordered_block_id
 2/73 Test  #1: hip.device_api ..............................   Passed    0.02 sec
      Start  4: rocprim.internal_merge_path
 3/73 Test  #4: rocprim.internal_merge_path .................   Passed    0.01 sec
      Start  5: rocprim.basic_test
 4/73 Test  #3: hip.ordered_block_id ........................   Passed    0.01 sec
      Start  6: rocprim.arg_index_iterator
 5/73 Test  #5: rocprim.basic_test ..........................   Passed    0.01 sec
      Start  7: rocprim.temporary_storage_partitioning
 6/73 Test  #6: rocprim.arg_index_iterator ..................   Passed    0.01 sec
      Start  8: rocprim.block_adjacent_difference
 7/73 Test  #7: rocprim.temporary_storage_partitioning ......   Passed    0.01 sec
      Start  9: rocprim.block_discontinuity
 8/73 Test  #8: rocprim.block_adjacent_difference ...........   Passed    2.34 sec
      Start 10: rocprim.bit_cast
 9/73 Test #10: rocprim.bit_cast ............................   Passed    0.02 sec
      Start 11: rocprim.block_exchange
10/73 Test #11: rocprim.block_exchange ......................   Passed    0.73 sec
      Start 12: rocprim.block_histogram
11/73 Test #12: rocprim.block_histogram .....................   Passed    0.54 sec
      Start 13: rocprim.block_load_store
12/73 Test #13: rocprim.block_load_store ....................   Passed    0.44 sec
      Start 14: rocprim.block_sort_merge
13/73 Test #14: rocprim.block_sort_merge ....................   Passed    0.02 sec
      Start 15: rocprim.block_sort_merge_stable
14/73 Test #15: rocprim.block_sort_merge_stable .............   Passed    0.02 sec
      Start 16: rocprim.block_radix_rank
15/73 Test #16: rocprim.block_radix_rank ....................   Passed    0.03 sec
      Start 17: rocprim.block_radix_sort
16/73 Test #17: rocprim.block_radix_sort ....................   Passed    4.79 sec
      Start 18: rocprim.block_reduce
17/73 Test #18: rocprim.block_reduce ........................   Passed    0.26 sec
      Start 19: rocprim.block_run_length_decode
18/73 Test #19: rocprim.block_run_length_decode .............   Passed    0.54 sec
      Start 20: rocprim.block_scan
19/73 Test #20: rocprim.block_scan ..........................   Passed    0.04 sec
      Start 21: rocprim.block_shuffle
20/73 Test #21: rocprim.block_shuffle .......................   Passed    2.70 sec
      Start 22: rocprim.block_sort_bitonic
21/73 Test  #9: rocprim.block_discontinuity .................   Passed   17.36 sec
      Start 23: rocprim.config_dispatch
22/73 Test #23: rocprim.config_dispatch .....................   Passed    0.09 sec
      Start 24: rocprim.constant_iterator
23/73 Test #24: rocprim.constant_iterator ...................   Passed    0.07 sec
      Start 25: rocprim.counting_iterator
24/73 Test #25: rocprim.counting_iterator ...................   Passed    0.07 sec
      Start 26: rocprim.device_batch_memcpy
25/73 Test #26: rocprim.device_batch_memcpy .................   Passed    1.20 sec
      Start 27: rocprim.device_binary_search
26/73 Test #27: rocprim.device_binary_search ................   Passed    0.02 sec
      Start 28: rocprim.device_adjacent_difference
27/73 Test #28: rocprim.device_adjacent_difference ..........   Passed    0.01 sec
      Start 29: rocprim.device_adjacent_find
28/73 Test #29: rocprim.device_adjacent_find ................   Passed    0.01 sec
      Start 30: rocprim.device_find_end
29/73 Test #30: rocprim.device_find_end .....................   Passed    0.01 sec
      Start 31: rocprim.device_histogram
30/73 Test #22: rocprim.block_sort_bitonic ..................   Passed   13.23 sec
      Start 32: rocprim.device_merge
31/73 Test #31: rocprim.device_histogram ....................   Passed    7.85 sec
      Start 33: rocprim.nth_element
32/73 Test #33: rocprim.nth_element .........................   Passed    0.03 sec
      Start 34: rocprim.device_partial_sort
33/73 Test #34: rocprim.device_partial_sort .................   Passed    0.02 sec
      Start 35: rocprim.device_reduce
34/73 Test #35: rocprim.device_reduce .......................   Passed    8.94 sec
      Start 36: rocprim.device_run_length_encode
35/73 Test #32: rocprim.device_merge ........................   Passed   14.05 sec
      Start 37: rocprim.device_search
36/73 Test #37: rocprim.device_search .......................   Passed    0.02 sec
      Start 38: rocprim.device_segmented_radix_sort
37/73 Test #36: rocprim.device_run_length_encode ............   Passed   13.92 sec
      Start 39: rocprim.device_search_n
38/73 Test #39: rocprim.device_search_n .....................   Passed    0.02 sec
      Start 40: rocprim.device_segmented_reduce
39/73 Test #40: rocprim.device_segmented_reduce .............   Passed    5.21 sec
      Start 41: rocprim.device_segmented_scan
40/73 Test #41: rocprim.device_segmented_scan ...............   Passed    0.02 sec
      Start 42: rocprim.device_transform
41/73 Test #42: rocprim.device_transform ....................   Passed   13.90 sec
      Start 43: rocprim.discard_iterator
42/73 Test #43: rocprim.discard_iterator ....................   Passed    0.07 sec
      Start 44: rocprim.radix_key_codec
43/73 Test #44: rocprim.radix_key_codec .....................   Passed
      Start 45: rocprim.predicate_iterator
44/73 Test #45: rocprim.predicate_iterator ..................   Passed    0.07 sec
      Start 46: rocprim.reverse_iterator
45/73 Test #46: rocprim.reverse_iterator ....................   Passed    0.09 sec
      Start 47: rocprim.rocprim_tuple
46/73 Test #47: rocprim.rocprim_tuple .......................   Passed    0.01 sec
      Start 48: rocprim.rocprim_types
47/73 Test #48: rocprim.rocprim_types .......................   Passed    0.01 sec
      Start 49: rocprim.texture_cache_iterator
48/73 Test #49: rocprim.texture_cache_iterator ..............   Passed    0.01 sec
      Start 50: rocprim.thread
49/73 Test #50: rocprim.thread ..............................   Passed    0.07 sec
      Start 51: rocprim.thread_algos
50/73 Test #51: rocprim.thread_algos ........................   Passed    0.35 sec
      Start 52: rocprim.tuple
51/73 Test #52: rocprim.tuple ...............................   Passed    0.02 sec
      Start 53: rocprim.utils_sort_checker
52/73 Test #53: rocprim.utils_sort_checker ..................   Passed    0.01 sec
      Start 54: rocprim.transform_iterator
53/73 Test #54: rocprim.transform_iterator ..................   Passed    0.11 sec
      Start 55: rocprim.type_traits_interface_cpp17
54/73 Test #55: rocprim.type_traits_interface_cpp17 .........   Passed    0.01 sec
      Start 56: rocprim.type_traits_interface_gnupp17
55/73 Test #56: rocprim.type_traits_interface_gnupp17 .......   Passed    0.01 sec
      Start 57: rocprim.type_traits_interface_cpp20
56/73 Test #57: rocprim.type_traits_interface_cpp20 .........   Passed    0.01 sec
      Start 58: rocprim.type_traits_interface_gnupp20
57/73 Test #58: rocprim.type_traits_interface_gnupp20 .......   Passed    0.01 sec
      Start 59: rocprim.no_half_operators
58/73 Test #59: rocprim.no_half_operators ...................   Passed    0.01 sec
      Start 60: rocprim.intrinsics
59/73 Test #60: rocprim.intrinsics ..........................   Passed    0.21 sec
      Start 61: rocprim.intrinsics_atomic
60/73 Test #61: rocprim.intrinsics_atomic ...................   Passed    0.02 sec
      Start 62: rocprim.invoke_result
61/73 Test #62: rocprim.invoke_result .......................   Passed    0.01 sec
      Start 63: rocprim.warp_exchange
62/73 Test #63: rocprim.warp_exchange .......................   Passed    0.08 sec
      Start 64: rocprim.warp_load
63/73 Test #64: rocprim.warp_load ...........................   Passed    0.08 sec
      Start 65: rocprim.warp_reduce
64/73 Test #65: rocprim.warp_reduce .........................   Passed    0.14 sec
      Start 66: rocprim.warp_scan
65/73 Test #66: rocprim.warp_scan ...........................   Passed    0.20 sec
      Start 67: rocprim.warp_scan_disable_dpp_disable_dpp
66/73 Test #67: rocprim.warp_scan_disable_dpp_disable_dpp ...   Passed    0.21 sec
      Start 68: rocprim.warp_sort
67/73 Test #68: rocprim.warp_sort ...........................   Passed    0.09 sec
      Start 69: rocprim.warp_store
68/73 Test #69: rocprim.warp_store ..........................   Passed    0.02 sec
      Start 70: rocprim.zip_iterator
69/73 Test #70: rocprim.zip_iterator ........................   Passed    0.02 sec
      Start 71: rocprim.accumulator_t
70/73 Test #71: rocprim.accumulator_t .......................   Passed    0.02 sec
      Start 72: hipgraph.basic
71/73 Test #72: hipgraph.basic ..............................   Passed    0.02 sec
      Start 73: hipgraph.algs
72/73 Test #73: hipgraph.algs ...............................   Passed    0.01 sec
73/73 Test #38: rocprim.device_segmented_radix_sort .........   Passed   31.97 sec

100% tests passed, 0 tests failed out of 73

Total Test time (real) =  71.80 sec
✅ test_rocprim.py PASSED
```

</details>

<details>
<summary>gfx1153 passed (click to see log)</summary>

```
INFO:root:++ Exec [/tmp/eble]$ ctest --test-dir /tmp/eble/rocm/bin/rocprim --output-on-failure --parallel 1 --exclude-regex 'rocprim.lookback_reproducibility|rocprim.linking|rocprim.device_merge_inplace|rocprim.device_merge_sort|rocprim.device_partition|rocprim.device_radix_sort|rocprim.device_scan|rocprim.device_select|rocprim.device_find_first_of|rocprim.device_reduce_by_key' --timeout 60
Test project /tmp/eble/rocm/bin/rocprim
      Start  1: hip.device_api
 1/73 Test  #1: hip.device_api ..............................   Passed    0.01 sec
      Start  2: hip.async_copy
 2/73 Test  #2: hip.async_copy ..............................   Passed    0.01 sec
      Start  3: hip.ordered_block_id
 3/73 Test  #3: hip.ordered_block_id ........................   Passed    0.01 sec
      Start  4: rocprim.internal_merge_path
 4/73 Test  #4: rocprim.internal_merge_path .................   Passed    0.01 sec
      Start  5: rocprim.basic_test
 5/73 Test  #5: rocprim.basic_test ..........................   Passed    0.01 sec
      Start  6: rocprim.arg_index_iterator
 6/73 Test  #6: rocprim.arg_index_iterator ..................   Passed    0.01 sec
      Start  7: rocprim.temporary_storage_partitioning
 7/73 Test  #7: rocprim.temporary_storage_partitioning ......   Passed    0.01 sec
      Start  8: rocprim.block_adjacent_difference
 8/73 Test  #8: rocprim.block_adjacent_difference ...........   Passed    2.94 sec
      Start  9: rocprim.block_discontinuity
 9/73 Test  #9: rocprim.block_discontinuity .................   Passed   21.10 sec
      Start 10: rocprim.bit_cast
10/73 Test #10: rocprim.bit_cast ............................   Passed    0.01 sec
      Start 11: rocprim.block_exchange
11/73 Test #11: rocprim.block_exchange ......................   Passed    2.20 sec
      Start 12: rocprim.block_histogram
12/73 Test #12: rocprim.block_histogram .....................   Passed    0.71 sec
      Start 13: rocprim.block_load_store
13/73 Test #13: rocprim.block_load_store ....................   Passed    0.48 sec
      Start 14: rocprim.block_sort_merge
14/73 Test #14: rocprim.block_sort_merge ....................   Passed    0.02 sec
      Start 15: rocprim.block_sort_merge_stable
15/73 Test #15: rocprim.block_sort_merge_stable .............   Passed    0.02 sec
      Start 16: rocprim.block_radix_rank
16/73 Test #16: rocprim.block_radix_rank ....................   Passed    0.02 sec
      Start 17: rocprim.block_radix_sort
17/73 Test #17: rocprim.block_radix_sort ....................   Passed    6.12 sec
      Start 18: rocprim.block_reduce
18/73 Test #18: rocprim.block_reduce ........................   Passed    0.31 sec
      Start 19: rocprim.block_run_length_decode
19/73 Test #19: rocprim.block_run_length_decode .............   Passed    0.68 sec
      Start 20: rocprim.block_scan
20/73 Test #20: rocprim.block_scan ..........................   Passed    0.03 sec
      Start 21: rocprim.block_shuffle
21/73 Test #21: rocprim.block_shuffle .......................   Passed    3.63 sec
      Start 22: rocprim.block_sort_bitonic
22/73 Test #22: rocprim.block_sort_bitonic ..................   Passed   19.34 sec
      Start 23: rocprim.config_dispatch
23/73 Test #23: rocprim.config_dispatch .....................   Passed    0.10 sec
      Start 24: rocprim.constant_iterator
24/73 Test #24: rocprim.constant_iterator ...................   Passed    0.09 sec
      Start 25: rocprim.counting_iterator
25/73 Test #25: rocprim.counting_iterator ...................   Passed    0.09 sec
      Start 26: rocprim.device_batch_memcpy
26/73 Test #26: rocprim.device_batch_memcpy .................   Passed    1.42 sec
      Start 27: rocprim.device_binary_search
27/73 Test #27: rocprim.device_binary_search ................   Passed    0.01 sec
      Start 28: rocprim.device_adjacent_difference
28/73 Test #28: rocprim.device_adjacent_difference ..........   Passed    0.01 sec
      Start 29: rocprim.device_adjacent_find
29/73 Test #29: rocprim.device_adjacent_find ................   Passed    0.01 sec
      Start 30: rocprim.device_find_end
30/73 Test #30: rocprim.device_find_end .....................   Passed    0.01 sec
      Start 31: rocprim.device_histogram
31/73 Test #31: rocprim.device_histogram ....................   Passed    8.77 sec
      Start 32: rocprim.device_merge
32/73 Test #32: rocprim.device_merge ........................   Passed   16.23 sec
      Start 33: rocprim.nth_element
33/73 Test #33: rocprim.nth_element .........................   Passed    0.01 sec
      Start 34: rocprim.device_partial_sort
34/73 Test #34: rocprim.device_partial_sort .................   Passed    0.02 sec
      Start 35: rocprim.device_reduce
35/73 Test #35: rocprim.device_reduce .......................   Passed   10.92 sec
      Start 36: rocprim.device_run_length_encode
36/73 Test #36: rocprim.device_run_length_encode ............   Passed   14.34 sec
      Start 37: rocprim.device_search
37/73 Test #37: rocprim.device_search .......................   Passed    0.01 sec
      Start 38: rocprim.device_segmented_radix_sort
38/73 Test #38: rocprim.device_segmented_radix_sort .........   Passed   38.28 sec
      Start 39: rocprim.device_search_n
39/73 Test #39: rocprim.device_search_n .....................   Passed    0.02 sec
      Start 40: rocprim.device_segmented_reduce
40/73 Test #40: rocprim.device_segmented_reduce .............   Passed    7.19 sec
      Start 41: rocprim.device_segmented_scan
41/73 Test #41: rocprim.device_segmented_scan ...............   Passed    0.02 sec
      Start 42: rocprim.device_transform
42/73 Test #42: rocprim.device_transform ....................   Passed   17.64 sec
      Start 43: rocprim.discard_iterator
43/73 Test #43: rocprim.discard_iterator ....................   Passed    0.12 sec
      Start 44: rocprim.radix_key_codec
44/73 Test #44: rocprim.radix_key_codec .....................   Passed    0.01 sec
      Start 45: rocprim.predicate_iterator
45/73 Test #45: rocprim.predicate_iterator ..................   Passed    0.08 sec
      Start 46: rocprim.reverse_iterator
46/73 Test #46: rocprim.reverse_iterator ....................   Passed
      Start 47: rocprim.rocprim_tuple
47/73 Test #47: rocprim.rocprim_tuple .......................   Passed    0.01 sec
      Start 48: rocprim.rocprim_types
48/73 Test #48: rocprim.rocprim_types .......................   Passed    0.01 sec
      Start 49: rocprim.texture_cache_iterator
49/73 Test #49: rocprim.texture_cache_iterator ..............   Passed    0.01 sec
      Start 50: rocprim.thread
50/73 Test #50: rocprim.thread ..............................   Passed    0.08 sec
      Start 51: rocprim.thread_algos
51/73 Test #51: rocprim.thread_algos ........................   Passed    0.43 sec
      Start 52: rocprim.tuple
52/73 Test #52: rocprim.tuple ...............................   Passed    0.01 sec
      Start 53: rocprim.utils_sort_checker
53/73 Test #53: rocprim.utils_sort_checker ..................   Passed    0.01 sec
      Start 54: rocprim.transform_iterator
54/73 Test #54: rocprim.transform_iterator ..................   Passed    0.12 sec
      Start 55: rocprim.type_traits_interface_cpp17
55/73 Test #55: rocprim.type_traits_interface_cpp17 .........   Passed    0.01 sec
      Start 56: rocprim.type_traits_interface_gnupp17
56/73 Test #56: rocprim.type_traits_interface_gnupp17 .......   Passed    0.01 sec
      Start 57: rocprim.type_traits_interface_cpp20
57/73 Test #57: rocprim.type_traits_interface_cpp20 .........   Passed    0.01 sec
      Start 58: rocprim.type_traits_interface_gnupp20
58/73 Test #58: rocprim.type_traits_interface_gnupp20 .......   Passed    0.01 sec
      Start 59: rocprim.no_half_operators
59/73 Test #59: rocprim.no_half_operators ...................   Passed    0.01 sec
      Start 60: rocprim.intrinsics
60/73 Test #60: rocprim.intrinsics ..........................   Passed    0.29 sec
      Start 61: rocprim.intrinsics_atomic
61/73 Test #61: rocprim.intrinsics_atomic ...................   Passed
      Start 62: rocprim.invoke_result
62/73 Test #62: rocprim.invoke_result .......................   Passed    0.01 sec
      Start 63: rocprim.warp_exchange
63/73 Test #63: rocprim.warp_exchange .......................   Passed    0.09 sec
      Start 64: rocprim.warp_load
64/73 Test #64: rocprim.warp_load ...........................   Passed    0.09 sec
      Start 65: rocprim.warp_reduce
65/73 Test #65: rocprim.warp_reduce .........................   Passed    0.17 sec
      Start 66: rocprim.warp_scan
66/73 Test #66: rocprim.warp_scan ...........................   Passed    0.26 sec
      Start 67: rocprim.warp_scan_disable_dpp_disable_dpp
67/73 Test #67: rocprim.warp_scan_disable_dpp_disable_dpp ...   Passed    0.26 sec
      Start 68: rocprim.warp_sort
68/73 Test #68: rocprim.warp_sort ...........................   Passed    0.10 sec
      Start 69: rocprim.warp_store
69/73 Test #69: rocprim.warp_store ..........................   Passed    0.01 sec
      Start 70: rocprim.zip_iterator
70/73 Test #70: rocprim.zip_iterator ........................   Passed    0.01 sec
      Start 71: rocprim.accumulator_t
71/73 Test #71: rocprim.accumulator_t .......................   Passed    0.01 sec
      Start 72: hipgraph.basic
72/73 Test #72: hipgraph.basic ..............................   Passed    0.01 sec
      Start 73: hipgraph.algs
73/73 Test #73: hipgraph.algs ...............................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 73

Total Test time (real) = 175.34 sec
✅ test_rocprim.py PASSED
```

</details>

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.