Open
95 commits
790af56
Bump main to 2.9.0.
matyas-streamhpc Oct 9, 2025
718586a
Bump main to 3.0.0.
matyas-streamhpc Oct 9, 2025
a4636e5
Require C++17 for compiling hipCUB
matyas-streamhpc Oct 9, 2025
bd794da
Drop `BinaryFlip` operator
matyas-streamhpc Oct 9, 2025
a9cfc51
Deprecate `hipcub::Swap`
matyas-streamhpc Oct 13, 2025
340e1d6
Drop hipCUB APIs with a debug_synchronous parameter
matyas-streamhpc Oct 15, 2025
d5e9910
Update deprecated warning for hipcub::DivideAndRoundUp
matyas-streamhpc Oct 27, 2025
a5f87e4
Add `hip::std` support
matyas-streamhpc Nov 24, 2025
ce70816
Adds support for large number of items and large number of segments t…
matyas-streamhpc Oct 27, 2025
fa05565
Deprecate hipcub::min and hipcub:max and replace internal uses of std…
matyas-streamhpc Dec 5, 2025
6e6b96d
Remove deprecated hipcub::min and hipcub::max
matyas-streamhpc Dec 5, 2025
17d1aec
Replace rocprim counting iterator to thrust for nvcc compatibility
matyas-streamhpc Oct 30, 2025
b3550fd
Deprecate and replace `HIPCUB_IS_INT128_ENABLED`
matyas-streamhpc Nov 3, 2025
7b1b7f6
Adds support for large num items to ``DeviceMerge``
matyas-streamhpc Nov 3, 2025
db13289
Remove ``DeviceSpmv``
matyas-streamhpc Nov 3, 2025
c95af95
Adds support for large number of segments to DeviceSegmentedReduce
matyas-streamhpc Dec 4, 2025
0351bd5
Remove ``LEGACY_PTX_ARCH``
matyas-streamhpc Nov 3, 2025
d6f4a57
Fix hipCUB trait deprecations
matyas-streamhpc Nov 14, 2025
5ab3fb9
Drop deprecated entities from util_type
matyas-streamhpc Nov 13, 2025
6a621f0
Drop ``GridBarrier``
matyas-streamhpc Nov 13, 2025
ef75cb4
Replace pre-c++17 traits with modern ones
matyas-streamhpc Nov 13, 2025
ca88b0e
Replace ``HIPCUB_IF_CONSTEXPR``
matyas-streamhpc Nov 13, 2025
8d096ad
Drop small deprecated entites
matyas-streamhpc Nov 13, 2025
4d19462
Remove deprecated iterators
matyas-streamhpc Nov 13, 2025
3eee1cd
Add `__int128`, `__half` and `hip_bfloat16` overloads to `to_bits` in…
matyas-streamhpc Nov 17, 2025
310bb5e
Fix add implicit conversion for read access for `conditional_discard_…
matyas-streamhpc Nov 24, 2025
4fd0e1b
Drop deprecated features from util_ptx.cuh
matyas-streamhpc Dec 1, 2025
b553307
Fix typo in TEST_UTILS_INCLUDE_GUARD
matyas-streamhpc Dec 2, 2025
4592305
Use hip::std limits
matyas-streamhpc Dec 1, 2025
785bff8
Move iterator traits to hip::std::iterator_traits
matyas-streamhpc Dec 9, 2025
a5e3050
Avoid int overflow during multiplication
matyas-streamhpc Dec 3, 2025
90f8bb6
Remove `HIPCUB_MIN` and `HIPCUB_MAX`
matyas-streamhpc Dec 3, 2025
917f1e1
Minimize usage of `hipcub::Traits`
matyas-streamhpc Dec 8, 2025
9ad0123
Drop deprecated CUB macros
matyas-streamhpc Dec 3, 2025
c0b017b
Update mdspan support
matyas-streamhpc Dec 3, 2025
cb35f13
Replace util_arch.cuh macros with inline constexpr variables
matyas-streamhpc Dec 3, 2025
6cc2050
Use `_CCCL_PRAGMA_UNROLL_FULL()` and `_CCCL_PRAGMA_NOUNROLL()`
matyas-streamhpc Dec 3, 2025
a803a68
Allow rapids to avoid unrolling some loops in sort
matyas-streamhpc Dec 3, 2025
1ae1c0c
Adds support for large num items to DeviceMerge
matyas-streamhpc Dec 3, 2025
50500b7
Adds support for large number of buffers to `DeviceCpy::Batched` and …
matyas-streamhpc Dec 3, 2025
af803c0
Fix hip std namespace
matyas-streamhpc Dec 3, 2025
8de8ac3
Increase test coverage for hipCUB batched device copy and add fix for…
matyas-streamhpc Dec 10, 2025
53242d3
fix(hipcub): fix broken bitwise representation derivation for extende…
Naraenda Dec 16, 2025
1e9b0bc
Resolve "Fix is_floating_point value in hipCUB/rocPRIM"
cenxuantian Dec 18, 2025
040d9ab
Update copyright year
matyas-streamhpc Jan 5, 2026
9c4bf18
Update CHANGELOG
matyas-streamhpc Jan 26, 2026
be22a9d
Fix clang format
matyas-streamhpc Jan 26, 2026
16944ff
Update counting, discard and transform iterator CUB backend for tests
matyas-streamhpc Oct 30, 2025
cf1f9bb
Fix assert_near
matyas-streamhpc Jan 14, 2026
c9c9341
Fix is_floating_point_v error for half/bfloat16 with CUDA
matyas-streamhpc Jan 7, 2026
55b1831
Replace util_arch.cuh macros with inline constexpr variables
matyas-streamhpc Jan 7, 2026
10b8037
Add cuda headers
matyas-streamhpc Jan 7, 2026
b072d5d
Fix device histogram
matyas-streamhpc Jan 7, 2026
05ea181
Fix to_bits
matyas-streamhpc Jan 8, 2026
5783809
Fix ForEachInExtentsAPI test
matyas-streamhpc Jan 8, 2026
9d685f1
Fix device_test_enabled_for_warp_size_v for CUDA
matyas-streamhpc Jan 8, 2026
d01cab5
Fix device_reduce CUB backend
matyas-streamhpc Jan 9, 2026
27892a1
Fix cub util_type include
matyas-streamhpc Jan 14, 2026
f130538
Update WarpExchange rocPRIM backend for CUB compatibility
matyas-streamhpc Jan 15, 2026
0727caf
Fix for ArgMax/Min
matyas-streamhpc Jan 9, 2026
26d1985
Fix type in test
matyas-streamhpc Jan 20, 2026
21ff3fa
Fix device select UniqueByKey
matyas-streamhpc Jan 20, 2026
ffb1b7a
Fix clang format
matyas-streamhpc Jan 20, 2026
9957d9f
Fix type in benchmark segmented sort
matyas-streamhpc Jan 20, 2026
61aae79
Fix types in segmented sort
matyas-streamhpc Jan 20, 2026
e81342c
Device segmented sort - support large indices
matyas-streamhpc Jan 20, 2026
d7621f1
Patch thread operators - add missing headers
matyas-streamhpc Jan 7, 2026
567e65f
Patch thread operators - add cub backend
matyas-streamhpc Jan 7, 2026
cc1e483
Patch thread operators - fix
matyas-streamhpc Jan 22, 2026
bcb082c
Fix types and formatting in generate resource spec
matyas-streamhpc Jan 26, 2026
1cb4b54
Remove internal `hipcub::Division` usage
matyas-streamhpc Jan 28, 2026
1a2bd5c
Internal removal of `hipcub::Inequality`
matyas-streamhpc Jan 28, 2026
b1d660f
Fix sort keys over 4GB
matyas-streamhpc Jan 28, 2026
904a941
Fix device radix sort large sizes test
matyas-streamhpc Jan 28, 2026
51afcd9
Fix device reduce ArgMin/Max CUB backend
matyas-streamhpc Feb 11, 2026
05ebcd7
Fix block merge sort default value management for CUB backend compati…
matyas-streamhpc Feb 17, 2026
8370c97
Fix block radix rank test for CUB backend
matyas-streamhpc Feb 19, 2026
5db1675
Fix static linking for Windows
matyas-streamhpc Feb 19, 2026
c4fa8b7
Fix Windows build by forcing static MSVC runtime to match vcpkg stati…
matyas-streamhpc Feb 20, 2026
6e168e7
Fix Windows HIP cannot support allocations over 4 GiB
matyas-streamhpc Feb 23, 2026
9a8e7c2
Remove internal ``hipcub::Difference`` usage
matyas-streamhpc Feb 26, 2026
4048812
Remove internal ``hipcub::InequalityWrapper`` usage
matyas-streamhpc Feb 26, 2026
a8d2452
Remove internal ``hipcub::Equality`` usage
matyas-streamhpc Feb 26, 2026
83a0571
Remove internal ``hipcub::Max`` usage
matyas-streamhpc Feb 26, 2026
b35571e
Remove internal usage of ``hipcub::Sum``
matyas-streamhpc Mar 2, 2026
06b9761
Remove headers that include internally used thread operators
matyas-streamhpc Mar 5, 2026
8a1d5bd
Deprecate thread operators
matyas-streamhpc Mar 6, 2026
2b6783a
Remove tests of deprecated thread operators
matyas-streamhpc Mar 9, 2026
c6f8e0a
Deprecate and replace `hipcub::BFE`
matyas-streamhpc Mar 9, 2026
eff9570
Fix libhipcxx version check
matyas-streamhpc Mar 9, 2026
02eaac0
Add sort_last for custom test type
matyas-streamhpc Apr 7, 2026
33cb775
Update device reduce test cases
matyas-streamhpc Apr 8, 2026
789b242
Add numeric_limits support for custom test types
matyas-streamhpc Apr 8, 2026
1441fba
Skip unsupported bfloat16/half histogram tests
matyas-streamhpc Apr 16, 2026
9b42cec
feat(hipcub): add infrastructure to expose the compatible cccl version
Naraenda May 20, 2026
6 changes: 6 additions & 0 deletions projects/hipcub/.clang-format
@@ -167,4 +167,10 @@ Macros:
- HIPCUB_DETAIL_DEPRECATED_DEBUG_SYNCHRONOUS=[[DETAIL_DEPRECATED_DEBUG_SYNCHRONOUS___]]
BreakAfterAttributes: Always

WhitespaceSensitiveMacros: [
'HIPCUB_HAS_INCLUDE',
'_HIPCUB_LIBCXX_INCLUDE',
'_HIPCUB_STD_INCLUDE'
]

---
27 changes: 27 additions & 0 deletions projects/hipcub/CHANGELOG.md
@@ -2,6 +2,33 @@

Full documentation for hipCUB is available at [https://rocm.docs.amd.com/projects/hipCUB/en/latest/](https://rocm.docs.amd.com/projects/hipCUB/en/latest/).

## hipCUB-4.5.0 for ROCm 7.14

### Added

* Added `::hip::std` support.

### Changed

* Changed `CCCL_MINIMUM_VERSION` to `3.0.0` to align with CUB.
* Add support for large `num_items` in `DeviceMerge` and `DeviceSegmentedSort`.
* Replace `#pragma unroll` by `_CCCL_PRAGMA_UNROLL_FULL()` and `#pragma nounroll` by `_CCCL_PRAGMA_NOUNROLL()`.
* Add `_CCCL_SORT_MAYBE_UNROLL()` in block merge sort and thread sort.
* Update `WarpExchange` template parameters for CUB compatibility.

### Removed

* Removed `hipcub::BaseTraits::CATEGORY`, `hipcub::BaseTraits::nullptr_TYPE` and `hipcub::BaseTraits::PRIMITIVE`.
* Removed `ConstantInputIterator`, `CountingInputIterator`, `DiscardOutputIterator` and `TransformInputIterator` which were deprecated in hipCUB-4.1.0.
* Removed `DeviceSpmv`, which was removed from CUB after CCCL's 2.8.0 release. Use `hipSPARSE` or `rocSPARSE` libraries instead.
* Removed `GridBarrier`.
* Removed `HIPCUB_MIN`, `HIPCUB_MAX`, `HIPCUB_QUOTIENT_FLOOR`, `HIPCUB_QUOTIENT_CEILING`, `HIPCUB_ROUND_UP_NEAREST` and `HIPCUB_ROUND_DOWN_NEAREST` which were deprecated in hipCUB-4.1.0.
* Removed `LEGACY_PTX_ARCH`.
* Removed `hipcub::max` and `hipcub::min`, which were deprecated. Use `hip::std::max` and `hip::std::min` instead.
* Deprecated `hipcub::Swap`, use `rocprim::swap` instead.
* Deprecated `HIPCUB_IS_INT128_ENABLED`, use `_CCCL_HAS_INT128()` instead.
* Deprecated `hipcub::Equality`, `hipcub::Inequality`, `hipcub::InequalityWrapper`, `hipcub::Sum`, `hipcub::Difference`, `hipcub::Division`, `hipcub::Max` and `hipcub::Min` operators. Use `hip::std::equal_to`, `hip::std::not_equal_to`, `hip::std::plus`, `hip::std::minus`, `hip::std::divides`, `hip::maximum` and `hip::minimum` operators instead.
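
As a rough illustration of the operator migration listed above, the following host-only sketch passes standard-style function objects where a `hipcub::Sum` or `hipcub::Equality` might previously have been used. It uses `std::plus<>`, `std::minus<>` and `std::equal_to<>` as stand-ins, on the assumption that the `hip::std::` replacements mirror the `std::` interface. The helpers `reduce_with` and `count_adjacent` are hypothetical names invented for this example and are not part of hipCUB.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Host-only sketch. std:: function objects stand in for the hip::std::
// replacements named in the changelog (assumed to share the same interface).

// Hypothetical helper: left-fold over `values` with a caller-supplied operator.
// A caller that used to pass hipcub::Sum() would pass a plus<> functor instead.
template<class BinaryOp>
int reduce_with(const std::vector<int>& values, int init, BinaryOp op)
{
    return std::accumulate(values.begin(), values.end(), init, op);
}

// Hypothetical helper: counts adjacent pairs for which `pred` returns true.
// A caller that used to pass hipcub::Equality() would pass equal_to<> instead.
template<class BinaryPred>
int count_adjacent(const std::vector<int>& values, BinaryPred pred)
{
    int count = 0;
    for(std::size_t i = 1; i < values.size(); ++i)
    {
        if(pred(values[i - 1], values[i]))
        {
            ++count;
        }
    }
    return count;
}
```

Because both helpers take the operator as a template parameter, swapping a deprecated functor for its replacement should only change the call site, not the helper itself.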

## Since last release ROCm 7.12

### Optimizations
15 changes: 9 additions & 6 deletions projects/hipcub/CMakeLists.txt
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2017-2025 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2017-2026 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -28,6 +28,10 @@ cmake_policy(VERSION 3.18...3.25)
#
# Set the library version
set(VERSION_STRING "4.4.0")
# Set the CCCL-compatible version.
set(HIPCUB_CCCL_VERSION_MAJOR 3)
set(HIPCUB_CCCL_VERSION_MINOR 0)
set(HIPCUB_CCCL_VERSION_PATCH 3)
# Set the minimum required rocPRIM version
set(MIN_ROCPRIM_PACKAGE_VERSION "4.1.0" CACHE STRING "Minimum version of rocPRIM to search for when ROCPRIM_FETCH_METHOD is set to PACKAGE.")
# Set download branch for dependency rocPRIM
@@ -47,10 +51,8 @@ endif()
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

if (CMAKE_CXX_STANDARD EQUAL 14)
message(WARNING "C++14 will be deprecated in the next major release")
elseif(NOT CMAKE_CXX_STANDARD EQUAL 17)
message(FATAL_ERROR "Only C++14 and C++17 are supported")
if(NOT CMAKE_CXX_STANDARD EQUAL 17)
message(FATAL_ERROR "Only C++17 is supported")
endif()

# Set HIP flags
@@ -169,6 +171,7 @@ endif()
# Setup the library version
rocm_setup_version(VERSION ${VERSION_STRING})
math(EXPR hipcub_VERSION_NUMBER "${hipcub_VERSION_MAJOR} * 100000 + ${hipcub_VERSION_MINOR} * 100 + ${hipcub_VERSION_PATCH}")
math(EXPR HIPCUB_CCCL_VERSION_NUMBER "${HIPCUB_CCCL_VERSION_MAJOR} * 100000 + ${HIPCUB_CCCL_VERSION_MINOR} * 100 + ${HIPCUB_CCCL_VERSION_PATCH}")

# Find and verify HIP.
include(VerifyCompiler)
@@ -184,7 +187,7 @@ endif()
include(CheckCXXCompilerFlag)

if(BUILD_OFFLOAD_COMPRESS)
# We need to pass '-x hip' since check_cxx_compiler_flag assumes c++ and not HIP.
# We need to pass '-x hip' since check_cxx_compiler_flag assumes c++ and not HIP.
check_cxx_compiler_flag("--offload-compress -x hip" CXX_COMPILER_SUPPORTS_OFFLOAD_COMPRESS)
if(CXX_COMPILER_SUPPORTS_OFFLOAD_COMPRESS)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} --offload-compress")
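
The `math(EXPR ...)` lines above pack a three-part version into one integer as `major * 100000 + minor * 100 + patch`. A minimal C++ sketch of the same arithmetic follows. The names `encode_version`, `decode_version` and `version_triple` are hypothetical, and the decoding step additionally assumes `minor < 1000` and `patch < 100`, which the CMake code does not state.

```cpp
#include <cassert>

// Hypothetical container for the three version components.
struct version_triple
{
    int major_part;
    int minor_part;
    int patch_part;
};

// Same formula as the CMake math(EXPR ...) calls in the diff.
constexpr int encode_version(int major_part, int minor_part, int patch_part)
{
    return major_part * 100000 + minor_part * 100 + patch_part;
}

// Inverse of encode_version. Assumes minor_part < 1000 and patch_part < 100
// (an assumption of this sketch, not stated in the source).
constexpr version_triple decode_version(int number)
{
    return {number / 100000, (number / 100) % 1000, number % 100};
}

// The CCCL-compatible version 3.0.3 set in the diff encodes to 300003.
static_assert(encode_version(3, 0, 3) == 300003, "3.0.3 should encode to 300003");
```

With this encoding, versions can be compared with a plain integer comparison, for example `encode_version(3, 0, 3) > encode_version(2, 8, 0)`.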
8 changes: 6 additions & 2 deletions projects/hipcub/benchmark/CMakeLists.txt
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2020-2025 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2020-2026 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -35,6 +35,11 @@ function(add_hipcub_benchmark BENCHMARK_SOURCE)
benchmark::benchmark
hipcub
)

if (WIN32)
target_compile_definitions(${BENCHMARK_TARGET} PRIVATE BENCHMARK_STATIC_DEFINE)
endif()

if((HIP_COMPILER STREQUAL "nvcc"))
set_property(TARGET ${BENCHMARK_TARGET} PROPERTY CUDA_STANDARD 17)
set_source_files_properties(${BENCHMARK_SOURCE} PROPERTIES LANGUAGE CUDA)
@@ -100,7 +105,6 @@ add_hipcub_benchmark(benchmark_device_segmented_sort.cpp)
add_hipcub_benchmark(benchmark_device_segmented_radix_sort.cpp)
add_hipcub_benchmark(benchmark_device_segmented_reduce.cpp)
add_hipcub_benchmark(benchmark_device_select.cpp)
add_hipcub_benchmark(benchmark_device_spmv.cpp)
add_hipcub_benchmark(benchmark_warp_exchange.cpp)
add_hipcub_benchmark(benchmark_warp_load.cpp)
add_hipcub_benchmark(benchmark_warp_reduce.cpp)
20 changes: 10 additions & 10 deletions projects/hipcub/benchmark/benchmark_block_adjacent_difference.cpp
@@ -1,6 +1,6 @@
// MIT License
//
// Copyright (c) 2020-2022 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2020-2026 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
@@ -63,7 +63,7 @@ struct subtract_left

hipcub::BlockAdjacentDifference<T, BlockSize> adjacent_difference;

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < trials; trial++)
{
T output[ItemsPerThread];
@@ -106,7 +106,7 @@ struct subtract_left_partial_tile
// Try to evenly distribute the length of tile_sizes between all the trials
const auto tile_size_diff = (BlockSize * ItemsPerThread) / trials + 1;

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < trials; trial++)
{
T output[ItemsPerThread];
@@ -150,7 +150,7 @@ struct subtract_right

hipcub::BlockAdjacentDifference<T, BlockSize> adjacent_difference;

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < trials; trial++)
{
T output[ItemsPerThread];
@@ -193,7 +193,7 @@ struct subtract_right_partial_tile
// Try to evenly distribute the length of tile_sizes between all the trials
const auto tile_size_diff = (BlockSize * ItemsPerThread) / trials + 1;

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < trials; trial++)
{
T output[ItemsPerThread];
@@ -221,8 +221,8 @@ template<class Benchmark,
bool WithTile,
unsigned int Trials = 100>
auto run_benchmark(benchmark::State& state, hipStream_t stream, size_t N)
-> std::enable_if_t<!std::is_same<Benchmark, subtract_left_partial_tile>::value
&& !std::is_same<Benchmark, subtract_right_partial_tile>::value>
-> std::enable_if_t<!std::is_same_v<Benchmark, subtract_left_partial_tile>
&& !std::is_same_v<Benchmark, subtract_right_partial_tile>>
{
constexpr auto items_per_block = BlockSize * ItemsPerThread;
const auto num_blocks = (N + items_per_block - 1) / items_per_block;
@@ -271,8 +271,8 @@ template<class Benchmark,
bool WithTile,
unsigned int Trials = 100>
auto run_benchmark(benchmark::State& state, hipStream_t stream, size_t N)
-> std::enable_if_t<std::is_same<Benchmark, subtract_left_partial_tile>::value
|| std::is_same<Benchmark, subtract_right_partial_tile>::value>
-> std::enable_if_t<std::is_same_v<Benchmark, subtract_left_partial_tile>
|| std::is_same_v<Benchmark, subtract_right_partial_tile>>
{
constexpr auto items_per_block = BlockSize * ItemsPerThread;
const auto num_blocks = (N + items_per_block - 1) / items_per_block;
@@ -352,7 +352,7 @@ void add_benchmarks(const std::string& name,
BENCHMARK_TYPE(long long, 256, false),
BENCHMARK_TYPE(double, 256, false)};

if(!std::is_same<Benchmark, subtract_right_partial_tile>::value)
if(!std::is_same_v<Benchmark, subtract_right_partial_tile>)
{
bs.insert(bs.end(),
{BENCHMARK_TYPE(int, 256, true),
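
The hunks above replace `std::is_same<...>::value` with the C++17 variable template `std::is_same_v<...>` inside `std::enable_if_t` return types, so that exactly one `run_benchmark` overload is viable per benchmark type. A minimal host-only sketch of that overload-selection pattern follows. The tag types `full_tile` and `partial_tile` and the function `items_to_process` are hypothetical and only stand in for the benchmark structs in the diff.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag types standing in for the benchmark structs.
struct full_tile {};
struct partial_tile {};

// Selected when Mode is NOT partial_tile: the whole tile is processed.
template<class Mode>
auto items_to_process(int tile_size, int valid_items)
    -> std::enable_if_t<!std::is_same_v<Mode, partial_tile>, int>
{
    (void)valid_items; // unused in the full-tile case
    return tile_size;
}

// Selected when Mode IS partial_tile: only the valid items are processed.
template<class Mode>
auto items_to_process(int tile_size, int valid_items)
    -> std::enable_if_t<std::is_same_v<Mode, partial_tile>, int>
{
    return valid_items < tile_size ? valid_items : tile_size;
}
```

Calling `items_to_process<full_tile>(256, 100)` picks the first overload and `items_to_process<partial_tile>(256, 100)` picks the second, because the other candidate is removed by substitution failure in its return type.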
40 changes: 18 additions & 22 deletions projects/hipcub/benchmark/benchmark_block_discontinuity.cpp
@@ -1,6 +1,6 @@
// MIT License
//
// Copyright (c) 2020 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2020-2026 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
@@ -24,21 +24,20 @@
#include <hipcub/block/block_discontinuity.hpp>
#include <hipcub/block/block_load.hpp>
#include <hipcub/block/block_store.hpp>
#include <hipcub/thread/thread_operators.hpp> //to use hipcub::Equality

#include "common_benchmark_header.hpp"

#ifndef DEFAULT_N
const size_t DEFAULT_N = 1024 * 1024 * 128;
#endif

template<class T>
struct custom_flag_op1
struct equal
{
HIPCUB_HOST_DEVICE
bool operator()(const T& a, const T& b) const
template<class A, class B>
HIPCUB_HOST_DEVICE
inline constexpr auto operator()(const A& a, const B& b) const
{
return (a == b);
return a == b;
}
};

@@ -68,17 +67,17 @@ struct flag_heads
T input[ItemsPerThread];
hipcub::LoadDirectStriped<BlockSize>(lid, d_input + block_offset, input);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockDiscontinuity<T, BlockSize> bdiscontinuity;
bool head_flags[ItemsPerThread];
if(WithTile)
{
bdiscontinuity.FlagHeads(head_flags, input, hipcub::Equality(), T(123));
bdiscontinuity.FlagHeads(head_flags, input, equal(), T(123));
} else
{
bdiscontinuity.FlagHeads(head_flags, input, hipcub::Equality());
bdiscontinuity.FlagHeads(head_flags, input, equal());
}

for(unsigned int i = 0; i < ItemsPerThread; i++)
@@ -106,17 +105,17 @@ struct flag_tails
T input[ItemsPerThread];
hipcub::LoadDirectStriped<BlockSize>(lid, d_input + block_offset, input);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockDiscontinuity<T, BlockSize> bdiscontinuity;
bool tail_flags[ItemsPerThread];
if(WithTile)
{
bdiscontinuity.FlagTails(tail_flags, input, hipcub::Equality(), T(123));
bdiscontinuity.FlagTails(tail_flags, input, equal(), T(123));
} else
{
bdiscontinuity.FlagTails(tail_flags, input, hipcub::Equality());
bdiscontinuity.FlagTails(tail_flags, input, equal());
}

for(unsigned int i = 0; i < ItemsPerThread; i++)
@@ -144,23 +143,20 @@ struct flag_heads_and_tails
T input[ItemsPerThread];
hipcub::LoadDirectStriped<BlockSize>(lid, d_input + block_offset, input);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockDiscontinuity<T, BlockSize> bdiscontinuity;
bool head_flags[ItemsPerThread];
bool tail_flags[ItemsPerThread];
if(WithTile)
{
bdiscontinuity.FlagHeadsAndTails(head_flags,
T(123),
tail_flags,
T(234),
input,
hipcub::Equality());
} else
bdiscontinuity
.FlagHeadsAndTails(head_flags, T(123), tail_flags, T(234), input, equal());
}
else
{
bdiscontinuity.FlagHeadsAndTails(head_flags, tail_flags, input, hipcub::Equality());
bdiscontinuity.FlagHeadsAndTails(head_flags, tail_flags, input, equal());
}

for(unsigned int i = 0; i < ItemsPerThread; i++)
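
In this file the benchmark drops `hipcub::Equality` in favour of a local `equal` functor whose `operator()` is itself a template, so the struct no longer needs a type parameter and can compare operands of different types. The sketch below reproduces that functor for plain host C++. Defining `HIPCUB_HOST_DEVICE` as empty is an assumption made only so the example compiles without HIP headers.

```cpp
#include <cassert>

#ifndef HIPCUB_HOST_DEVICE
    // Assumption for this sketch: no HIP headers are available, so the
    // host/device annotation is defined away.
    #define HIPCUB_HOST_DEVICE
#endif

// Same shape as the functor added in the diff: the call operator is a
// template, so one object handles any pair of equality-comparable types.
struct equal
{
    template<class A, class B>
    HIPCUB_HOST_DEVICE
    inline constexpr auto operator()(const A& a, const B& b) const
    {
        return a == b;
    }
};

// The operator is constexpr, so it can also be evaluated at compile time.
static_assert(equal{}(1, 1L), "int and long with the same value compare equal");
```

A single `equal()` instance can therefore be passed to calls such as `FlagHeads` regardless of the element type `T` used by the benchmark.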
14 changes: 7 additions & 7 deletions projects/hipcub/benchmark/benchmark_block_exchange.cpp
@@ -1,6 +1,6 @@
// MIT License
//
// Copyright (c) 2020 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2020-2026 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
@@ -54,7 +54,7 @@ struct blocked_to_striped
T input[ItemsPerThread];
hipcub::LoadDirectBlocked(lid, d_input + block_offset, input);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockExchange<T, BlockSize, ItemsPerThread> exchange;
@@ -78,7 +78,7 @@ struct striped_to_blocked
T input[ItemsPerThread];
hipcub::LoadDirectStriped<BlockSize>(lid, d_input + block_offset, input);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockExchange<T, BlockSize, ItemsPerThread> exchange;
@@ -102,7 +102,7 @@ struct blocked_to_warp_striped
T input[ItemsPerThread];
hipcub::LoadDirectBlocked(lid, d_input + block_offset, input);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockExchange<T, BlockSize, ItemsPerThread> exchange;
@@ -126,7 +126,7 @@ struct warp_striped_to_blocked
T input[ItemsPerThread];
hipcub::LoadDirectWarpStriped(lid, d_input + block_offset, input);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockExchange<T, BlockSize, ItemsPerThread> exchange;
@@ -152,7 +152,7 @@ struct scatter_to_blocked
hipcub::LoadDirectStriped<BlockSize>(lid, d_input + block_offset, input);
hipcub::LoadDirectStriped<BlockSize>(lid, d_ranks + block_offset, ranks);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockExchange<T, BlockSize, ItemsPerThread> exchange;
@@ -178,7 +178,7 @@ struct scatter_to_striped
hipcub::LoadDirectStriped<BlockSize>(lid, d_input + block_offset, input);
hipcub::LoadDirectStriped<BlockSize>(lid, d_ranks + block_offset, ranks);

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
hipcub::BlockExchange<T, BlockSize, ItemsPerThread> exchange;
6 changes: 3 additions & 3 deletions projects/hipcub/benchmark/benchmark_block_histogram.cpp
@@ -1,6 +1,6 @@
// MIT License
//
// Copyright (c) 2020 Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2020-2026 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
@@ -64,13 +64,13 @@ struct histogram
__shared__ T histogram[BinSize];
__shared__ typename bhistogram_t::TempStorage storage;

#pragma nounroll
_CCCL_PRAGMA_NOUNROLL()
for(unsigned int trial = 0; trial < Trials; trial++)
{
bhistogram_t(storage).Histogram(values, histogram);
}

#pragma unroll
_CCCL_PRAGMA_UNROLL_FULL()
for(unsigned int offset = 0; offset < BinSize; offset += BlockSize)
{
if(offset + hipThreadIdx_x < BinSize)