Add Python wrappers for c.parallel merge_sort API #3763

Merged
merged 36 commits into from
Feb 19, 2025

Conversation

NaderAlAwar
Contributor

Description

Closes #3459.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@NaderAlAwar NaderAlAwar requested a review from a team as a code owner February 10, 2025 19:09
@NaderAlAwar NaderAlAwar requested a review from leofang February 10, 2025 19:09
@NaderAlAwar NaderAlAwar marked this pull request as draft February 10, 2025 19:09

copy-pr-bot bot commented Feb 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor

🟥 CI finished in 5m 51s: Pass: 0%/1 | Total: 5m 51s | Avg: 5m 51s | Max: 5m 51s
  • 🟥 python: Pass: 0%/1 | Total: 5m 51s | Avg: 5m 51s | Max: 5m 51s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-rtx2080-latest-1

printf("\nEXCEPTION in cccl_device_merge_sort(): merge sort output cannot be an iterator\n");
fflush(stdout);
error = CUDA_ERROR_UNKNOWN;
}
Contributor Author

See #3722

Contributor

Could you please add // See #3722 as a comment? It's super simple but much more discoverable.

  case cccl_type_enum::UINT8:
    return "::cuda::std::uint8_t";
  case cccl_type_enum::UINT16:
    return "::cuda::std::uint16_t";
  case cccl_type_enum::UINT32:
    return "::cuda::std::uint32_t";
  case cccl_type_enum::UINT64:
-   return "::cuda::std::uint64_t";
+   return "unsigned long";
Contributor Author

These had to be changed because creating iterators in Python with types np.int64 and np.uint64 was resulting in compilation errors.

Contributor

This change concerns me, and I am not sure whether it is correct. On Windows, IIRC, long is 32-bit, at least for some targets.

What is the compilation error that was encountered?

Contributor Author

The compilation error occurs when we try to instantiate DeviceMergeSortBlockSortKernel with an iterator:

/home/coder/.local/share/venvs/cccl/lib/python3.12/site-packages/cuda/cccl/include/cub/agent/agent_merge_sort.cuh(206): error: no instance of overloaded function "cub::CUB_300000_SM_890::BlockLoad<T, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z>::Load [with T=input_keys_iterator_state_t::value_type, BLOCK_DIM_X=256, ITEMS_PER_THREAD=2, ALGORITHM=cub::CUB_300000_SM_890::BLOCK_LOAD_WARP_TRANSPOSE, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1]" matches the argument list
            argument types are: (input_keys_iterator_state_t, long [2], int, input_keys_iterator_state_t::value_type)
            object type is: cub::CUB_300000_SM_890::BlockLoad<input_keys_iterator_state_t::value_type, 256, 2, cub::CUB_300000_SM_890::BLOCK_LOAD_WARP_TRANSPOSE, 1, 1>
        BlockLoadKeys(storage.load_keys).Load(keys_in + tile_base, keys_local, num_remaining, *(keys_in + tile_base));

It seems when I pass in an iterator with datatype int64, the value type of the iterator is ::cuda::std::int64_t but KeyT is long.

Looking at this in some more detail, I think the issue comes from a mismatch between cccl_type_enum_to_name and cccl_type_enum_to_string. The former uses nvrtcGetTypeName which returns long when ::cuda::std::int64_t is passed (see https://godbolt.org/z/bqWznEYGP), while the latter just returns the string "::cuda::std::int64_t".
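
To make the suspected mismatch concrete: a name derived from the compiler's canonical type and a name spelled as the alias can denote the same type yet differ as strings. The sketch below is host-only standard C++ and does not call NVRTC. `canonical_int64_name` and `spelled_int64_name` are hypothetical stand-ins for illustration, not the `cccl_type_enum_to_name` and `cccl_type_enum_to_string` functions discussed here.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a "canonical" name: reports the builtin type
// that std::int64_t aliases on the platform compiling this file.
inline std::string canonical_int64_name()
{
  if (std::is_same<std::int64_t, long>::value)
  {
    return "long";
  }
  if (std::is_same<std::int64_t, long long>::value)
  {
    return "long long";
  }
  return "unknown";
}

// Hypothetical stand-in for a "spelled" name: always returns the alias text.
inline std::string spelled_int64_name()
{
  return "::cuda::std::int64_t";
}

// True only if both helpers produce the same text.
inline bool names_match()
{
  return canonical_int64_name() == spelled_int64_name();
}
```

Here `names_match()` returns false even though both strings refer to a 64-bit signed integer, which is the kind of inconsistency that having two name-producing functions can introduce into generated source.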

Contributor Author

I think I could just replace usages of cccl_type_enum_to_name with the string version, but is there a specific reason why nvrtcGetTypeName would return long? Since the two functions behave differently, I'm not sure whether having both was intentional.

Contributor

> This change concerns me

Yeah, agree this is very worrisome. I'd try very hard to avoid this change.

(I stared at the error message and types.h, types.cpp for a while but I'm failing to make the connection. Unfortunately I'm without a workstation at the moment and cannot reproduce/troubleshoot the error.)

Contributor Author

I reverted this change and combined the two functions into one. I don't think there was any reason to have both functions to begin with and this way we ensure consistency on what types are returned.
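
As a rough sketch of the "one function" idea, assuming a simplified enum: every caller that needs a type name goes through a single mapping, so two call sites cannot disagree on the spelling. `example_type_enum`, `example_type_enum_to_name`, and `declare_key_type` are hypothetical names for illustration and are not the actual c.parallel code.

```cpp
#include <string>

// Hypothetical enum for illustration; not the real cccl_type_enum.
enum class example_type_enum
{
  UINT8,
  UINT16,
  UINT32,
  UINT64
};

// The only place where an enum value is turned into a type name.
inline const char* example_type_enum_to_name(example_type_enum type)
{
  switch (type)
  {
    case example_type_enum::UINT8:
      return "::cuda::std::uint8_t";
    case example_type_enum::UINT16:
      return "::cuda::std::uint16_t";
    case example_type_enum::UINT32:
      return "::cuda::std::uint32_t";
    case example_type_enum::UINT64:
      return "::cuda::std::uint64_t";
  }
  return ""; // not reached for valid enum values
}

// Hypothetical caller: builds a line of generated source using the shared mapping.
inline std::string declare_key_type(example_type_enum type)
{
  return std::string("using KeyT = ") + example_type_enum_to_name(type) + ";";
}
```

Any other code that emits a type name would call `example_type_enum_to_name` as well, rather than keeping a second table.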

Contributor

I assume one problem could be that int64_t could be implemented as either long or long long, which are two different data types but they have the same range.

Member

Right, avoid using long or long long because we want to support Windows soon-ish
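
A small host-side sketch of the portability concern raised in this thread, using only the C++ standard library: the fixed-width alias is 64 bits wherever it exists, while `long` is only guaranteed to be at least 32 bits, and `long` and `long long` remain distinct types even when they have the same width. `bit_width_of` and `overload_id` are hypothetical helper names.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Width of a type in bits.
template <typename T>
constexpr int bit_width_of()
{
  return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// The fixed-width alias is 64 bits on every platform that provides it.
static_assert(bit_width_of<std::int64_t>() == 64, "int64_t is exactly 64 bits");

// The standard only guarantees minimum widths for long and long long.
static_assert(bit_width_of<long>() >= 32, "long is at least 32 bits");
static_assert(bit_width_of<long long>() >= 64, "long long is at least 64 bits");

// long and long long are distinct types, even where both are 64 bits wide.
static_assert(!std::is_same<long, long long>::value, "distinct types");

// Overload resolution makes that distinction observable.
constexpr int overload_id(long)
{
  return 1;
}
constexpr int overload_id(long long)
{
  return 2;
}
```

Which overload `overload_id(std::int64_t{0})` selects depends on the platform, which is why spelling the type as `long` or `long long` in generated code is fragile.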

// std::sort(expected_items.begin(), expected_items.end());
// REQUIRE(expected_keys == std::vector<TestType>(input_keys_it));
// REQUIRE(expected_items == std::vector<item_t>(input_items_it));
// }
Contributor Author

This is a test that shows output iterators also don't work with items.

Contributor

This code should either be enabled or removed.

Contributor Author

Removed; this is tracked in #3722.

Contributor Author

Ralf suggested adding them back but ifdeffing them out, which I think is a good idea too
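
One way to "ifdef out" a pending test, sketched with plain standard C++ rather than the Catch2 test from this PR: the disabled check stays in the file behind a macro, so it is easy to find and re-enable once the tracked issue is fixed. `EXAMPLE_ENABLE_PENDING_TEST`, `sorted_copy`, and `run_checks` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Returns a sorted copy of the input.
inline std::vector<int> sorted_copy(std::vector<int> values)
{
  std::sort(values.begin(), values.end());
  return values;
}

// Returns the number of checks that ran and passed.
inline int run_checks()
{
  int passed = 0;

  // Always-on check.
  if (sorted_copy({3, 1, 2}) == std::vector<int>{1, 2, 3})
  {
    ++passed;
  }

#ifdef EXAMPLE_ENABLE_PENDING_TEST // hypothetical macro, not defined by default
  // Pending check: compiled out until the tracked issue is resolved.
  if (sorted_copy({2, 2, 1}) == std::vector<int>{1, 2, 2})
  {
    ++passed;
  }
#endif

  return passed;
}
```

With the macro undefined, `run_checks()` runs only the first check; defining it at build time re-enables the second without touching the source.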

Contributor

🟨 CI finished in 13m 53s: Pass: 66%/3 | Total: 20m 58s | Avg: 6m 59s | Max: 10m 37s | Hits: 98%/296
  • 🟥 python: Pass: 0%/1 | Total: 7m 59s | Avg: 7m 59s | Max: 7m 59s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 59s | Avg: 6m 29s | Max: 10m 37s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 37s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 37s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 37s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 37s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 37s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 37s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 37s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 22s | Avg:  2m 22s | Max:  2m 22s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 37s | Avg: 10m 37s | Max: 10m 37s | Hits:  98%/148   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 3)

# Runner
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-cpu16

@NaderAlAwar NaderAlAwar requested a review from a team as a code owner February 13, 2025 15:48
@NaderAlAwar NaderAlAwar requested a review from fbusato February 13, 2025 15:48
Contributor

🟩 CI finished in 2h 42m: Pass: 100%/93 | Total: 1d 11h | Avg: 23m 12s | Max: 1h 05m | Hits: 94%/134373
  • 🟩 cub: Pass: 100%/45 | Total: 1d 03h | Avg: 37m 04s | Max: 1h 05m | Hits: 93%/53581

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 02h | Avg: 36m 44s | Max:  1h 05m | Hits:  92%/51147 
      🟩 arm64              Pass: 100%/2   | Total:  1h 29m | Avg: 44m 31s | Max: 44m 47s | Hits:  99%/2434  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 26m | Avg: 41m 21s | Max: 54m 21s | Hits:  84%/5919  
      🟩 12.5               Pass: 100%/2   | Total:  1h 19m | Avg: 39m 59s | Max: 40m 31s | Hits:  98%/2252  
      🟩 12.8               Pass: 100%/38  | Total: 23h 01m | Avg: 36m 21s | Max:  1h 05m | Hits:  93%/45410 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 42m | Avg: 51m 22s | Max: 51m 45s | Hits:  99%/2106  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 26m | Avg: 41m 21s | Max: 54m 21s | Hits:  84%/5919  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 19m | Avg: 39m 59s | Max: 40m 31s | Hits:  98%/2252  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 21h 19m | Avg: 35m 31s | Max:  1h 05m | Hits:  93%/43304 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 42m | Avg: 51m 22s | Max: 51m 45s | Hits:  99%/2106  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 02h | Avg: 36m 24s | Max:  1h 05m | Hits:  92%/51475 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 24m | Avg: 36m 14s | Max: 36m 51s | Hits:  99%/4876  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 11m | Avg: 35m 34s | Max: 35m 53s | Hits:  99%/2434  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 50s | Max: 37m 56s | Hits:  99%/2434  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 58s | Max: 37m 58s | Hits:  99%/2434  
      🟩 Clang18            Pass: 100%/7   | Total:  4h 22m | Avg: 37m 34s | Max: 51m 45s | Hits:  99%/8191  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 19m | Avg: 39m 47s | Max: 40m 04s | Hits:  99%/2438  
      🟩 GCC8               Pass: 100%/1   | Total: 35m 12s | Avg: 35m 12s | Max: 35m 12s | Hits:  99%/1219  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 16m | Avg: 38m 29s | Max: 40m 44s | Hits:  99%/2438  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 43s | Max: 35m 46s | Hits:  99%/2438  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 14m | Avg: 37m 20s | Max: 39m 01s | Hits:  99%/2434  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 15m | Avg: 37m 34s | Max: 39m 09s | Hits:  99%/2434  
      🟩 GCC13              Pass: 100%/11  | Total:  5h 03m | Avg: 27m 37s | Max: 51m 12s | Hits:  99%/13387 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 57s | Max:  1h 03m | Hits:  15%/2086  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:  15%/2086  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 19m | Avg: 39m 59s | Max: 40m 31s | Hits:  98%/2252  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 10h 26m | Avg: 36m 51s | Max: 51m 45s | Hits:  99%/20369 
      🟩 GCC                Pass: 100%/22  | Total: 11h 56m | Avg: 32m 35s | Max: 51m 12s | Hits:  99%/26788 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 05m | Avg:  1h 01m | Max:  1h 05m | Hits:  15%/4172  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 19m | Avg: 39m 59s | Max: 40m 31s | Hits:  98%/2252  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 51m 22s | Avg: 17m 07s | Max: 25m 16s | Hits:  99%/3651  
      🟩 rtx2080            Pass: 100%/34  | Total: 23h 38m | Avg: 41m 42s | Max:  1h 05m | Hits:  90%/40194 
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 19m | Avg: 24m 53s | Max: 37m 52s | Hits:  99%/9736  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 00h | Avg: 40m 25s | Max:  1h 05m | Hits:  91%/43845 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 23s | Avg: 21m 23s | Max: 21m 23s | Hits:  99%/1217  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 02s | Avg: 16m 02s | Max: 16m 02s | Hits:  99%/1217  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 03s | Max: 25m 16s | Hits:  99%/3651  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 03m | Avg: 21m 11s | Max: 22m 19s | Hits:  99%/3651  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 51m 22s | Avg: 17m 07s | Max: 25m 16s | Hits:  99%/3651  
      🟩 90;90a;100         Pass: 100%/1   | Total: 51m 12s | Avg: 51m 12s | Max: 51m 12s | Hits:  99%/1217  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 13h 51m | Avg: 41m 33s | Max:  1h 05m | Hits:  88%/23579 
      🟩 20                 Pass: 100%/25  | Total: 13h 57m | Avg: 33m 29s | Max:  1h 01m | Hits:  96%/30002 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 7h 21m | Avg: 9m 48s | Max: 33m 56s | Hits: 95%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 50s | Avg:  8m 25s | Max: 11m 04s | Hits:  99%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  7h 11m | Avg: 10m 02s | Max: 33m 56s | Hits:  95%/76917 
      🟩 arm64              Pass: 100%/2   | Total:  9m 34s | Avg:  4m 47s | Max:  5m 10s | Hits:  99%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 56m 20s | Avg: 11m 16s | Max: 29m 22s | Hits:  92%/8941  
      🟩 12.5               Pass: 100%/2   | Total: 29m 21s | Avg: 14m 40s | Max: 15m 34s | Hits:  98%/3578  
      🟩 12.8               Pass: 100%/38  | Total:  5h 55m | Avg:  9m 21s | Max: 33m 56s | Hits:  96%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  5m 20s | Hits: 100%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 56m 20s | Avg: 11m 16s | Max: 29m 22s | Hits:  92%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 21s | Avg: 14m 40s | Max: 15m 34s | Hits:  98%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 44m | Avg:  9m 34s | Max: 33m 56s | Hits:  96%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  5m 20s | Hits: 100%/3578  
      🟩 nvcc               Pass: 100%/43  | Total:  7h 10m | Avg: 10m 00s | Max: 33m 56s | Hits:  95%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 23s | Avg:  5m 05s | Max:  5m 24s | Hits: 100%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  5m 49s | Hits: 100%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  5m 46s | Hits: 100%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 39s | Avg:  5m 49s | Max:  5m 52s | Hits: 100%/3578  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 52s | Avg:  6m 16s | Max: 10m 12s | Hits: 100%/12523 
      🟩 GCC7               Pass: 100%/2   | Total: 17m 35s | Avg:  8m 47s | Max: 12m 01s | Hits:  97%/3580  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s | Hits:  99%/1790  
      🟩 GCC9               Pass: 100%/2   | Total: 17m 57s | Avg:  8m 58s | Max: 12m 39s | Hits:  97%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 43s | Avg:  5m 51s | Max:  5m 58s | Hits:  99%/3580  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 31s | Hits:  99%/3580  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 20s | Avg:  5m 40s | Max:  5m 42s | Hits:  99%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 22m | Avg:  8m 16s | Max: 12m 25s | Hits:  99%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 59m 57s | Avg: 29m 58s | Max: 30m 35s | Hits:  67%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 35m | Avg: 31m 40s | Max: 33m 56s | Hits:  68%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 21s | Avg: 14m 40s | Max: 15m 34s | Hits:  98%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 38m | Avg:  5m 48s | Max: 10m 12s | Hits: 100%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  2h 37m | Avg:  7m 31s | Max: 12m 39s | Hits:  99%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 34m | Avg: 30m 59s | Max: 33m 56s | Hits:  67%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total: 29m 21s | Avg: 14m 40s | Max: 15m 34s | Hits:  98%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 12s | Avg:  8m 06s | Max: 11m 17s | Hits:  99%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 52m | Avg:  8m 52s | Max: 30m 35s | Hits:  96%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 12m | Avg: 13m 13s | Max: 33m 56s | Hits:  93%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  5h 49m | Avg:  9m 11s | Max: 33m 56s | Hits:  96%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 47m 45s | Avg: 15m 55s | Max: 32m 17s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 48s | Avg: 10m 57s | Max: 11m 17s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 12s | Avg:  8m 06s | Max: 11m 17s | Hits:  99%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 39s | Avg:  6m 39s | Max:  6m 39s | Hits:  99%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 31m | Avg: 10m 34s | Max: 30m 35s | Hits:  94%/35771 
      🟩 20                 Pass: 100%/23  | Total:  3h 32m | Avg:  9m 14s | Max: 33m 56s | Hits:  97%/41145 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 08s | Avg: 6m 34s | Max: 10m 43s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 43s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 43s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 43s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 43s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 43s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 43s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 08s | Avg:  6m 34s | Max: 10m 43s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 25s | Avg:  2m 25s | Max:  2m 25s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 43s | Avg: 10m 43s | Max: 10m 43s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 34m 50s | Avg: 34m 50s | Max: 34m 50s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

Contributor

🟩 CI finished in 1h 08m: Pass: 100%/93 | Total: 15h 53m | Avg: 10m 14s | Max: 34m 02s | Hits: 95%/134373
  • 🟩 cub: Pass: 100%/45 | Total: 8h 32m | Avg: 11m 23s | Max: 33m 38s | Hits: 93%/53581

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  8h 21m | Avg: 11m 40s | Max: 33m 38s | Hits:  92%/51147 
      🟩 arm64              Pass: 100%/2   | Total: 10m 43s | Avg:  5m 21s | Max:  5m 41s | Hits:  99%/2434  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 51m 40s | Avg: 10m 20s | Max: 29m 38s | Hits:  85%/5919  
      🟩 12.5               Pass: 100%/2   | Total: 20m 54s | Avg: 10m 27s | Max: 10m 43s | Hits:  98%/2252  
      🟩 12.8               Pass: 100%/38  | Total:  7h 20m | Avg: 11m 34s | Max: 33m 38s | Hits:  94%/45410 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  5m 11s | Hits: 100%/2106  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 51m 40s | Avg: 10m 20s | Max: 29m 38s | Hits:  85%/5919  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 20m 54s | Avg: 10m 27s | Max: 10m 43s | Hits:  98%/2252  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  7h 10m | Avg: 11m 56s | Max: 33m 38s | Hits:  93%/43304 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  5m 11s | Hits: 100%/2106  
      🟩 nvcc               Pass: 100%/43  | Total:  8h 22m | Avg: 11m 41s | Max: 33m 38s | Hits:  92%/51475 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 11s | Avg:  5m 47s | Max:  6m 23s | Hits: 100%/4876  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 03s | Avg:  6m 01s | Max:  6m 02s | Hits: 100%/2434  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 14s | Hits: 100%/2434  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 13s | Hits: 100%/2434  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 09m | Avg:  9m 57s | Max: 23m 23s | Hits: 100%/8191  
      🟩 GCC7               Pass: 100%/2   | Total: 11m 33s | Avg:  5m 46s | Max:  6m 14s | Hits:  99%/2438  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 16s | Avg:  6m 16s | Max:  6m 16s | Hits:  99%/1219  
      🟩 GCC9               Pass: 100%/2   | Total: 12m 27s | Avg:  6m 13s | Max:  6m 37s | Hits:  99%/2438  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 45s | Avg:  6m 22s | Max:  6m 35s | Hits:  99%/2438  
      🟩 GCC11              Pass: 100%/2   | Total: 13m 20s | Avg:  6m 40s | Max:  6m 41s | Hits:  99%/2434  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 43s | Avg:  6m 51s | Max:  6m 53s | Hits:  99%/2434  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 48m | Avg: 15m 19s | Max: 26m 59s | Hits:  99%/13387 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 17s | Max: 30m 57s | Hits:  16%/2086  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 03m | Avg: 31m 30s | Max: 33m 38s | Hits:  16%/2086  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 20m 54s | Avg: 10m 27s | Max: 10m 43s | Hits:  98%/2252  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 09m | Avg:  7m 36s | Max: 23m 23s | Hits: 100%/20369 
      🟩 GCC                Pass: 100%/22  | Total:  3h 58m | Avg: 10m 51s | Max: 26m 59s | Hits:  99%/26788 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 03m | Avg: 30m 54s | Max: 33m 38s | Hits:  16%/4172  
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 54s | Avg: 10m 27s | Max: 10m 43s | Hits:  98%/2252  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 53m 17s | Avg: 17m 45s | Max: 24m 36s | Hits:  99%/3651  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 14m | Avg:  9m 15s | Max: 33m 38s | Hits:  91%/40194 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 24m | Avg: 18m 03s | Max: 26m 59s | Hits:  99%/9736  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 33m | Avg:  9m 00s | Max: 33m 38s | Hits:  91%/43845 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 35s | Avg: 21m 35s | Max: 21m 35s | Hits:  99%/1217  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 18s | Avg: 16m 18s | Max: 16m 18s | Hits:  99%/1217  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 59s | Max: 26m 59s | Hits:  99%/3651  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 06m | Avg: 22m 15s | Max: 23m 51s | Hits:  99%/3651  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 53m 17s | Avg: 17m 45s | Max: 24m 36s | Hits:  99%/3651  
      🟩 90;90a;100         Pass: 100%/1   | Total:  7m 03s | Avg:  7m 03s | Max:  7m 03s | Hits:  99%/1217  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 18m | Avg:  9m 55s | Max: 30m 57s | Hits:  88%/23579 
      🟩 20                 Pass: 100%/25  | Total:  5h 14m | Avg: 12m 34s | Max: 33m 38s | Hits:  96%/30002 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 33m | Avg: 8m 44s | Max: 32m 38s | Hits: 96%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 56s | Avg:  8m 28s | Max: 11m 01s | Hits:  99%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 24m | Avg:  8m 55s | Max: 32m 38s | Hits:  96%/76917 
      🟩 arm64              Pass: 100%/2   | Total:  9m 27s | Avg:  4m 43s | Max:  5m 01s | Hits:  99%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 42m 10s | Avg:  8m 26s | Max: 22m 54s | Hits:  94%/8941  
      🟩 12.5               Pass: 100%/2   | Total: 28m 09s | Avg: 14m 04s | Max: 14m 45s | Hits:  99%/3578  
      🟩 12.8               Pass: 100%/38  | Total:  5h 23m | Avg:  8m 30s | Max: 32m 38s | Hits:  96%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  4m 53s | Hits: 100%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 42m 10s | Avg:  8m 26s | Max: 22m 54s | Hits:  94%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 09s | Avg: 14m 04s | Max: 14m 45s | Hits:  99%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 13m | Avg:  8m 42s | Max: 32m 38s | Hits:  96%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  4m 53s | Hits: 100%/3578  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 23m | Avg:  8m 55s | Max: 32m 38s | Hits:  96%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 25s | Avg:  5m 06s | Max:  5m 30s | Hits: 100%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 24s | Avg:  5m 42s | Max:  5m 50s | Hits: 100%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 21s | Avg:  5m 40s | Max:  5m 49s | Hits: 100%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 10s | Avg:  5m 35s | Max:  5m 48s | Hits: 100%/3578  
      🟩 Clang18            Pass: 100%/7   | Total: 42m 25s | Avg:  6m 03s | Max: 10m 13s | Hits: 100%/12523 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 25s | Avg:  5m 12s | Max:  5m 33s | Hits:  99%/3580  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 08s | Avg:  5m 08s | Max:  5m 08s | Hits:  99%/1790  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  5m 22s | Hits:  99%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 42s | Avg:  5m 51s | Max:  5m 56s | Hits:  99%/3580  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 18s | Avg:  5m 39s | Max:  5m 40s | Hits:  99%/3580  
      🟩 GCC12              Pass: 100%/2   | Total: 12m 03s | Avg:  6m 01s | Max:  6m 15s | Hits:  99%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 16m | Avg:  7m 38s | Max: 11m 17s | Hits:  99%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 47m 38s | Avg: 23m 49s | Max: 24m 44s | Hits:  70%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 23m | Avg: 27m 51s | Max: 32m 38s | Hits:  70%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 09s | Avg: 14m 04s | Max: 14m 45s | Hits:  99%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 36m | Avg:  5m 41s | Max: 10m 13s | Hits: 100%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  2h 17m | Avg:  6m 32s | Max: 11m 17s | Hits:  99%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 11m | Avg: 26m 14s | Max: 32m 38s | Hits:  70%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 09s | Avg: 14m 04s | Max: 14m 45s | Hits:  99%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 15m 46s | Avg:  7m 53s | Max: 11m 10s | Hits:  99%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 13m | Avg:  7m 40s | Max: 25m 05s | Hits:  97%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 04m | Avg: 12m 27s | Max: 32m 38s | Hits:  94%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  5h 01m | Avg:  7m 56s | Max: 25m 51s | Hits:  96%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 18s | Avg: 16m 06s | Max: 32m 38s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 41s | Avg: 10m 55s | Max: 11m 17s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 15m 46s | Avg:  7m 53s | Max: 11m 10s | Hits:  99%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 32s | Avg:  6m 32s | Max:  6m 32s | Hits:  99%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 53m | Avg:  8m 39s | Max: 25m 05s | Hits:  95%/35771 
      🟩 20                 Pass: 100%/23  | Total:  3h 23m | Avg:  8m 50s | Max: 32m 38s | Hits:  97%/41145 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 02s | Avg: 6m 31s | Max: 10m 38s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 38s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 38s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 38s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 38s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 38s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 38s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max: 10m 38s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 24s | Avg:  2m 24s | Max:  2m 24s | Hits:  97%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 38s | Avg: 10m 38s | Max: 10m 38s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 34m 02s | Avg: 34m 02s | Max: 34m 02s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

if (cccl_iterator_kind_t::iterator == d_out_keys.type || cccl_iterator_kind_t::iterator == d_out_items.type)
{
fflush(stderr);
printf("\nEXCEPTION in cccl_device_merge_sort(): merge sort output cannot be an iterator\n");
Contributor

EXCEPTION was meant to signal that an exception was caught. (I started this not very good pattern...)

Could you please change this to ERROR?

fflush(stderr);
printf("\nEXCEPTION in cccl_device_merge_sort(): merge sort output cannot be an iterator\n");
fflush(stdout);
error = CUDA_ERROR_UNKNOWN;
Contributor

Did you mean to return here?

    return CUDA_ERROR_UNKNOWN;

Contributor Author

Yes I did, fixed
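
The fall-through discussed in this thread can be modelled outside of C. The sketch below is a hypothetical Python stand-in, not the c.parallel code: the two functions, the `log` list, and the later "launch" step are assumptions made for illustration, and only the name `CUDA_ERROR_UNKNOWN` comes from the diff above. It shows why assigning `error` without returning lets later code run and hide the failure.

```python
# Hypothetical model of the control flow discussed above; not the real c.parallel code.
CUDA_SUCCESS = 0
CUDA_ERROR_UNKNOWN = 999  # treated here as a plain constant for illustration


def sort_assign_only(output_is_iterator, log):
    """Records the error but keeps going, as in the pre-fix snippet."""
    error = CUDA_SUCCESS
    if output_is_iterator:
        log.append("ERROR: merge sort output cannot be an iterator")
        error = CUDA_ERROR_UNKNOWN
    # Execution falls through: the (assumed) launch step still runs ...
    log.append("launch")
    error = CUDA_SUCCESS  # ... and an (assumed) later assignment hides the failure.
    return error


def sort_early_return(output_is_iterator, log):
    """Returns as soon as the invalid argument is detected."""
    if output_is_iterator:
        log.append("ERROR: merge sort output cannot be an iterator")
        return CUDA_ERROR_UNKNOWN
    log.append("launch")
    return CUDA_SUCCESS
```

With the early return, an invalid output argument never reaches the launch step, and the caller always sees the error code.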

printf("\nEXCEPTION in cccl_device_merge_sort(): merge sort output cannot be an iterator\n");
fflush(stdout);
error = CUDA_ERROR_UNKNOWN;
}
Contributor

Could you please add // See #3722 as a comment? It's super simple but much more discoverable.

@@ -209,6 +222,20 @@ def _iterator_to_cccl_iter(it: IteratorBase) -> Iterator:
)


def _none_to_cccl_iter() -> Iterator:
# Create a null int pointer. Any type could be used here, we just need to pass NULL.
info = _numpy_type_to_info(np.int32)
Contributor

Could we use np.void here, to make the intent more clear? If not, my second pick would be np.char.

(This isn't a big deal, but I'd definitely try to make the intent more clear. E.g. seeing size 1 or 0 while debugging some issue in the core code later is a nice clue in that far-removed-from-here context.)

Contributor Author

np.void didn't work since this calls numba.from_dtype() internally, so I ended up using np.uint8
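
The "size 1 while debugging" clue from the suggestion above can be illustrated without NumPy or Numba. This is a minimal sketch that swaps in the standard-library `ctypes` module as a stand-in; `null_placeholder` is a hypothetical helper and not part of the PR.

```python
import ctypes


def null_placeholder(element_type=ctypes.c_uint8):
    """Return (null pointer, element size in bytes) for a placeholder argument.

    Hypothetical helper for illustration only.
    """
    ptr = ctypes.POINTER(element_type)()  # a default-constructed pointer is NULL
    return ptr, ctypes.sizeof(element_type)


ptr, size = null_placeholder()
# A NULL ctypes pointer is falsy, and a uint8 element is one byte wide,
# so the placeholder is easy to tell apart from a real int32 buffer.
print(bool(ptr), size, ctypes.sizeof(ctypes.c_int32))
```

A one-byte element type makes the placeholder stand out in a debugger, which is the intent the reviewer was asking for.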

from ..typing import DeviceArrayLike


def _dtype_validation(dt1, dt2):
Contributor

@shwina's PR #3718 removed this for reduce to resolve performance issues. I'm guessing we'll have the same issues here. Probably it's best to omit this dtype validation from the start for merge_sort?

Contributor Author

Removed and also updated the code to follow the new patterns more closely

Contributor

I still see the _dtype_validation() function here, although it isn't called anymore. Oversight?

@@ -254,36 +254,6 @@ TEST_CASE("DeviceMergeSort::SortKeys works with input iterators", "[merge_sort]"
REQUIRE(expected_keys == std::vector<TestType>(input_keys_ptr));
}

// TEST_CASE("DeviceMergeSort::SortKeys works with output iterators", "[merge_sort]")
Contributor

Two ideas:

  1. Definitely add a comment to explain why the code is commented out instead of being removed entirely.
  2. Maybe use #ifdef NEVER_DEFINED instead of comments, then at least you'll still get the syntax highlighting, and potentially other tooling still see this as C++ code.

Contributor Author

That's a good idea, I added them back

case cccl_type_enum::UINT8:
return "::cuda::std::uint8_t";
case cccl_type_enum::UINT16:
return "::cuda::std::uint16_t";
case cccl_type_enum::UINT32:
return "::cuda::std::uint32_t";
case cccl_type_enum::UINT64:
return "::cuda::std::uint64_t";
return "unsigned long";
Contributor

This change concerns me

Yeah, agree this is very worrisome. I'd try very hard to avoid this change.

(I stared at the error message and types.h, types.cpp for a while but I'm failing to make the connection. Unfortunately I'm without a workstation at the moment and cannot reproduce/troubleshoot the error.)

Contributor

🟩 CI finished in 1h 03m: Pass: 100%/93 | Total: 15h 45m | Avg: 10m 09s | Max: 34m 45s | Hits: 95%/134371
  • 🟩 cub: Pass: 100%/45 | Total: 8h 24m | Avg: 11m 12s | Max: 34m 35s | Hits: 93%/53581

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  8h 13m | Avg: 11m 28s | Max: 34m 35s | Hits:  92%/51147 
      🟩 arm64              Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  5m 41s | Hits:  99%/2434  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 52m 50s | Avg: 10m 34s | Max: 29m 34s | Hits:  85%/5919  
      🟩 12.5               Pass: 100%/2   | Total: 20m 17s | Avg: 10m 08s | Max: 10m 12s | Hits:  98%/2252  
      🟩 12.8               Pass: 100%/38  | Total:  7h 11m | Avg: 11m 20s | Max: 34m 35s | Hits:  94%/45410 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  4m 52s | Hits: 100%/2106  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 52m 50s | Avg: 10m 34s | Max: 29m 34s | Hits:  85%/5919  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 20m 17s | Avg: 10m 08s | Max: 10m 12s | Hits:  98%/2252  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  7h 01m | Avg: 11m 42s | Max: 34m 35s | Hits:  93%/43304 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  4m 52s | Hits: 100%/2106  
      🟩 nvcc               Pass: 100%/43  | Total:  8h 14m | Avg: 11m 30s | Max: 34m 35s | Hits:  92%/51475 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 01s | Avg:  5m 45s | Max:  5m 57s | Hits: 100%/4876  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 22s | Avg:  6m 11s | Max:  6m 11s | Hits: 100%/2434  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 31s | Avg:  6m 15s | Max:  6m 23s | Hits: 100%/2434  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 51s | Avg:  6m 25s | Max:  6m 26s | Hits: 100%/2434  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 10m | Avg: 10m 02s | Max: 23m 22s | Hits: 100%/8191  
      🟩 GCC7               Pass: 100%/2   | Total: 12m 27s | Avg:  6m 13s | Max:  6m 26s | Hits:  99%/2438  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 08s | Avg:  6m 08s | Max:  6m 08s | Hits:  99%/1219  
      🟩 GCC9               Pass: 100%/2   | Total: 12m 44s | Avg:  6m 22s | Max:  6m 40s | Hits:  99%/2438  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max:  6m 31s | Hits:  99%/2438  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 56s | Avg:  6m 28s | Max:  6m 33s | Hits:  99%/2434  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 39s | Avg:  6m 49s | Max:  6m 50s | Hits:  99%/2434  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 39m | Avg: 14m 28s | Max: 25m 17s | Hits:  99%/13387 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 58m 35s | Avg: 29m 17s | Max: 29m 34s | Hits:  16%/2086  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 07s | Max: 34m 35s | Hits:  16%/2086  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 20m 17s | Avg: 10m 08s | Max: 10m 12s | Hits:  98%/2252  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 10m | Avg:  7m 42s | Max: 23m 22s | Hits: 100%/20369 
      🟩 GCC                Pass: 100%/22  | Total:  3h 50m | Avg: 10m 27s | Max: 25m 17s | Hits:  99%/26788 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 02m | Avg: 30m 42s | Max: 34m 35s | Hits:  16%/4172  
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 17s | Avg: 10m 08s | Max: 10m 12s | Hits:  98%/2252  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 51m 00s | Avg: 17m 00s | Max: 25m 17s | Hits:  99%/3651  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 14m | Avg:  9m 15s | Max: 34m 35s | Hits:  91%/40194 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 18m | Avg: 17m 17s | Max: 23m 37s | Hits:  99%/9736  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 33m | Avg:  9m 00s | Max: 34m 35s | Hits:  91%/43845 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 46s | Avg: 20m 46s | Max: 20m 46s | Hits:  99%/1217  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 15s | Avg: 16m 15s | Max: 16m 15s | Hits:  99%/1217  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 05s | Max: 25m 17s | Hits:  99%/3651  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 01m | Avg: 20m 37s | Max: 21m 09s | Hits:  99%/3651  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 51m 00s | Avg: 17m 00s | Max: 25m 17s | Hits:  99%/3651  
      🟩 90;90a;100         Pass: 100%/1   | Total:  7m 01s | Avg:  7m 01s | Max:  7m 01s | Hits:  99%/1217  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 17m | Avg:  9m 52s | Max: 29m 40s | Hits:  88%/23579 
      🟩 20                 Pass: 100%/25  | Total:  5h 06m | Avg: 12m 16s | Max: 34m 35s | Hits:  96%/30002 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 32m | Avg: 8m 43s | Max: 33m 36s | Hits: 96%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 58s | Avg:  8m 29s | Max: 11m 03s | Hits:  99%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 23m | Avg:  8m 54s | Max: 33m 36s | Hits:  96%/76917 
      🟩 arm64              Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  5m 00s | Hits:  99%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 42m 31s | Avg:  8m 30s | Max: 23m 32s | Hits:  94%/8941  
      🟩 12.5               Pass: 100%/2   | Total: 29m 23s | Avg: 14m 41s | Max: 14m 50s | Hits:  99%/3578  
      🟩 12.8               Pass: 100%/38  | Total:  5h 20m | Avg:  8m 26s | Max: 33m 36s | Hits:  96%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 19s | Avg:  5m 09s | Max:  5m 22s | Hits: 100%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 42m 31s | Avg:  8m 30s | Max: 23m 32s | Hits:  94%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 23s | Avg: 14m 41s | Max: 14m 50s | Hits:  99%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 10m | Avg:  8m 37s | Max: 33m 36s | Hits:  96%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 19s | Avg:  5m 09s | Max:  5m 22s | Hits: 100%/3578  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 22m | Avg:  8m 53s | Max: 33m 36s | Hits:  96%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 07s | Avg:  5m 01s | Max:  5m 45s | Hits: 100%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 32s | Hits: 100%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 36s | Avg:  5m 18s | Max:  5m 23s | Hits: 100%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 31s | Avg:  5m 45s | Max:  5m 52s | Hits: 100%/3578  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 53s | Avg:  6m 16s | Max: 10m 13s | Hits: 100%/12523 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 15s | Avg:  5m 07s | Max:  5m 31s | Hits:  99%/3580  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 05s | Avg:  5m 05s | Max:  5m 05s | Hits:  99%/1790  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  5m 29s | Hits:  99%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 20s | Avg:  5m 40s | Max:  5m 41s | Hits:  99%/3580  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 42s | Avg:  5m 51s | Max:  5m 53s | Hits:  99%/3580  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 55s | Avg:  5m 57s | Max:  6m 18s | Hits:  99%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 14m | Avg:  7m 28s | Max: 11m 21s | Hits:  99%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 48m 04s | Avg: 24m 02s | Max: 24m 32s | Hits:  70%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 22m | Avg: 27m 31s | Max: 33m 36s | Hits:  70%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 23s | Avg: 14m 41s | Max: 14m 50s | Hits:  99%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 37m | Avg:  5m 42s | Max: 10m 13s | Hits: 100%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  2h 15m | Avg:  6m 27s | Max: 11m 21s | Hits:  99%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 10m | Avg: 26m 07s | Max: 33m 36s | Hits:  70%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total: 29m 23s | Avg: 14m 41s | Max: 14m 50s | Hits:  99%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 15m 56s | Avg:  7m 58s | Max: 11m 21s | Hits:  99%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 11m | Avg:  7m 37s | Max: 24m 32s | Hits:  97%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 04m | Avg: 12m 28s | Max: 33m 36s | Hits:  94%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  4h 59m | Avg:  7m 52s | Max: 25m 36s | Hits:  96%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 16s | Avg: 16m 25s | Max: 33m 36s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 58s | Avg: 10m 59s | Max: 11m 21s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 15m 56s | Avg:  7m 58s | Max: 11m 21s | Hits:  99%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 16s | Avg:  6m 16s | Max:  6m 16s | Hits:  99%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 52m | Avg:  8m 37s | Max: 24m 32s | Hits:  95%/35771 
      🟩 20                 Pass: 100%/23  | Total:  3h 22m | Avg:  8m 49s | Max: 33m 36s | Hits:  97%/41145 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 33s | Avg: 6m 46s | Max: 10m 57s | Hits: 96%/294

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 33s | Avg:  6m 46s | Max: 10m 57s | Hits:  96%/294   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 33s | Avg:  6m 46s | Max: 10m 57s | Hits:  96%/294   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 33s | Avg:  6m 46s | Max: 10m 57s | Hits:  96%/294   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 33s | Avg:  6m 46s | Max: 10m 57s | Hits:  96%/294   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 33s | Avg:  6m 46s | Max: 10m 57s | Hits:  96%/294   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 33s | Avg:  6m 46s | Max: 10m 57s | Hits:  96%/294   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 33s | Avg:  6m 46s | Max: 10m 57s | Hits:  96%/294   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 36s | Avg:  2m 36s | Max:  2m 36s | Hits:  93%/147   
      🟩 Test               Pass: 100%/1   | Total: 10m 57s | Avg: 10m 57s | Max: 10m 57s | Hits:  98%/147   
    
  • 🟩 python: Pass: 100%/1 | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 34m 45s | Avg: 34m 45s | Max: 34m 45s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

{
case cccl_type_enum::INT8:

check(nvrtcGetTypeName<::cuda::std::int8_t*>(&result));
Contributor

Nader, your change looks good to me, especially because it passes all tests.

However, it does change the behavior, therefore it might be good to ask @gevtushenko (original author) specifically:

This approach

        check(nvrtcGetTypeName<::cuda::std::int8_t*>(&result));

was introduced with PR #2256 (in c/src/reduce.cu at the time):

https://github.com/NVIDIA/cccl/pull/2256/files#diff-2a8594900c7245f7952fc68f69ca42c59a016e5f8555d126b8f9168e421289a4R72-R187

What was the original intent?

I'm totally speculating here:

Do we maybe want to keep both functions, rename this one to cccl_type_enum_to_nvrtc_type_name(), and add a bool is_pointer = false argument to the other one? (But then again, all tests pass...)
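
The `is_pointer` idea floated above can be sketched as a single lookup with an optional pointer suffix. This is a hypothetical Python model for illustration, not the C++ code in the PR: `TypeEnum`, `_TYPE_NAMES`, and `type_enum_to_name` are assumed names, and only the quoted `::cuda::std::...` strings come from the diff in this thread.

```python
from enum import Enum, auto


class TypeEnum(Enum):
    # Hypothetical subset mirroring cccl_type_enum values quoted in this thread.
    INT8 = auto()
    UINT8 = auto()
    UINT16 = auto()
    UINT32 = auto()


# One table of type names shared by the value and pointer cases.
_TYPE_NAMES = {
    TypeEnum.INT8: "::cuda::std::int8_t",
    TypeEnum.UINT8: "::cuda::std::uint8_t",
    TypeEnum.UINT16: "::cuda::std::uint16_t",
    TypeEnum.UINT32: "::cuda::std::uint32_t",
}


def type_enum_to_name(kind, is_pointer=False):
    """Map an enum value to a type name, appending '*' for pointer types."""
    name = _TYPE_NAMES[kind]
    return name + "*" if is_pointer else name
```

Keeping one table avoids the two lookups drifting apart, which is the risk the question about the original intent points at.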

Contributor Author

I spoke with Georgii, and he said that having two functions was an oversight on his part and that this change is fine.

Contributor

Perfect, thanks!

@NaderAlAwar NaderAlAwar requested a review from rwgk February 19, 2025 00:58
Contributor

🟩 CI finished in 1h 09m: Pass: 100%/93 | Total: 15h 50m | Avg: 10m 13s | Max: 35m 20s | Hits: 95%/134371
  • 🟩 cub: Pass: 100%/45 | Total: 8h 27m | Avg: 11m 17s | Max: 35m 20s | Hits: 93%/53581

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  8h 17m | Avg: 11m 33s | Max: 35m 20s | Hits:  92%/51147 
      🟩 arm64              Pass: 100%/2   | Total: 10m 47s | Avg:  5m 23s | Max:  5m 40s | Hits:  99%/2434  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 52m 22s | Avg: 10m 28s | Max: 29m 38s | Hits:  84%/5919  
      🟩 12.5               Pass: 100%/2   | Total: 21m 11s | Avg: 10m 35s | Max: 10m 40s | Hits:  98%/2252  
      🟩 12.8               Pass: 100%/38  | Total:  7h 14m | Avg: 11m 25s | Max: 35m 20s | Hits:  93%/45410 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 15s | Avg:  5m 07s | Max:  5m 10s | Hits:  99%/2106  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 52m 22s | Avg: 10m 28s | Max: 29m 38s | Hits:  84%/5919  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 21m 11s | Avg: 10m 35s | Max: 10m 40s | Hits:  98%/2252  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  7h 04m | Avg: 11m 46s | Max: 35m 20s | Hits:  93%/43304 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 15s | Avg:  5m 07s | Max:  5m 10s | Hits:  99%/2106  
      🟩 nvcc               Pass: 100%/43  | Total:  8h 17m | Avg: 11m 34s | Max: 35m 20s | Hits:  92%/51475 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 43s | Avg:  5m 55s | Max:  6m 31s | Hits:  99%/4876  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max:  6m 24s | Hits:  99%/2434  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 22s | Hits:  99%/2434  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 21s | Avg:  6m 10s | Max:  6m 15s | Hits:  99%/2434  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 11m | Avg: 10m 10s | Max: 23m 00s | Hits:  99%/8191  
      🟩 GCC7               Pass: 100%/2   | Total: 11m 32s | Avg:  5m 46s | Max:  6m 13s | Hits:  99%/2438  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 29s | Avg:  6m 29s | Max:  6m 29s | Hits:  99%/1219  
      🟩 GCC9               Pass: 100%/2   | Total: 13m 46s | Avg:  6m 53s | Max:  7m 17s | Hits:  99%/2438  
      🟩 GCC10              Pass: 100%/2   | Total: 13m 37s | Avg:  6m 48s | Max:  6m 53s | Hits:  99%/2438  
      🟩 GCC11              Pass: 100%/2   | Total: 13m 06s | Avg:  6m 33s | Max:  6m 44s | Hits:  99%/2434  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 42s | Avg:  6m 51s | Max:  7m 04s | Hits:  99%/2434  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 39m | Avg: 14m 27s | Max: 24m 26s | Hits:  99%/13387 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 56m 58s | Avg: 28m 29s | Max: 29m 38s | Hits:  15%/2086  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 06m | Avg: 33m 09s | Max: 35m 20s | Hits:  15%/2086  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 21m 11s | Avg: 10m 35s | Max: 10m 40s | Hits:  98%/2252  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 12m | Avg:  7m 46s | Max: 23m 00s | Hits:  99%/20369 
      🟩 GCC                Pass: 100%/22  | Total:  3h 51m | Avg: 10m 30s | Max: 24m 26s | Hits:  99%/26788 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 03m | Avg: 30m 49s | Max: 35m 20s | Hits:  15%/4172  
      🟩 NVHPC              Pass: 100%/2   | Total: 21m 11s | Avg: 10m 35s | Max: 10m 40s | Hits:  98%/2252  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 50m 38s | Avg: 16m 52s | Max: 24m 26s | Hits:  99%/3651  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 17m | Avg:  9m 20s | Max: 35m 20s | Hits:  90%/40194 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 19m | Avg: 17m 26s | Max: 24m 08s | Hits:  99%/9736  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 36m | Avg:  9m 05s | Max: 35m 20s | Hits:  91%/43845 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 18s | Avg: 21m 18s | Max: 21m 18s | Hits:  99%/1217  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 23s | Avg: 16m 23s | Max: 16m 23s | Hits:  99%/1217  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 51s | Max: 24m 26s | Hits:  99%/3651  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 02m | Avg: 20m 52s | Max: 21m 04s | Hits:  99%/3651  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 50m 38s | Avg: 16m 52s | Max: 24m 26s | Hits:  99%/3651  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 55s | Avg:  6m 55s | Max:  6m 55s | Hits:  99%/1217  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 18m | Avg:  9m 55s | Max: 30m 59s | Hits:  88%/23579 
      🟩 20                 Pass: 100%/25  | Total:  5h 09m | Avg: 12m 22s | Max: 35m 20s | Hits:  96%/30002 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 35m | Avg: 8m 47s | Max: 34m 41s | Hits: 96%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 11s | Avg:  8m 35s | Max: 11m 02s | Hits:  99%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 25m | Avg:  8m 58s | Max: 34m 41s | Hits:  96%/76917 
      🟩 arm64              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  5m 07s | Hits:  99%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 43m 34s | Avg:  8m 42s | Max: 22m 57s | Hits:  94%/8941  
      🟩 12.5               Pass: 100%/2   | Total: 30m 17s | Avg: 15m 08s | Max: 15m 28s | Hits:  99%/3578  
      🟩 12.8               Pass: 100%/38  | Total:  5h 21m | Avg:  8m 27s | Max: 34m 41s | Hits:  96%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 39s | Avg:  5m 19s | Max:  5m 21s | Hits: 100%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 43m 34s | Avg:  8m 42s | Max: 22m 57s | Hits:  94%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 30m 17s | Avg: 15m 08s | Max: 15m 28s | Hits:  99%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 11m | Avg:  8m 38s | Max: 34m 41s | Hits:  96%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 39s | Avg:  5m 19s | Max:  5m 21s | Hits: 100%/3578  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 24m | Avg:  8m 57s | Max: 34m 41s | Hits:  96%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 36s | Avg:  5m 09s | Max:  5m 34s | Hits: 100%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  5m 46s | Hits: 100%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 28s | Avg:  5m 44s | Max:  5m 48s | Hits: 100%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 10m 41s | Avg:  5m 20s | Max:  5m 23s | Hits: 100%/3578  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 32s | Avg:  6m 13s | Max: 10m 08s | Hits: 100%/12523 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 30s | Avg:  5m 15s | Max:  5m 17s | Hits:  99%/3580  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 31s | Avg:  5m 31s | Max:  5m 31s | Hits:  99%/1790  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 54s | Avg:  5m 27s | Max:  5m 28s | Hits:  99%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 51s | Avg:  5m 55s | Max:  6m 00s | Hits:  99%/3580  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 53s | Avg:  5m 56s | Max:  6m 08s | Hits:  99%/3580  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 42s | Avg:  5m 51s | Max:  5m 59s | Hits:  99%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 15m | Avg:  7m 35s | Max: 11m 18s | Hits:  99%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 08s | Avg: 23m 04s | Max: 23m 11s | Hits:  70%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 23m | Avg: 27m 43s | Max: 34m 41s | Hits:  70%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 30m 17s | Avg: 15m 08s | Max: 15m 28s | Hits:  99%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 37m | Avg:  5m 44s | Max: 10m 08s | Hits: 100%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  2h 18m | Avg:  6m 35s | Max: 11m 18s | Hits:  99%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 09m | Avg: 25m 51s | Max: 34m 41s | Hits:  70%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 17s | Avg: 15m 08s | Max: 15m 28s | Hits:  99%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 15m 57s | Avg:  7m 58s | Max: 11m 14s | Hits:  99%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 13m | Avg:  7m 41s | Max: 23m 11s | Hits:  97%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 05m | Avg: 12m 35s | Max: 34m 41s | Hits:  94%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  5h 01m | Avg:  7m 56s | Max: 25m 49s | Hits:  96%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 56s | Avg: 16m 38s | Max: 34m 41s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 42s | Avg: 10m 55s | Max: 11m 18s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 15m 57s | Avg:  7m 58s | Max: 11m 14s | Hits:  99%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 42s | Avg:  6m 42s | Max:  6m 42s | Hits:  99%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 52m | Avg:  8m 36s | Max: 23m 11s | Hits:  95%/35771 
      🟩 20                 Pass: 100%/23  | Total:  3h 26m | Avg:  8m 58s | Max: 34m 41s | Hits:  97%/41145 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 58s | Avg: 6m 29s | Max: 10m 36s | Hits: 98%/294

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max: 10m 36s | Hits:  98%/294   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max: 10m 36s | Hits:  98%/294   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max: 10m 36s | Hits:  98%/294   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max: 10m 36s | Hits:  98%/294   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max: 10m 36s | Hits:  98%/294   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max: 10m 36s | Hits:  98%/294   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 58s | Avg:  6m 29s | Max: 10m 36s | Hits:  98%/294   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 22s | Avg:  2m 22s | Max:  2m 22s | Hits:  98%/147   
      🟩 Test               Pass: 100%/1   | Total: 10m 36s | Avg: 10m 36s | Max: 10m 36s | Hits:  98%/147   
    
  • 🟩 python: Pass: 100%/1 | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 34m 22s | Avg: 34m 22s | Max: 34m 22s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

from ..typing import DeviceArrayLike


def _dtype_validation(dt1, dt2):

I still see the _dtype_validation() function here, although it isn't called anymore. Oversight?

🟩 CI finished in 1h 02m: Pass: 100%/93 | Total: 15h 34m | Avg: 10m 03s | Max: 34m 33s | Hits: 95%/134371
  • 🟩 cub: Pass: 100%/45 | Total: 8h 18m | Avg: 11m 04s | Max: 30m 44s | Hits: 93%/53581

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  8h 07m | Avg: 11m 20s | Max: 30m 44s | Hits:  92%/51147 
      🟩 arm64              Pass: 100%/2   | Total: 10m 47s | Avg:  5m 23s | Max:  5m 36s | Hits:  99%/2434  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 50m 05s | Avg: 10m 01s | Max: 27m 42s | Hits:  85%/5919  
      🟩 12.5               Pass: 100%/2   | Total: 20m 48s | Avg: 10m 24s | Max: 10m 29s | Hits:  98%/2252  
      🟩 12.8               Pass: 100%/38  | Total:  7h 07m | Avg: 11m 15s | Max: 30m 44s | Hits:  94%/45410 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  4m 51s | Hits: 100%/2106  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 50m 05s | Avg: 10m 01s | Max: 27m 42s | Hits:  85%/5919  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 20m 48s | Avg: 10m 24s | Max: 10m 29s | Hits:  98%/2252  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  6h 58m | Avg: 11m 36s | Max: 30m 44s | Hits:  93%/43304 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  4m 51s | Hits: 100%/2106  
      🟩 nvcc               Pass: 100%/43  | Total:  8h 08m | Avg: 11m 22s | Max: 30m 44s | Hits:  92%/51475 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 22m 31s | Avg:  5m 37s | Max:  5m 54s | Hits: 100%/4876  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 31s | Avg:  6m 15s | Max:  6m 35s | Hits: 100%/2434  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  6m 26s | Hits: 100%/2434  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 08s | Avg:  6m 04s | Max:  6m 05s | Hits: 100%/2434  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 11m | Avg: 10m 09s | Max: 22m 28s | Hits: 100%/8191  
      🟩 GCC7               Pass: 100%/2   | Total: 11m 40s | Avg:  5m 50s | Max:  5m 55s | Hits:  99%/2438  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s | Hits:  99%/1219  
      🟩 GCC9               Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 16s | Hits:  99%/2438  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  6m 19s | Hits:  99%/2438  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  6m 21s | Hits:  99%/2434  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 43s | Avg:  6m 51s | Max:  7m 00s | Hits:  99%/2434  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 43m | Avg: 14m 51s | Max: 24m 17s | Hits:  99%/13387 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 56m 22s | Avg: 28m 11s | Max: 28m 40s | Hits:  16%/2086  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 58m 41s | Avg: 29m 20s | Max: 30m 44s | Hits:  16%/2086  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 20m 48s | Avg: 10m 24s | Max: 10m 29s | Hits:  98%/2252  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 10m | Avg:  7m 42s | Max: 22m 28s | Hits: 100%/20369 
      🟩 GCC                Pass: 100%/22  | Total:  3h 51m | Avg: 10m 32s | Max: 24m 17s | Hits:  99%/26788 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 55m | Avg: 28m 45s | Max: 30m 44s | Hits:  16%/4172  
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 48s | Avg: 10m 24s | Max: 10m 29s | Hits:  98%/2252  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 52m 23s | Avg: 17m 27s | Max: 24m 17s | Hits:  99%/3651  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 03m | Avg:  8m 56s | Max: 30m 44s | Hits:  91%/40194 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 22m | Avg: 17m 46s | Max: 23m 58s | Hits:  99%/9736  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 21m | Avg:  8m 41s | Max: 30m 44s | Hits:  91%/43845 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 08s | Avg: 21m 08s | Max: 21m 08s | Hits:  99%/1217  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 06s | Avg: 16m 06s | Max: 16m 06s | Hits:  99%/1217  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 34s | Max: 24m 17s | Hits:  99%/3651  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 08m | Avg: 22m 55s | Max: 23m 55s | Hits:  99%/3651  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 52m 23s | Avg: 17m 27s | Max: 24m 17s | Hits:  99%/3651  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 33s | Avg:  6m 33s | Max:  6m 33s | Hits:  99%/1217  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 11m | Avg:  9m 35s | Max: 28m 40s | Hits:  88%/23579 
      🟩 20                 Pass: 100%/25  | Total:  5h 06m | Avg: 12m 16s | Max: 30m 44s | Hits:  96%/30002 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 28m | Avg: 8m 38s | Max: 29m 32s | Hits: 96%/80496

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 42s | Avg:  8m 21s | Max: 11m 03s | Hits:  99%/3580  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 19m | Avg:  8m 49s | Max: 29m 32s | Hits:  96%/76917 
      🟩 arm64              Pass: 100%/2   | Total:  9m 30s | Avg:  4m 45s | Max:  5m 03s | Hits:  99%/3579  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 41m 45s | Avg:  8m 21s | Max: 21m 42s | Hits:  94%/8941  
      🟩 12.5               Pass: 100%/2   | Total: 27m 50s | Avg: 13m 55s | Max: 13m 57s | Hits:  99%/3578  
      🟩 12.8               Pass: 100%/38  | Total:  5h 19m | Avg:  8m 23s | Max: 29m 32s | Hits:  96%/67977 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  5m 24s | Hits: 100%/3578  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 41m 45s | Avg:  8m 21s | Max: 21m 42s | Hits:  94%/8941  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 27m 50s | Avg: 13m 55s | Max: 13m 57s | Hits:  99%/3578  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 08m | Avg:  8m 34s | Max: 29m 32s | Hits:  96%/64399 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 18s | Avg:  5m 09s | Max:  5m 24s | Hits: 100%/3578  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 18m | Avg:  8m 48s | Max: 29m 32s | Hits:  96%/76918 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 37s | Avg:  5m 09s | Max:  5m 49s | Hits: 100%/7156  
      🟩 Clang15            Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  5m 22s | Hits: 100%/3578  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 58s | Avg:  5m 29s | Max:  5m 35s | Hits: 100%/3578  
      🟩 Clang17            Pass: 100%/2   | Total: 10m 51s | Avg:  5m 25s | Max:  5m 27s | Hits: 100%/3578  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 21s | Avg:  6m 11s | Max: 10m 12s | Hits: 100%/12523 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  5m 25s | Hits:  99%/3580  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 41s | Avg:  5m 41s | Max:  5m 41s | Hits:  99%/1790  
      🟩 GCC9               Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  5m 47s | Hits:  99%/3580  
      🟩 GCC10              Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  5m 27s | Hits:  99%/3580  
      🟩 GCC11              Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 25s | Hits:  99%/3580  
      🟩 GCC12              Pass: 100%/2   | Total: 12m 13s | Avg:  6m 06s | Max:  6m 26s | Hits:  99%/3580  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 16m | Avg:  7m 36s | Max: 11m 47s | Hits:  99%/17900 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 45m 48s | Avg: 22m 54s | Max: 24m 06s | Hits:  70%/3566  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 21m | Avg: 27m 05s | Max: 29m 32s | Hits:  70%/5349  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 27m 50s | Avg: 13m 55s | Max: 13m 57s | Hits:  99%/3578  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 36m | Avg:  5m 40s | Max: 10m 12s | Hits: 100%/30413 
      🟩 GCC                Pass: 100%/21  | Total:  2h 17m | Avg:  6m 32s | Max: 11m 47s | Hits:  99%/37590 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 07m | Avg: 25m 25s | Max: 29m 32s | Hits:  70%/8915  
      🟩 NVHPC              Pass: 100%/2   | Total: 27m 50s | Avg: 13m 55s | Max: 13m 57s | Hits:  99%/3578  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 42s | Avg:  8m 21s | Max: 11m 47s | Hits:  99%/3580  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 10m | Avg:  7m 35s | Max: 25m 10s | Hits:  97%/59033 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 01m | Avg: 12m 09s | Max: 29m 32s | Hits:  94%/17883 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  4h 59m | Avg:  7m 52s | Max: 26m 35s | Hits:  96%/67975 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 02s | Avg: 15m 00s | Max: 29m 32s | Hits:  90%/5362  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 18s | Avg: 11m 04s | Max: 11m 47s | Hits:  99%/7159  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 42s | Avg:  8m 21s | Max: 11m 47s | Hits:  99%/3580  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 32s | Avg:  6m 32s | Max:  6m 32s | Hits:  99%/1790  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 52m | Avg:  8m 38s | Max: 25m 10s | Hits:  95%/35771 
      🟩 20                 Pass: 100%/23  | Total:  3h 19m | Avg:  8m 39s | Max: 29m 32s | Hits:  97%/41145 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 00s | Avg: 6m 30s | Max: 10m 41s | Hits: 98%/294

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max: 10m 41s | Hits:  98%/294   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max: 10m 41s | Hits:  98%/294   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max: 10m 41s | Hits:  98%/294   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max: 10m 41s | Hits:  98%/294   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max: 10m 41s | Hits:  98%/294   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max: 10m 41s | Hits:  98%/294   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max: 10m 41s | Hits:  98%/294   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 19s | Avg:  2m 19s | Max:  2m 19s | Hits:  98%/147   
      🟩 Test               Pass: 100%/1   | Total: 10m 41s | Avg: 10m 41s | Max: 10m 41s | Hits:  98%/147   
    
  • 🟩 python: Pass: 100%/1 | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 34m 33s | Avg: 34m 33s | Max: 34m 33s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@NaderAlAwar NaderAlAwar merged commit d7a1d6a into NVIDIA:main Feb 19, 2025
108 of 110 checks passed
davebayer pushed a commit to davebayer/cccl that referenced this pull request Feb 20, 2025
* Fix issue with converting types to strings in c.parallel merge_sort

* Add option to specify prefix for iterator methods to avoid name collisions

* Return error if output iterators are passed to c.parallel merge_sort

* Use `launcher_factory.PtxVersion()` in dispatch merge sort due to cudaErrorUnsupportedPtxVersion error

* Remove `cccl_type_enum_to_string` and replace with `cccl_type_enum_to_name` due to inconsistencies with the datatype being returned for INT64 and UINT64
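
The commit note about a prefix for iterator methods can be pictured with a toy example. This is only a sketch, not the code from this PR: the helper `mangle_iterator_methods` and the method names `advance` and `dereference` are hypothetical. It assumes the collision in question arises when two iterators' generated functions share a name inside one compiled module.

```python
def mangle_iterator_methods(prefix, methods=("advance", "dereference")):
    """Map each generic method name to a prefixed symbol name.

    Hypothetical helper for illustration; not part of the PR's API.
    """
    if not prefix.isidentifier():
        raise ValueError(f"prefix must be a valid identifier, got {prefix!r}")
    return {name: f"{prefix}_{name}" for name in methods}


# Two iterators used in the same call (say, one for keys and one for items)
# each get their own prefix, so their generated symbols stay distinct.
keys_symbols = mangle_iterator_methods("keys")
items_symbols = mangle_iterator_methods("items")

# No symbol name is shared between the two iterators.
assert set(keys_symbols.values()).isdisjoint(items_symbols.values())
```

Without the prefix, both iterators would map `advance` to the same symbol, which is the kind of name collision the commit message describes.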
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add Python wrappers for merge_sort
4 participants