
Add Python wrappers for c.parallel unique_by_key API #3949

Draft: wants to merge 47 commits into main

NaderAlAwar (Contributor)

Description

Closes #3808

This should only be merged after #3947

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
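For context on the operation being wrapped: this sketch assumes the c.parallel `unique_by_key` API follows the semantics of CUB's `cub::DeviceSelect::UniqueByKey` and `thrust::unique_by_key`. Under that assumption, the operation keeps the first key/value pair of every run of consecutive equal keys and reports how many pairs were kept. The helper `unique_by_key_reference` below is hypothetical and is not part of any CCCL API. It is a pure-Python reference that could serve as a host-side oracle in tests, not the wrapper's signature.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def unique_by_key_reference(
    keys: Sequence[K],
    values: Sequence[V],
    equal: Callable[[K, K], bool] = lambda a, b: a == b,
) -> Tuple[List[K], List[V], int]:
    """Hypothetical host-side reference: keep the first pair of each run of equal keys."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    keys_out: List[K] = []
    values_out: List[V] = []
    for i, (key, value) in enumerate(zip(keys, values)):
        # A new run starts at index 0 or wherever a key differs from its
        # predecessor in the input (only consecutive duplicates are removed).
        if i == 0 or not equal(keys[i - 1], key):
            keys_out.append(key)
            values_out.append(value)
    return keys_out, values_out, len(keys_out)


keys_out, values_out, num_selected = unique_by_key_reference(
    [1, 1, 2, 3, 3, 3, 1], [10, 11, 20, 30, 31, 32, 40]
)
```

The trailing `1` in the example input survives because it is not adjacent to the first run of `1`s. This is the usual "consecutive runs" behavior of unique-by-key, as opposed to a global deduplication.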

…because this allows us to define it such that we can use the ActivePolicyT template parameter of invoke() as its policy, rather than the policy hub's MaxPolicy. ActivePolicyT will be the correct policy for the current ptx version, since calling invoke() from dispatch() causes us to move down the chained policies until we get to the correct one.
… needed because this allows us to define it such that we can use the ActivePolicyT template parameter of invoke() as its policy, rather than the policy hub's MaxPolicy. ActivePolicyT will be the correct policy for the current ptx version, since calling invoke() from dispatch() causes us to move down the chained policies until we get to the correct one."

This reverts commit 2a17133.
…s it from the c.parallel layer. Also we don't need it to have any run-time state
…und an nvcc 12.0 template instantiation issue with vsmem_helper_fallback_policy_t
…_key and fix issue with extra semicolon in kernel template instantiation
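The commit messages above explain why the dispatch functor's invoke() should use its ActivePolicyT template parameter rather than the policy hub's MaxPolicy: dispatch() walks down the chained policies until it reaches the one matching the device's PTX version. The Python sketch below models only that selection logic and is not CUB's C++ template implementation. The names `Policy`, `ChainedPolicy`, `MAX_POLICY`, and `dispatch`, the PTX versions, and the `block_threads` values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical tuning parameters for one architecture."""
    name: str
    block_threads: int


@dataclass(frozen=True)
class ChainedPolicy:
    """One link in a chain of per-PTX-version policies, newest first."""
    ptx_version: int
    policy: Policy
    prev: Optional["ChainedPolicy"] = None

    def invoke(self, device_ptx_version: int, op: Callable[[Policy], int]) -> int:
        # Move down the chain while this link targets a newer PTX version
        # than the device supports; the last link acts as the fallback.
        if self.prev is not None and device_ptx_version < self.ptx_version:
            return self.prev.invoke(device_ptx_version, op)
        # This link is the active policy for the device.
        return op(self.policy)


P500 = ChainedPolicy(500, Policy("sm50", 128))
P800 = ChainedPolicy(800, Policy("sm80", 256), prev=P500)
MAX_POLICY = ChainedPolicy(900, Policy("sm90", 512), prev=P800)


def dispatch(device_ptx_version: int) -> int:
    # The operation receives the *active* policy selected by the walk,
    # not MAX_POLICY.policy.
    def launch(active_policy: Policy) -> int:
        return active_policy.block_threads

    return MAX_POLICY.invoke(device_ptx_version, launch)
```

Take a hypothetical device at PTX 860. The walk skips the 900 link and `launch` receives the 800 policy. Reading `MAX_POLICY.policy` directly would return the 900 tuning regardless of the device, which is the mismatch the commits describe avoiding.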
NaderAlAwar requested review from a team as code owners February 26, 2025 21:09
NaderAlAwar marked this pull request as draft February 26, 2025 21:09

copy-pr-bot bot commented Feb 26, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor

🟨 CI finished in 1h 04m: Pass: 98%/93 | Total: 15h 13m | Avg: 9m 49s | Max: 30m 21s | Hits: 95%/133939
  • 🟥 python: Pass: 0%/1 | Total: 9m 32s | Avg: 9m 32s | Max: 9m 32s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  9m 32s | Avg:  9m 32s | Max:  9m 32s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total:  9m 32s | Avg:  9m 32s | Max:  9m 32s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total:  9m 32s | Avg:  9m 32s | Max:  9m 32s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  9m 32s | Avg:  9m 32s | Max:  9m 32s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  9m 32s | Avg:  9m 32s | Max:  9m 32s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  9m 32s | Avg:  9m 32s | Max:  9m 32s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total:  9m 32s | Avg:  9m 32s | Max:  9m 32s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  9m 32s | Avg:  9m 32s | Max:  9m 32s
    
  • 🟩 cub: Pass: 100%/45 | Total: 8h 18m | Avg: 11m 05s | Max: 30m 07s | Hits: 93%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  8h 08m | Avg: 11m 21s | Max: 30m 07s | Hits:  92%/51055 
      🟩 arm64              Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 41s | Hits:  99%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 48m 00s | Avg:  9m 36s | Max: 26m 05s | Hits:  85%/5908  
      🟩 12.5               Pass: 100%/2   | Total: 19m 33s | Avg:  9m 46s | Max:  9m 47s | Hits:  98%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  7h 11m | Avg: 11m 21s | Max: 30m 07s | Hits:  94%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 50s | Avg:  4m 55s | Max:  4m 57s | Hits: 100%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 48m 00s | Avg:  9m 36s | Max: 26m 05s | Hits:  85%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 33s | Avg:  9m 46s | Max:  9m 47s | Hits:  98%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  7h 01m | Avg: 11m 42s | Max: 30m 07s | Hits:  93%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 50s | Avg:  4m 55s | Max:  4m 57s | Hits: 100%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  8h 09m | Avg: 11m 22s | Max: 30m 07s | Hits:  92%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 22m 36s | Avg:  5m 39s | Max:  6m 01s | Hits:  99%/4868  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 26s | Avg:  6m 13s | Max:  6m 22s | Hits: 100%/2430  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 32s | Hits: 100%/2430  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 29s | Hits: 100%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 14m | Avg: 10m 38s | Max: 25m 04s | Hits: 100%/8175  
      🟩 GCC7               Pass: 100%/2   | Total: 11m 33s | Avg:  5m 46s | Max:  6m 06s | Hits:  99%/2434  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 54s | Avg:  5m 54s | Max:  5m 54s | Hits:  99%/1217  
      🟩 GCC9               Pass: 100%/2   | Total: 12m 29s | Avg:  6m 14s | Max:  6m 43s | Hits:  99%/2434  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 18s | Avg:  6m 09s | Max:  6m 17s | Hits:  99%/2434  
      🟩 GCC11              Pass: 100%/2   | Total: 13m 12s | Avg:  6m 36s | Max:  6m 49s | Hits:  99%/2430  
      🟩 GCC12              Pass: 100%/2   | Total: 12m 50s | Avg:  6m 25s | Max:  6m 27s | Hits:  99%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 42m | Avg: 14m 45s | Max: 25m 30s | Hits:  99%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 54m 20s | Avg: 27m 10s | Max: 28m 15s | Hits:  15%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 07s | Max: 30m 07s | Hits:  15%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 33s | Avg:  9m 46s | Max:  9m 47s | Hits:  98%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 14m | Avg:  7m 53s | Max: 25m 04s | Hits:  99%/20333 
      🟩 GCC                Pass: 100%/22  | Total:  3h 50m | Avg: 10m 28s | Max: 25m 30s | Hits:  99%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 54m | Avg: 28m 38s | Max: 30m 07s | Hits:  15%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 33s | Avg:  9m 46s | Max:  9m 47s | Hits:  98%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 50m 15s | Avg: 16m 45s | Max: 23m 49s | Hits:  99%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 02m | Avg:  8m 53s | Max: 30m 07s | Hits:  91%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 26m | Avg: 18m 17s | Max: 25m 30s | Hits:  99%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 19m | Avg:  8m 38s | Max: 30m 07s | Hits:  91%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 39s | Avg: 23m 39s | Max: 23m 39s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 59s | Avg: 16m 59s | Max: 16m 59s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 47s | Max: 25m 30s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 28s | Max: 22m 16s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 50m 15s | Avg: 16m 45s | Max: 23m 49s | Hits:  99%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 37s | Avg:  6m 37s | Max:  6m 37s | Hits:  99%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 10m | Avg:  9m 32s | Max: 30m 07s | Hits:  88%/23535 
      🟩 20                 Pass: 100%/25  | Total:  5h 08m | Avg: 12m 19s | Max: 30m 07s | Hits:  96%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 27m | Avg: 8m 36s | Max: 30m 21s | Hits: 96%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 42s | Avg:  8m 21s | Max: 11m 07s | Hits:  99%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 18m | Avg:  8m 47s | Max: 30m 21s | Hits:  96%/76573 
      🟩 arm64              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  5m 02s | Hits:  99%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 43m 57s | Avg:  8m 47s | Max: 23m 57s | Hits:  94%/8901  
      🟩 12.5               Pass: 100%/2   | Total: 27m 31s | Avg: 13m 45s | Max: 14m 02s | Hits:  99%/3562  
      🟩 12.8               Pass: 100%/38  | Total:  5h 16m | Avg:  8m 19s | Max: 30m 21s | Hits:  96%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 03s | Hits: 100%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 43m 57s | Avg:  8m 47s | Max: 23m 57s | Hits:  94%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 27m 31s | Avg: 13m 45s | Max: 14m 02s | Hits:  99%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 06m | Avg:  8m 30s | Max: 30m 21s | Hits:  96%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 03s | Hits: 100%/3562  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 17m | Avg:  8m 47s | Max: 30m 21s | Hits:  96%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 30s | Avg:  5m 07s | Max:  5m 40s | Hits: 100%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  5m 22s | Hits: 100%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  5m 24s | Hits: 100%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 10m 51s | Avg:  5m 25s | Max:  5m 28s | Hits:  99%/3562  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 25s | Avg:  6m 12s | Max: 10m 07s | Hits: 100%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 31s | Avg:  5m 15s | Max:  5m 23s | Hits:  99%/3564  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s | Hits:  99%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 19s | Avg:  5m 09s | Max:  5m 20s | Hits:  99%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 10m 39s | Avg:  5m 19s | Max:  5m 22s | Hits:  99%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 06s | Avg:  5m 33s | Max:  5m 35s | Hits:  99%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 50s | Avg:  5m 55s | Max:  5m 57s | Hits:  99%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 15m | Avg:  7m 34s | Max: 11m 40s | Hits:  99%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 48m 03s | Avg: 24m 01s | Max: 24m 06s | Hits:  70%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 20m | Avg: 26m 57s | Max: 30m 21s | Hits:  70%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 27m 31s | Avg: 13m 45s | Max: 14m 02s | Hits:  99%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 35m | Avg:  5m 38s | Max: 10m 07s | Hits:  99%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  2h 15m | Avg:  6m 26s | Max: 11m 40s | Hits:  99%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 08m | Avg: 25m 46s | Max: 30m 21s | Hits:  70%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total: 27m 31s | Avg: 13m 45s | Max: 14m 02s | Hits:  99%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 10s | Avg:  8m 05s | Max: 11m 32s | Hits:  99%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 09m | Avg:  7m 34s | Max: 24m 43s | Hits:  97%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 01m | Avg: 12m 10s | Max: 30m 21s | Hits:  94%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  4h 57m | Avg:  7m 50s | Max: 25m 47s | Hits:  96%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 25s | Avg: 15m 08s | Max: 30m 21s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 26s | Avg: 11m 06s | Max: 11m 40s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 10s | Avg:  8m 05s | Max: 11m 32s | Hits:  99%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 28s | Avg:  6m 28s | Max:  6m 28s | Hits:  99%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 51m | Avg:  8m 35s | Max: 24m 43s | Hits:  95%/35611 
      🟩 20                 Pass: 100%/23  | Total:  3h 19m | Avg:  8m 39s | Max: 30m 21s | Hits:  97%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 17m 10s | Avg: 8m 35s | Max: 14m 46s | Hits: 98%/318

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 17m 10s | Avg:  8m 35s | Max: 14m 46s | Hits:  98%/318   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 17m 10s | Avg:  8m 35s | Max: 14m 46s | Hits:  98%/318   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 17m 10s | Avg:  8m 35s | Max: 14m 46s | Hits:  98%/318   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 17m 10s | Avg:  8m 35s | Max: 14m 46s | Hits:  98%/318   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 17m 10s | Avg:  8m 35s | Max: 14m 46s | Hits:  98%/318   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 17m 10s | Avg:  8m 35s | Max: 14m 46s | Hits:  98%/318   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 17m 10s | Avg:  8m 35s | Max: 14m 46s | Hits:  98%/318   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 24s | Avg:  2m 24s | Max:  2m 24s | Hits:  97%/159   
      🟩 Test               Pass: 100%/1   | Total: 14m 46s | Avg: 14m 46s | Max: 14m 46s | Hits:  98%/159   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

Labels: None yet
Project status: In Progress
Development: Successfully merging this pull request may close these issues:
Add Python wrappers for unique by key
1 participant