Add Unique by Key Implementation for c.parallel #3947

Draft
wants to merge 39 commits into
base: main

NaderAlAwar
Contributor

Description

Closes #3807

This should only be merged after #3816

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

…because this allows us to define it such that we can use the ActivePolicyT template parameter of invoke() as its policy, rather than the policy hub's MaxPolicy. ActivePolicyT will be the correct policy for the current ptx version, since calling invoke() from dispatch() causes us to move down the chained policies until we get to the correct one.
… needed because this allows us to define it such that we can use the ActivePolicyT template parameter of invoke() as its policy, rather than the policy hub's MaxPolicy. ActivePolicyT will be the correct policy for the current ptx version, since calling invoke() from dispatch() causes us to move down the chained policies until we get to the correct one."

This reverts commit 2a17133.
…s it from the c.parallel layer. Also we don't need it to have any run-time state
…und an nvcc 12.0 template instantiation issue with vsmem_helper_fallback_policy_t
…_key and fix issue with extra semicolon in kernel template instantiation
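
The point made in the commit messages above about ActivePolicyT versus MaxPolicy can be illustrated with a small host-only sketch. This is not the CUB or c.parallel code: `chained_policy`, `invoke_chain`, `policy_hub`, `dispatch_t`, `block_threads_for`, and the tuning values are hypothetical names and numbers chosen for illustration. It assumes a chain of policies keyed by PTX version, where the walk starts at MaxPolicy and moves down until it reaches the first policy whose version does not exceed the device's, and then calls `invoke()` with that policy as ActivePolicyT.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a chained policy (not the CUB definition).
// Each link knows its minimum PTX version and the previous (older) link.
template <int PtxVersion, typename PolicyT, typename PrevPolicyT>
struct chained_policy
{
  template <typename FunctorT>
  static int invoke_chain(int device_ptx_version, FunctorT& op)
  {
    // The device is older than this policy: move down the chain.
    if (device_ptx_version < PtxVersion)
    {
      return PrevPolicyT::invoke_chain(device_ptx_version, op);
    }
    // This policy matches: it becomes ActivePolicyT inside invoke().
    return op.template invoke<PolicyT>();
  }
};

// End of the chain: a policy that names itself as its predecessor is always used.
template <int PtxVersion, typename PolicyT>
struct chained_policy<PtxVersion, PolicyT, PolicyT>
{
  template <typename FunctorT>
  static int invoke_chain(int, FunctorT& op)
  {
    return op.template invoke<PolicyT>();
  }
};

// Hypothetical policy hub with made-up tuning values.
struct policy_hub
{
  struct policy500 : chained_policy<500, policy500, policy500>
  {
    static constexpr int block_threads = 128;
  };
  struct policy800 : chained_policy<800, policy800, policy500>
  {
    static constexpr int block_threads = 256;
  };
  struct policy900 : chained_policy<900, policy900, policy800>
  {
    static constexpr int block_threads = 512;
  };
  using MaxPolicy = policy900;
};

// Hypothetical dispatch object: invoke() only sees the policy selected by the chain.
struct dispatch_t
{
  template <typename ActivePolicyT>
  int invoke()
  {
    // Reading the tuning from ActivePolicyT, not from policy_hub::MaxPolicy.
    return ActivePolicyT::block_threads;
  }
};

// Entry point: start at MaxPolicy and let the chain pick the active policy.
int block_threads_for(int device_ptx_version)
{
  assert(device_ptx_version > 0);
  dispatch_t dispatch;
  return policy_hub::MaxPolicy::invoke_chain(device_ptx_version, dispatch);
}
```

In this sketch, `block_threads_for(860)` reads its value from `policy800`, whereas code that read `policy_hub::MaxPolicy::block_threads` directly would always see the `policy900` value regardless of the device.
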
NaderAlAwar requested review from a team as code owners on February 26, 2025 16:12
NaderAlAwar marked this pull request as draft on February 26, 2025 16:12

copy-pr-bot bot commented Feb 26, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor

🟩 CI finished in 1h 27m: Pass: 100%/93 | Total: 21h 18m | Avg: 13m 44s | Max: 1h 09m | Hits: 93%/133939
  • 🟩 cub: Pass: 100%/45 | Total: 12h 09m | Avg: 16m 12s | Max: 1h 09m | Hits: 92%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 11h 58m | Avg: 16m 42s | Max:  1h 09m | Hits:  91%/51055 
      🟩 arm64              Pass: 100%/2   | Total: 10m 58s | Avg:  5m 29s | Max:  5m 54s | Hits:  99%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 22m | Avg: 16m 33s | Max:  1h 00m | Hits:  85%/5908  
      🟩 12.5               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 07m | Hits:  71%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  8h 38m | Avg: 13m 38s | Max:  1h 09m | Hits:  94%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 48s | Avg:  4m 54s | Max:  4m 56s | Hits: 100%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 22m | Avg: 16m 33s | Max:  1h 00m | Hits:  85%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 07m | Hits:  71%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  8h 28m | Avg: 14m 08s | Max:  1h 09m | Hits:  93%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 48s | Avg:  4m 54s | Max:  4m 56s | Hits: 100%/2100  
      🟩 nvcc               Pass: 100%/43  | Total: 11h 59m | Avg: 16m 44s | Max:  1h 09m | Hits:  91%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 22m 52s | Avg:  5m 43s | Max:  6m 10s | Hits: 100%/4868  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 46s | Avg:  5m 53s | Max:  5m 53s | Hits: 100%/2430  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 51s | Avg:  5m 55s | Max:  5m 57s | Hits: 100%/2430  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 03s | Avg:  6m 01s | Max:  6m 08s | Hits: 100%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 09m | Avg:  9m 59s | Max: 23m 40s | Hits: 100%/8175  
      🟩 GCC7               Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  6m 07s | Hits:  99%/2434  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 10s | Avg:  6m 10s | Max:  6m 10s | Hits:  99%/1217  
      🟩 GCC9               Pass: 100%/2   | Total: 12m 22s | Avg:  6m 11s | Max:  6m 14s | Hits:  99%/2434  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 07s | Avg:  6m 03s | Max:  6m 09s | Hits:  99%/2434  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 54s | Avg:  6m 27s | Max:  6m 33s | Hits:  99%/2430  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 05s | Avg:  6m 32s | Max:  6m 37s | Hits:  99%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 43m | Avg: 14m 53s | Max: 24m 30s | Hits:  99%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 13s | Max:  1h 00m | Hits:  15%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 09m | Hits:  15%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 07m | Hits:  71%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 08m | Avg:  7m 33s | Max: 23m 40s | Hits: 100%/20333 
      🟩 GCC                Pass: 100%/22  | Total:  3h 52m | Avg: 10m 32s | Max: 24m 30s | Hits:  99%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 00m | Avg:  1h 00m | Max:  1h 09m | Hits:  15%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 07m | Hits:  71%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 49m 17s | Avg: 16m 25s | Max: 23m 03s | Hits:  99%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  8h 56m | Avg: 15m 47s | Max:  1h 09m | Hits:  89%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 23m | Avg: 17m 57s | Max: 24m 30s | Hits:  99%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  9h 15m | Avg: 15m 00s | Max:  1h 09m | Hits:  90%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 43s | Avg: 21m 43s | Max: 21m 43s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 59s | Avg: 16m 59s | Max: 16m 59s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 44s | Max: 24m 30s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 35s | Max: 24m 06s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 49m 17s | Avg: 16m 25s | Max: 23m 03s | Hits:  99%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 54s | Avg:  6m 54s | Max:  6m 54s | Hits:  99%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  5h 47m | Avg: 17m 22s | Max:  1h 09m | Hits:  87%/23535 
      🟩 20                 Pass: 100%/25  | Total:  6h 22m | Avg: 15m 17s | Max:  1h 01m | Hits:  95%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 8h 06m | Avg: 10m 49s | Max: 40m 31s | Hits: 95%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 41s | Avg:  8m 20s | Max: 10m 47s | Hits:  99%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  7h 57m | Avg: 11m 05s | Max: 40m 31s | Hits:  95%/76573 
      🟩 arm64              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  5m 11s | Hits:  99%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 57m 51s | Avg: 11m 34s | Max: 34m 30s | Hits:  88%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  1h 18m | Avg: 39m 21s | Max: 39m 25s | Hits:  81%/3562  
      🟩 12.8               Pass: 100%/38  | Total:  5h 50m | Avg:  9m 13s | Max: 40m 31s | Hits:  96%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 27s | Avg:  5m 13s | Max:  5m 24s | Hits: 100%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 57m 51s | Avg: 11m 34s | Max: 34m 30s | Hits:  88%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 18m | Avg: 39m 21s | Max: 39m 25s | Hits:  81%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 39m | Avg:  9m 26s | Max: 40m 31s | Hits:  96%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 27s | Avg:  5m 13s | Max:  5m 24s | Hits: 100%/3562  
      🟩 nvcc               Pass: 100%/43  | Total:  7h 56m | Avg: 11m 04s | Max: 40m 31s | Hits:  95%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 02s | Avg:  5m 00s | Max:  5m 20s | Hits: 100%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 10m 50s | Avg:  5m 25s | Max:  5m 33s | Hits: 100%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  5m 17s | Hits: 100%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 29s | Hits: 100%/3562  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 08s | Avg:  6m 09s | Max:  9m 16s | Hits: 100%/12467 
      🟩 GCC7               Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  5m 17s | Hits:  99%/3564  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 13s | Avg:  5m 13s | Max:  5m 13s | Hits:  99%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 15m 00s | Avg:  7m 30s | Max:  9m 07s | Hits:  85%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 15s | Avg:  5m 37s | Max:  5m 40s | Hits:  99%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 35s | Hits:  99%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 39s | Avg:  5m 49s | Max:  5m 55s | Hits:  99%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 15m | Avg:  7m 35s | Max: 11m 26s | Hits:  99%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 09m | Avg: 34m 40s | Max: 34m 50s | Hits:  70%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 43m | Avg: 34m 33s | Max: 40m 31s | Hits:  70%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 18m | Avg: 39m 21s | Max: 39m 25s | Hits:  81%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 35m | Avg:  5m 35s | Max:  9m 16s | Hits: 100%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  2h 19m | Avg:  6m 39s | Max: 11m 26s | Hits:  98%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 52m | Avg: 34m 35s | Max: 40m 31s | Hits:  70%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 18m | Avg: 39m 21s | Max: 39m 25s | Hits:  81%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 03s | Avg:  8m 01s | Max: 11m 23s | Hits:  99%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total:  5h 35m | Avg: 10m 10s | Max: 39m 25s | Hits:  95%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 14m | Avg: 13m 29s | Max: 40m 31s | Hits:  94%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  6h 38m | Avg: 10m 29s | Max: 40m 31s | Hits:  95%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 18s | Avg: 15m 06s | Max: 29m 19s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 42m 52s | Avg: 10m 43s | Max: 11m 26s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 03s | Avg:  8m 01s | Max: 11m 23s | Hits:  99%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 10s | Avg:  6m 10s | Max:  6m 10s | Hits:  99%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 53m | Avg: 11m 39s | Max: 39m 18s | Hits:  93%/35611 
      🟩 20                 Pass: 100%/23  | Total:  3h 57m | Avg: 10m 18s | Max: 40m 31s | Hits:  96%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 17m 20s | Avg: 8m 40s | Max: 14m 40s | Hits: 95%/318

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 17m 20s | Avg:  8m 40s | Max: 14m 40s | Hits:  95%/318   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 17m 20s | Avg:  8m 40s | Max: 14m 40s | Hits:  95%/318   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 17m 20s | Avg:  8m 40s | Max: 14m 40s | Hits:  95%/318   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 17m 20s | Avg:  8m 40s | Max: 14m 40s | Hits:  95%/318   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 17m 20s | Avg:  8m 40s | Max: 14m 40s | Hits:  95%/318   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 17m 20s | Avg:  8m 40s | Max: 14m 40s | Hits:  95%/318   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 17m 20s | Avg:  8m 40s | Max: 14m 40s | Hits:  95%/318   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 40s | Avg:  2m 40s | Max:  2m 40s | Hits:  91%/159   
      🟩 Test               Pass: 100%/1   | Total: 14m 40s | Avg: 14m 40s | Max: 14m 40s | Hits:  98%/159   
    
  • 🟩 python: Pass: 100%/1 | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 44m 50s | Avg: 44m 50s | Max: 44m 50s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[FEA]: Implement cccl.c.parallel version of unique by key
1 participant