Allow NVRTC to compile more of CUB #3951

Open
wants to merge 1 commit into
base: main

bernhardmgruber commented Feb 26, 2025

I mainly used this to find more places where we include std headers instead of cuda/std headers.
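
For illustration only: NVRTC does not provide the host standard library headers by default, so an include such as `<type_traits>` tends to surface as a compile error there, while `<cuda/std/type_traits>` does not. The sketch below mimics that kind of finding with a plain text scan. The helper `find_host_std_includes` and its header list are hypothetical and not part of CCCL or of this PR; the real detection here comes from compiling with NVRTC.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of CCCL): returns the headers that a source
// text includes as host standard library headers, e.g. <type_traits>, rather
// than through a <cuda/std/...> path.
std::vector<std::string> find_host_std_includes(const std::string& source)
{
  // Small illustrative subset of standard headers; an assumption, not an
  // exhaustive list.
  static const std::set<std::string> std_headers = {
    "array", "cstdint", "iterator", "limits", "tuple", "type_traits", "utility"};

  std::vector<std::string> hits;
  std::istringstream lines(source);
  std::string line;
  while (std::getline(lines, line))
  {
    // Only look at preprocessor lines of the form: # include <header>
    const auto first = line.find_first_not_of(" \t");
    if (first == std::string::npos || line[first] != '#')
    {
      continue;
    }
    const auto inc = line.find("include", first + 1);
    if (inc == std::string::npos)
    {
      continue;
    }
    const auto open  = line.find('<', inc);
    const auto close = line.find('>', open);
    if (open == std::string::npos || close == std::string::npos)
    {
      continue; // quoted include or malformed line
    }
    // "cuda/std/type_traits" is not in the set, so it is never reported.
    const std::string header = line.substr(open + 1, close - open - 1);
    if (std_headers.count(header) != 0)
    {
      hits.push_back(header);
    }
  }
  return hits;
}
```

Given a file that includes both `<cuda/std/type_traits>` and `<type_traits>`, only the second is reported. A scan like this would miss headers pulled in transitively, which is one reason an actual NVRTC compile is the more reliable check.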


🟩 CI finished in 1h 33m: Pass: 100%/93 | Total: 2d 15h | Avg: 41m 03s | Max: 1h 16m | Hits: 72%/133929
  • 🟩 cub: Pass: 100%/45 | Total: 1d 16h | Avg: 54m 10s | Max: 1h 16m | Hits: 64%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 14h | Avg: 53m 49s | Max:  1h 16m | Hits:  64%/51055 
      🟩 arm64              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 05m | Hits:  61%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 54m | Avg: 58m 55s | Max:  1h 04m | Hits:  53%/5908  
      🟩 12.5               Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 14m | Hits:  48%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  1d 09h | Avg: 52m 33s | Max:  1h 16m | Hits:  66%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  67%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 54m | Avg: 58m 55s | Max:  1h 04m | Hits:  53%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 14m | Hits:  48%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 07h | Avg: 51m 55s | Max:  1h 16m | Hits:  66%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  67%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 14h | Avg: 53m 43s | Max:  1h 16m | Hits:  64%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 52m | Avg: 58m 09s | Max:  1h 00m | Hits:  62%/4868  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 50m | Avg: 55m 18s | Max: 55m 26s | Hits:  62%/2430  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 46s | Max:  1h 00m | Hits:  62%/2430  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 19s | Max:  1h 02m | Hits:  62%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 41m | Avg: 48m 50s | Max:  1h 04m | Hits:  74%/8175  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 31s | Max:  1h 01m | Hits:  61%/2434  
      🟩 GCC8               Pass: 100%/1   | Total: 57m 23s | Avg: 57m 23s | Max: 57m 23s | Hits:  61%/1217  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 54m | Avg: 57m 07s | Max: 57m 23s | Hits:  61%/2434  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m | Hits:  61%/2434  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 57m | Avg: 58m 33s | Max:  1h 00m | Hits:  61%/2430  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m | Hits:  61%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 07m | Avg: 38m 53s | Max:  1h 16m | Hits:  82%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 13m | Hits:  13%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 28m | Avg:  1h 14m | Max:  1h 14m | Hits:  13%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 14m | Hits:  48%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 23m | Avg: 54m 18s | Max:  1h 04m | Hits:  67%/20333 
      🟩 GCC                Pass: 100%/22  | Total: 18h 02m | Avg: 49m 13s | Max:  1h 16m | Hits:  72%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 45m | Avg:  1h 11m | Max:  1h 14m | Hits:  13%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 14m | Hits:  48%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 14m | Avg: 24m 45s | Max: 27m 01s | Hits:  87%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 11h | Avg:  1h 02m | Max:  1h 16m | Hits:  56%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 07m | Avg: 30m 54s | Max:  1h 00m | Hits:  90%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 13h | Avg:  1h 01m | Max:  1h 16m | Hits:  56%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 21s | Avg: 21m 21s | Max: 21m 21s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 50s | Avg: 16m 50s | Max: 16m 50s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 17s | Max: 25m 40s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 06m | Avg: 22m 09s | Max: 25m 14s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 14m | Avg: 24m 45s | Max: 27m 01s | Hits:  87%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 16m | Avg:  1h 16m | Max:  1h 16m | Hits:  61%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 20h 40m | Avg:  1h 02m | Max:  1h 14m | Hits:  55%/23535 
      🟩 20                 Pass: 100%/25  | Total: 19h 57m | Avg: 47m 54s | Max:  1h 16m | Hits:  72%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 21h 54m | Avg: 29m 12s | Max: 56m 00s | Hits: 76%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 33m 50s | Avg: 16m 55s | Max: 23m 25s | Hits:  88%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 21h 02m | Avg: 29m 22s | Max: 56m 00s | Hits:  76%/76573 
      🟩 arm64              Pass: 100%/2   | Total: 51m 37s | Avg: 25m 48s | Max: 27m 40s | Hits:  76%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 47m | Avg: 33m 24s | Max: 51m 47s | Hits:  72%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  1h 47m | Avg: 53m 41s | Max: 56m 00s | Hits:  62%/3562  
      🟩 12.8               Pass: 100%/38  | Total: 17h 20m | Avg: 27m 22s | Max: 55m 16s | Hits:  78%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 49m 37s | Avg: 24m 48s | Max: 26m 35s | Hits:  77%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 47m | Avg: 33m 24s | Max: 51m 47s | Hits:  72%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 47m | Avg: 53m 41s | Max: 56m 00s | Hits:  62%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 16h 30m | Avg: 27m 30s | Max: 55m 16s | Hits:  78%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 49m 37s | Avg: 24m 48s | Max: 26m 35s | Hits:  77%/3562  
      🟩 nvcc               Pass: 100%/43  | Total: 21h 04m | Avg: 29m 25s | Max: 56m 00s | Hits:  76%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 53m | Avg: 28m 16s | Max: 29m 52s | Hits:  76%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 55m 41s | Avg: 27m 50s | Max: 28m 10s | Hits:  76%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 59m 10s | Avg: 29m 35s | Max: 30m 18s | Hits:  76%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 55m 06s | Avg: 27m 33s | Max: 28m 19s | Hits:  76%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 25m | Avg: 20m 50s | Max: 27m 32s | Hits:  83%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 57m 19s | Avg: 28m 39s | Max: 28m 49s | Hits:  76%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 30m 03s | Avg: 30m 03s | Max: 30m 03s | Hits:  76%/1782  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 02s | Max: 31m 40s | Hits:  76%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 59m 12s | Avg: 29m 36s | Max: 31m 28s | Hits:  76%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 57m 37s | Avg: 28m 48s | Max: 29m 01s | Hits:  76%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 52s | Max: 30m 58s | Hits:  76%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 27m | Avg: 20m 44s | Max: 31m 40s | Hits:  84%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 40m | Avg: 50m 29s | Max: 51m 47s | Hits:  54%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 21m | Avg: 47m 16s | Max: 55m 16s | Hits:  59%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 47m | Avg: 53m 41s | Max: 56m 00s | Hits:  62%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 08m | Avg: 25m 13s | Max: 30m 18s | Hits:  79%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  8h 55m | Avg: 25m 29s | Max: 31m 40s | Hits:  80%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 02m | Avg: 48m 33s | Max: 55m 16s | Hits:  57%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 47m | Avg: 53m 41s | Max: 56m 00s | Hits:  62%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 30m 18s | Avg: 15m 09s | Max: 18m 58s | Hits:  88%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total: 17h 44m | Avg: 32m 15s | Max: 56m 00s | Hits:  73%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 39m | Avg: 21m 58s | Max: 55m 13s | Hits:  84%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 20h 21m | Avg: 32m 08s | Max: 56m 00s | Hits:  73%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 39s | Avg: 15m 33s | Max: 31m 19s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 46m 39s | Avg: 11m 39s | Max: 14m 43s | Hits:  96%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 30m 18s | Avg: 15m 09s | Max: 18m 58s | Hits:  88%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 31m 24s | Avg: 31m 24s | Max: 31m 24s | Hits:  76%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 11h 17m | Avg: 33m 53s | Max: 56m 00s | Hits:  72%/35611 
      🟩 20                 Pass: 100%/23  | Total: 10h 03m | Avg: 26m 13s | Max: 55m 13s | Hits:  79%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 05s | Avg: 7m 32s | Max: 12m 31s | Hits: 97%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 31s | Hits:  97%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 31s | Hits:  97%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 31s | Hits:  97%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 31s | Hits:  97%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 31s | Hits:  97%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 31s | Hits:  97%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 31s | Hits:  97%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 34s | Avg:  2m 34s | Max:  2m 34s | Hits:  96%/154   
      🟩 Test               Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 51m 05s | Avg: 51m 05s | Max: 51m 05s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

Projects
Status: In Review