Use a constant for the amount of static SMEM #2374

Merged
bernhardmgruber merged 2 commits into NVIDIA:main from smem_constant
Sep 6, 2024

Conversation

bernhardmgruber
Contributor

This PR replaces all uses of the literal 48 * 1024 in CUB with the constant cub::detail::max_smem_per_block.
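
As a rough illustration of the kind of change described above, the sketch below contrasts a magic number with a named constant. It is not the CUB source: the `sketch` namespace and the `max_items_before`/`max_items_after` helpers are hypothetical, and the value 48 * 1024 is assumed only because that is the literal the PR says it replaces.

```cpp
#include <cstddef>

namespace sketch {
namespace detail {
// Assumed value: the same 48 * 1024 bytes that the replaced literal spelled out.
constexpr std::size_t max_smem_per_block = 48 * 1024;
} // namespace detail

// Before: the limit appears as a magic number at the use site.
template <typename T>
constexpr std::size_t max_items_before()
{
  return (48 * 1024) / sizeof(T);
}

// After: the same computation refers to the named constant.
template <typename T>
constexpr std::size_t max_items_after()
{
  return detail::max_smem_per_block / sizeof(T);
}

// The refactoring is purely cosmetic: both spellings give the same result.
static_assert(max_items_before<int>() == max_items_after<int>(),
              "named constant must not change the computed value");
} // namespace sketch
```

With a single named constant, a future change to the limit would only need to touch one definition instead of every use site.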

@bernhardmgruber bernhardmgruber added the cub For all items related to CUB label Sep 5, 2024
@bernhardmgruber bernhardmgruber marked this pull request as ready for review September 5, 2024 08:57
@bernhardmgruber bernhardmgruber requested review from a team as code owners September 5, 2024 08:57
@bernhardmgruber bernhardmgruber enabled auto-merge (squash) September 5, 2024 09:46
Contributor

github-actions bot commented Sep 5, 2024

🟨 CI finished in 2h 54m: Pass: 96%/251 | Total: 6d 05h | Avg: 35m 41s | Max: 1h 12m | Hits: 58%/24375
  • 🟨 cub: Pass: 93%/132 | Total: 3d 20h | Avg: 42m 11s | Max: 1h 12m | Hits: 3%/4296

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  92%/124 | Total:  3d 13h | Avg: 41m 27s | Max:  1h 12m | Hits:   3%/4296  
      🟩 arm64              Pass: 100%/8   | Total:  7h 08m | Avg: 53m 36s | Max: 57m 28s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total: 11h 29m | Avg: 45m 57s | Max: 52m 11s | Hits:   3%/716   
      🟩 11.8               Pass: 100%/3   | Total:  3h 27m | Avg:  1h 09m | Max:  1h 12m
      🔍 12.5               Pass:  92%/114 | Total:  3d 05h | Avg: 40m 59s | Max:  1h 05m | Hits:   3%/3580  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 51m 37s | Avg: 25m 48s | Max: 27m 25s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 29m | Avg: 45m 57s | Max: 52m 11s | Hits:   3%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 27m | Avg:  1h 09m | Max:  1h 12m
      🔍 nvcc12.5           Pass:  91%/112 | Total:  3d 05h | Avg: 41m 15s | Max:  1h 05m | Hits:   3%/3580  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 37s | Avg: 25m 48s | Max: 27m 25s
      🔍 nvcc               Pass:  93%/130 | Total:  3d 19h | Avg: 42m 26s | Max:  1h 12m | Hits:   3%/4296  
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 45m | Avg: 47m 30s | Max: 51m 25s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 31m | Avg: 50m 36s | Max: 53m 27s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 31s | Max: 56m 27s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 25m | Avg: 51m 23s | Max: 55m 05s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 19m | Avg: 49m 51s | Max: 52m 58s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 25m | Avg: 51m 23s | Max: 54m 26s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 23m | Avg: 50m 46s | Max: 55m 21s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 21m | Avg: 50m 23s | Max: 52m 15s
      🟨 Clang17            Pass:  84%/26  | Total: 12h 19m | Avg: 28m 27s | Max: 55m 35s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 27m | Avg: 43m 53s | Max: 45m 44s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 46m | Avg: 47m 42s | Max: 48m 54s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 50m | Avg: 48m 21s | Max: 53m 26s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 58m | Avg: 49m 41s | Max: 55m 08s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 25m | Avg: 51m 17s | Max: 53m 34s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 54m | Avg: 59m 09s | Max:  1h 12m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 31m | Avg: 52m 50s | Max: 56m 00s
      🟨 GCC13              Pass:  82%/29  | Total: 13h 45m | Avg: 28m 27s | Max: 57m 28s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 56m | Avg: 58m 51s | Max:  1h 00m
      🟩 MSVC14.16          Pass: 100%/1   | Total: 52m 11s | Avg: 52m 11s | Max: 52m 11s | Hits:   3%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:   3%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 12m | Avg:  1h 04m | Max:  1h 05m | Hits:   3%/2148  
    🟨 cxx_family
      🟨 Clang              Pass:  93%/59  | Total:  1d 16h | Avg: 40m 42s | Max: 56m 27s
      🟨 GCC                Pass:  92%/64  | Total:  1d 19h | Avg: 40m 54s | Max:  1h 12m
      🟩 Intel              Pass: 100%/3   | Total:  2h 56m | Avg: 58m 51s | Max:  1h 00m
      🟩 MSVC               Pass: 100%/6   | Total:  6h 12m | Avg:  1h 02m | Max:  1h 05m | Hits:   3%/4296  
    🟨 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 10h | Avg: 50m 13s | Max:  1h 12m | Hits:   3%/4296  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 16m | Avg: 17m 05s | Max: 20m 25s
      🟩 GraphCapture       Pass: 100%/8   | Total:  1h 59m | Avg: 14m 56s | Max: 17m 06s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 18m | Avg: 17m 18s | Max: 19m 32s
      🟥 SmallGMem          Pass:   0%/1   | Total: 34m 40s | Avg: 34m 40s | Max: 34m 40s
      🟥 TestGPU            Pass:   0%/8   | Total:  2h 47m | Avg: 20m 55s | Max: 27m 20s
    🟨 gpu
      🟨 v100               Pass:  93%/132 | Total:  3d 20h | Avg: 42m 11s | Max:  1h 12m | Hits:   3%/4296  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 27m | Avg:  1h 09m | Max:  1h 12m
      🟩 90a                Pass: 100%/4   | Total:  1h 29m | Avg: 22m 20s | Max: 24m 17s
    🟨 std
      🟨 11                 Pass:  94%/34  | Total: 23h 19m | Avg: 41m 09s | Max:  1h 03m
      🟨 14                 Pass:  94%/37  | Total:  1d 03h | Avg: 44m 11s | Max:  1h 12m | Hits:   3%/2148  
      🟨 17                 Pass:  91%/37  | Total:  1d 02h | Avg: 42m 45s | Max:  1h 10m | Hits:   3%/1432  
      🟨 20                 Pass:  91%/24  | Total: 15h 52m | Avg: 39m 40s | Max:  1h 05m | Hits:   3%/716   
    
  • 🟥 pycuda: Pass: 0%/1 | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟥 ctk
      🟥 12.5               Pass:   0%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟥 cudacxx
      🟥 nvcc12.5           Pass:   0%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 08h | Avg: 28m 38s | Max: 1h 07m | Hits: 70%/20079

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  2d 04h | Avg: 28m 37s | Max:  1h 07m | Hits:  70%/20079 
      🟩 arm64              Pass: 100%/8   | Total:  3h 51m | Avg: 28m 58s | Max: 33m 09s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 43m | Avg: 26m 55s | Max: 50m 15s | Hits:  56%/2231  
      🟩 11.8               Pass: 100%/3   | Total:  1h 56m | Avg: 38m 43s | Max: 44m 08s
      🟩 12.5               Pass: 100%/100 | Total:  1d 23h | Avg: 28m 35s | Max:  1h 07m | Hits:  72%/17848 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 54m 43s | Avg: 27m 21s | Max: 28m 05s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 43m | Avg: 26m 55s | Max: 50m 15s | Hits:  56%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 56m | Avg: 38m 43s | Max: 44m 08s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 22h | Avg: 28m 37s | Max:  1h 07m | Hits:  72%/17848 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 54m 43s | Avg: 27m 21s | Max: 28m 05s
      🟩 nvcc               Pass: 100%/116 | Total:  2d 07h | Avg: 28m 39s | Max:  1h 07m | Hits:  70%/20079 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 49m | Avg: 28m 16s | Max: 35m 27s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 39m | Avg: 33m 09s | Max: 36m 51s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 55m | Avg: 28m 46s | Max: 31m 06s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 57m | Avg: 29m 27s | Max: 30m 52s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 39s | Max: 34m 36s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 58m | Avg: 29m 30s | Max: 32m 07s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 57s | Max: 35m 53s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 25s | Max: 35m 12s
      🟩 Clang17            Pass: 100%/18  | Total:  6h 06m | Avg: 20m 21s | Max: 34m 26s
      🟩 GCC6               Pass: 100%/2   | Total: 47m 00s | Avg: 23m 30s | Max: 25m 57s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 48m | Avg: 28m 06s | Max: 35m 04s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 47m | Avg: 27m 59s | Max: 34m 50s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 50m | Avg: 28m 20s | Max: 35m 24s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 07m | Avg: 31m 53s | Max: 36m 59s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 04m | Avg: 34m 56s | Max: 44m 08s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 09m | Avg: 32m 15s | Max: 35m 44s
      🟩 GCC13              Pass: 100%/20  | Total:  7h 08m | Avg: 21m 25s | Max: 37m 58s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 57m | Avg: 39m 03s | Max: 44m 47s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 50m 15s | Avg: 50m 15s | Max: 50m 15s | Hits:  56%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 01m | Hits:  56%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 10m | Avg: 41m 43s | Max:  1h 07m | Hits:  78%/13386 
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 22h 38m | Avg: 26m 38s | Max: 36m 51s
      🟩 GCC                Pass: 100%/55  | Total:  1d 00h | Avg: 26m 58s | Max: 44m 08s
      🟩 Intel              Pass: 100%/3   | Total:  1h 57m | Avg: 39m 03s | Max: 44m 47s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 00m | Avg: 46m 44s | Max:  1h 07m | Hits:  70%/20079 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 08h | Avg: 28m 38s | Max:  1h 07m | Hits:  70%/20079 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 04h | Avg: 31m 40s | Max:  1h 07m | Hits:  56%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 28m | Avg: 13m 31s | Max: 37m 58s | Hits:  99%/6693  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 35m | Avg: 11m 54s | Max: 13m 32s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 56m | Avg: 38m 43s | Max: 44m 08s
      🟩 90a                Pass: 100%/4   | Total:  1h 12m | Avg: 18m 01s | Max: 21m 07s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 11h 22m | Avg: 22m 44s | Max: 32m 20s
      🟩 14                 Pass: 100%/34  | Total: 17h 10m | Avg: 30m 18s | Max:  1h 01m | Hits:  67%/8924  
      🟩 17                 Pass: 100%/33  | Total: 17h 51m | Avg: 32m 28s | Max:  1h 07m | Hits:  70%/6693  
      🟩 20                 Pass: 100%/21  | Total:  9h 55m | Avg: 28m 22s | Max:  1h 01m | Hits:  78%/4462  
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda
+/- CUDA C Core Library

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

 template <int NOMINAL_4B_BLOCK_THREADS, int NOMINAL_4B_ITEMS_PER_THREAD, typename T>
 struct RegBoundScaling
 {
   enum
   {
     ITEMS_PER_THREAD = CUB_MAX(1, NOMINAL_4B_ITEMS_PER_THREAD * 4 / CUB_MAX(4, sizeof(T))),
-    BLOCK_THREADS = CUB_MIN(NOMINAL_4B_BLOCK_THREADS, (((1024 * 48) / (sizeof(T) * ITEMS_PER_THREAD)) + 31) / 32 * 32),
+    BLOCK_THREADS = CUB_MIN(NOMINAL_4B_BLOCK_THREADS,
+                            ((cub::detail::max_smem_per_block / (sizeof(T) * ITEMS_PER_THREAD)) + 31) / 32 * 32),
Contributor

That is also an equation we might want to put into a function another day.

Contributor Author

It would already help if ceil_div were constexpr in C++11 ;)
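
To make the equation discussed in this thread concrete, here is a minimal sketch of how it could look as C++11 `constexpr` functions. This is not CUB code: the names `round_up`, `items_per_thread`, `block_threads`, `min_of`, and `max_of` are hypothetical, the element size is passed as a plain byte count instead of `sizeof(T)`, and the 48 * 1024 value is assumed from the literal being replaced. Each function is a single `return` statement so that it stays valid under C++11's `constexpr` rules.

```cpp
#include <cstddef>

namespace sketch {
// Assumed value of the constant introduced by the PR.
constexpr std::size_t max_smem_per_block = 48 * 1024;

constexpr std::size_t max_of(std::size_t a, std::size_t b) { return a > b ? a : b; }
constexpr std::size_t min_of(std::size_t a, std::size_t b) { return a < b ? a : b; }

// Round n up to the next multiple of m, i.e. (n + 31) / 32 * 32 for m == 32.
constexpr std::size_t round_up(std::size_t n, std::size_t m)
{
  return (n + m - 1) / m * m;
}

// Mirrors ITEMS_PER_THREAD: scale the nominal 4-byte item count by the element size.
constexpr std::size_t items_per_thread(std::size_t nominal_items, std::size_t size)
{
  return max_of(1, nominal_items * 4 / max_of(4, size));
}

// Mirrors BLOCK_THREADS: cap the nominal thread count by how many threads'
// worth of items fit into the shared memory budget, rounded up to a multiple of 32.
constexpr std::size_t block_threads(std::size_t nominal_threads, std::size_t nominal_items, std::size_t size)
{
  return min_of(nominal_threads,
                round_up(max_smem_per_block / (size * items_per_thread(nominal_items, size)), 32));
}

// 8-byte elements, 16 nominal items: 8 items per thread, 49152 / 64 = 768 threads fit, nominal 256 wins.
static_assert(block_threads(256, 16, 8) == 256, "nominal thread count is kept");
// 16-byte elements, 16 nominal items: 4 items per thread, 49152 / 64 = 768 threads, below nominal 1024.
static_assert(block_threads(1024, 16, 16) == 768, "shared memory budget caps the block size");
// 100-byte elements, 1 nominal item: 49152 / 100 = 491, rounded up to 512.
static_assert(block_threads(1024, 1, 100) == 512, "result is rounded up to a multiple of 32");
} // namespace sketch
```

The `static_assert` lines double as worked examples of the arithmetic for three element sizes.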

@@ -44,6 +44,7 @@
 #endif // no system header
 
 #include <cub/detail/uninitialized_copy.cuh>
+#include <cub/util_deprecated.cuh>
Contributor

Why is that additional include needed? Should it be this instead?

Suggested change
-#include <cub/util_deprecated.cuh>
+#include <cub/util_arch.cuh>

Contributor Author

It's a drive-by fix that I discovered by accident. The file uses CUB_DEPRECATED_BECAUSE, which is defined in <cub/util_deprecated.cuh>.

Contributor

github-actions bot commented Sep 6, 2024

🟩 CI finished in 8h 54m: Pass: 100%/251 | Total: 6d 12h | Avg: 37m 17s | Max: 1h 14m | Hits: 58%/24375
  • 🟩 cub: Pass: 100%/132 | Total: 4d 01h | Avg: 44m 18s | Max: 1h 14m | Hits: 3%/4296

    🟩 cpu
      🟩 amd64              Pass: 100%/124 | Total:  3d 18h | Avg: 43m 39s | Max:  1h 14m | Hits:   3%/4296  
      🟩 arm64              Pass: 100%/8   | Total:  7h 16m | Avg: 54m 31s | Max:  1h 00m
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 28m | Avg: 45m 55s | Max: 51m 53s | Hits:   3%/716   
      🟩 11.8               Pass: 100%/3   | Total:  3h 36m | Avg:  1h 12m | Max:  1h 14m
      🟩 12.5               Pass: 100%/114 | Total:  3d 10h | Avg: 43m 22s | Max:  1h 13m | Hits:   3%/3580  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 51m 32s | Avg: 25m 46s | Max: 27m 06s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 28m | Avg: 45m 55s | Max: 51m 53s | Hits:   3%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 36m | Avg:  1h 12m | Max:  1h 14m
      🟩 nvcc12.5           Pass: 100%/112 | Total:  3d 09h | Avg: 43m 41s | Max:  1h 13m | Hits:   3%/3580  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 32s | Avg: 25m 46s | Max: 27m 06s
      🟩 nvcc               Pass: 100%/130 | Total:  4d 00h | Avg: 44m 36s | Max:  1h 14m | Hits:   3%/4296  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  5h 02m | Avg: 50m 20s | Max: 55m 58s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 37m | Avg: 52m 24s | Max: 52m 54s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 21m | Avg: 50m 19s | Max: 52m 01s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 32m | Avg: 53m 11s | Max: 56m 48s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 24m | Avg: 51m 01s | Max: 54m 49s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 35m | Avg: 53m 54s | Max: 56m 10s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 33m | Avg: 53m 29s | Max: 56m 07s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 37m | Avg: 54m 15s | Max: 57m 38s
      🟩 Clang17            Pass: 100%/26  | Total: 13h 26m | Avg: 31m 00s | Max: 58m 48s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 30m | Avg: 45m 19s | Max: 46m 40s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 58m | Avg: 49m 45s | Max: 56m 24s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 53m | Avg: 48m 54s | Max: 54m 12s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 59m | Avg: 49m 52s | Max: 56m 50s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 43m | Avg: 55m 55s | Max: 58m 43s
      🟩 GCC11              Pass: 100%/7   | Total:  7h 10m | Avg:  1h 01m | Max:  1h 14m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 39m | Avg: 54m 55s | Max: 59m 14s
      🟩 GCC13              Pass: 100%/29  | Total: 14h 48m | Avg: 30m 38s | Max:  1h 00m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 56m | Avg: 58m 50s | Max: 59m 31s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 53s | Avg: 51m 53s | Max: 51m 53s | Hits:   3%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 09m | Hits:   3%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 33m | Avg:  1h 11m | Max:  1h 13m | Hits:   3%/2148  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 18h | Avg: 42m 53s | Max: 58m 48s
      🟩 GCC                Pass: 100%/64  | Total:  1d 21h | Avg: 42m 52s | Max:  1h 14m
      🟩 Intel              Pass: 100%/3   | Total:  2h 56m | Avg: 58m 50s | Max: 59m 31s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 38m | Avg:  1h 06m | Max:  1h 13m | Hits:   3%/4296  
    🟩 gpu
      🟩 v100               Pass: 100%/132 | Total:  4d 01h | Avg: 44m 18s | Max:  1h 14m | Hits:   3%/4296  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 13h | Avg: 52m 04s | Max:  1h 14m | Hits:   3%/4296  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 38m | Avg: 19m 49s | Max: 24m 54s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 08m | Avg: 16m 05s | Max: 22m 27s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 29m | Avg: 18m 39s | Max: 24m 52s
      🟩 SmallGMem          Pass: 100%/1   | Total: 40m 48s | Avg: 40m 48s | Max: 40m 48s
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 37m | Avg: 27m 12s | Max: 35m 28s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 36m | Avg:  1h 12m | Max:  1h 14m
      🟩 90a                Pass: 100%/4   | Total:  1h 35m | Avg: 23m 56s | Max: 25m 40s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  1d 00h | Avg: 44m 00s | Max:  1h 11m
      🟩 14                 Pass: 100%/37  | Total:  1d 04h | Avg: 45m 47s | Max:  1h 10m | Hits:   3%/2148  
      🟩 17                 Pass: 100%/37  | Total:  1d 04h | Avg: 45m 32s | Max:  1h 14m | Hits:   3%/1432  
      🟩 20                 Pass: 100%/24  | Total: 16h 13m | Avg: 40m 34s | Max:  1h 13m | Hits:   3%/716   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 10h | Avg: 29m 38s | Max: 1h 10m | Hits: 70%/20079

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  2d 06h | Avg: 29m 39s | Max:  1h 10m | Hits:  70%/20079 
      🟩 arm64              Pass: 100%/8   | Total:  3h 55m | Avg: 29m 25s | Max: 33m 15s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 17m | Avg: 29m 09s | Max: 57m 44s | Hits:  56%/2231  
      🟩 11.8               Pass: 100%/3   | Total:  1h 55m | Avg: 38m 25s | Max: 46m 25s
      🟩 12.5               Pass: 100%/100 | Total:  2d 01h | Avg: 29m 26s | Max:  1h 10m | Hits:  72%/17848 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 57m 19s | Avg: 28m 39s | Max: 29m 21s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 17m | Avg: 29m 09s | Max: 57m 44s | Hits:  56%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 55m | Avg: 38m 25s | Max: 46m 25s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  2d 00h | Avg: 29m 27s | Max:  1h 10m | Hits:  72%/17848 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 57m 19s | Avg: 28m 39s | Max: 29m 21s
      🟩 nvcc               Pass: 100%/116 | Total:  2d 09h | Avg: 29m 39s | Max:  1h 10m | Hits:  70%/20079 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 49m | Avg: 28m 19s | Max: 33m 54s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 35m | Avg: 31m 52s | Max: 36m 59s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 56m | Avg: 29m 08s | Max: 32m 14s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 31s | Max: 33m 58s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 55m | Avg: 28m 57s | Max: 32m 31s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 53s | Max: 35m 10s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 21s | Max: 37m 10s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 58s | Max: 35m 50s
      🟩 Clang17            Pass: 100%/18  | Total:  6h 42m | Avg: 22m 22s | Max: 36m 40s
      🟩 GCC6               Pass: 100%/2   | Total: 53m 53s | Avg: 26m 56s | Max: 29m 41s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 48m | Avg: 28m 05s | Max: 31m 47s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 56m | Avg: 29m 28s | Max: 34m 15s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 00m | Avg: 30m 02s | Max: 37m 41s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 13m | Avg: 33m 24s | Max: 37m 29s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 13m | Avg: 36m 15s | Max: 46m 25s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 16m | Avg: 34m 04s | Max: 38m 02s
      🟩 GCC13              Pass: 100%/20  | Total:  7h 04m | Avg: 21m 14s | Max: 38m 30s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 51m | Avg: 37m 03s | Max: 45m 08s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 57m 44s | Avg: 57m 44s | Max: 57m 44s | Hits:  56%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 04m | Hits:  56%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 33m | Avg: 45m 36s | Max:  1h 10m | Hits:  78%/13386 
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 23h 23m | Avg: 27m 31s | Max: 37m 10s
      🟩 GCC                Pass: 100%/55  | Total:  1d 01h | Avg: 27m 46s | Max: 46m 25s
      🟩 Intel              Pass: 100%/3   | Total:  1h 51m | Avg: 37m 03s | Max: 45m 08s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 34m | Avg: 50m 28s | Max:  1h 10m | Hits:  70%/20079 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 10h | Avg: 29m 38s | Max:  1h 10m | Hits:  70%/20079 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 05h | Avg: 32m 31s | Max:  1h 10m | Hits:  56%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 12m | Avg: 12m 00s | Max: 25m 32s | Hits:  99%/6693  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 25m | Avg: 18m 11s | Max: 36m 40s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 55m | Avg: 38m 25s | Max: 46m 25s
      🟩 90a                Pass: 100%/4   | Total:  1h 13m | Avg: 18m 26s | Max: 21m 48s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 11h 56m | Avg: 23m 53s | Max: 36m 40s
      🟩 14                 Pass: 100%/34  | Total: 17h 40m | Avg: 31m 10s | Max:  1h 04m | Hits:  67%/8924  
      🟩 17                 Pass: 100%/33  | Total: 17h 48m | Avg: 32m 23s | Max:  1h 04m | Hits:  70%/6693  
      🟩 20                 Pass: 100%/21  | Total: 10h 51m | Avg: 31m 00s | Max:  1h 10m | Hits:  78%/4462  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 14m 01s | Avg: 14m 01s | Max: 14m 01s
    

@bernhardmgruber bernhardmgruber merged commit 07fef97 into NVIDIA:main Sep 6, 2024
264 checks passed
@bernhardmgruber bernhardmgruber deleted the smem_constant branch September 8, 2024 21:39
Labels
cub For all items related to CUB
Projects
Archived in project
3 participants