Clean up AliasTemporaries #1815

Merged: 1 commit, Jun 6, 2024

bernhardmgruber
Contributor

This PR proposes a few fixes and improvements to CUB's AliasTemporaries, including:

  • Allow const arrays of allocation sizes to be passed to AliasTemporaries
  • Fix integer types used to represent pointers and masks

* Allow const size arrays in AliasTemporaries
* Fix integer types used
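
A minimal sketch of why the first change matters, assuming (from the PR description, not from the merged diff) that the sizes parameter went from a reference to a non-const array, size_t (&)[N], to a reference to a const array, const size_t (&)[N]. The helper total_aligned_bytes is hypothetical and not part of CUB; it reuses the 256-byte rounding shown in the hunk below and needs C++14 for the constexpr loop.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not CUB code): total bytes for a set of allocation
// requests, each rounded up to a multiple of 256 bytes. The parameter is a
// reference to a const array, so const and constexpr arrays can bind to it.
template <int ALLOCATIONS>
constexpr std::size_t total_aligned_bytes(const std::size_t (&allocation_sizes)[ALLOCATIONS])
{
  constexpr std::size_t ALIGN_BYTES = 256;
  constexpr std::size_t ALIGN_MASK  = ~(ALIGN_BYTES - 1);
  std::size_t total = 0;
  for (int i = 0; i < ALLOCATIONS; ++i)
  {
    total += (allocation_sizes[i] + ALIGN_BYTES - 1) & ALIGN_MASK;
  }
  return total;
}

// A const array lvalue binds to a reference-to-const-array parameter...
static_assert(std::is_convertible<const std::size_t (&)[2], const std::size_t (&)[2]>::value,
              "const array binds to const std::size_t (&)[2]");
// ...but not to a reference-to-non-const-array parameter, so a signature
// taking std::size_t (&)[N] rejects const size arrays at compile time.
static_assert(!std::is_convertible<const std::size_t (&)[2], std::size_t (&)[2]>::value,
              "const array cannot bind to std::size_t (&)[2]");
```

Non-const arrays still work with the const-reference signature, since a reference to const binds to non-const lvalues as well.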
bernhardmgruber added the cub label (For all items related to CUB) Jun 5, 2024
Comment on lines +77 to +85
constexpr size_t ALIGN_BYTES = 256;
constexpr size_t ALIGN_MASK = ~(ALIGN_BYTES - 1);

// Compute exclusive prefix sum over allocation requests
size_t allocation_offsets[ALLOCATIONS];
size_t bytes_needed = 0;
for (int i = 0; i < ALLOCATIONS; ++i)
{
  const size_t allocation_bytes = (allocation_sizes[i] + ALIGN_BYTES - 1) & ALIGN_MASK;
  allocation_offsets[i] = bytes_needed;
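
For context, a host-only sketch of how an exclusive prefix sum like the one in this hunk can drive a two-pass temporary-storage scheme: a null storage pointer only reports the required size, otherwise the base pointer is rounded up to 256 bytes and each allocation is carved out at its offset. The name alias_temporaries_sketch, the bool return in place of a CUDA error code, and the use of std::uintptr_t for the pointer arithmetic are assumptions for illustration, not CUB's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch (not CUB code). Returns false if the provided storage
// is too small; otherwise fills allocations[] with 256-byte-aligned pointers.
template <int ALLOCATIONS>
bool alias_temporaries_sketch(void* d_temp_storage,
                              std::size_t& temp_storage_bytes,
                              void* (&allocations)[ALLOCATIONS],
                              const std::size_t (&allocation_sizes)[ALLOCATIONS])
{
  constexpr std::size_t ALIGN_BYTES = 256;
  constexpr std::size_t ALIGN_MASK  = ~(ALIGN_BYTES - 1);

  // Exclusive prefix sum over the rounded-up allocation requests
  std::size_t allocation_offsets[ALLOCATIONS];
  std::size_t bytes_needed = 0;
  for (int i = 0; i < ALLOCATIONS; ++i)
  {
    const std::size_t allocation_bytes = (allocation_sizes[i] + ALIGN_BYTES - 1) & ALIGN_MASK;
    allocation_offsets[i] = bytes_needed;
    bytes_needed += allocation_bytes;
  }
  // Slack so the base pointer can be rounded up to the next 256-byte boundary
  bytes_needed += ALIGN_BYTES - 1;

  // Size query: with no storage, only report how many bytes are required
  if (d_temp_storage == nullptr)
  {
    temp_storage_bytes = bytes_needed;
    return true;
  }
  if (temp_storage_bytes < bytes_needed)
  {
    return false;
  }

  // Round the base address up using a pointer-sized unsigned integer
  const std::uintptr_t raw  = reinterpret_cast<std::uintptr_t>(d_temp_storage);
  const std::uintptr_t base = (raw + ALIGN_BYTES - 1) & static_cast<std::uintptr_t>(ALIGN_MASK);
  assert(base - raw < ALIGN_BYTES); // rounding moves the base by at most 255 bytes
  for (int i = 0; i < ALLOCATIONS; ++i)
  {
    allocations[i] = reinterpret_cast<void*>(base + allocation_offsets[i]);
  }
  return true;
}
```

Callers would invoke it once with a null pointer to learn the size, allocate that many bytes, and invoke it again with the buffer.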
Contributor Author

I changed the types from int to size_t because that is the type the values are converted to in the computations below anyway. Also, applying the bitwise complement (~) to a signed int scared me.
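
To illustrate the concern, a small compile-time check, assuming the pre-change constants were plain int (as the comment suggests; the old lines are not shown in the hunk) and a two's-complement target (guaranteed since C++20). The names ALIGN_BYTES_INT, ALIGN_MASK_INT, and round_up are hypothetical. It shows that the signed mask is negative and only yields the intended bit pattern through the implicit conversion to size_t, whereas the size_t mask states that pattern directly.

```cpp
#include <cstddef>

// Pre-change style (hypothetical names): the mask is built on a signed int.
constexpr int ALIGN_BYTES_INT = 256;
constexpr int ALIGN_MASK_INT  = ~(ALIGN_BYTES_INT - 1); // -256 on two's-complement targets

// Post-change style: the mask is built directly in size_t.
constexpr std::size_t ALIGN_BYTES = 256;
constexpr std::size_t ALIGN_MASK  = ~(ALIGN_BYTES - 1); // every bit set except the low 8

static_assert(ALIGN_MASK_INT == -256, "complementing 255 in a signed int yields a negative value");

// In an expression like size & ALIGN_MASK_INT, the usual arithmetic conversions
// turn the int into size_t (modulo 2^N), so the old mask happens to produce the
// same bit pattern as the size_t mask.
static_assert(static_cast<std::size_t>(ALIGN_MASK_INT) == ALIGN_MASK,
              "converted int mask matches the size_t mask");

// Round n up to the next multiple of ALIGN_BYTES.
constexpr std::size_t round_up(std::size_t n)
{
  return (n + ALIGN_BYTES - 1) & ALIGN_MASK;
}
```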

bernhardmgruber marked this pull request as ready for review June 5, 2024 19:45
bernhardmgruber requested review from a team as code owners June 5, 2024 19:45
github-actions bot commented Jun 6, 2024

🟩 CI finished in 8h 35m: Pass: 100%/249 | Total: 4d 16h | Avg: 27m 11s | Max: 55m 08s | Hits: 61%/248310
  • 🟩 cub: Pass: 100%/131 | Total: 2d 16h | Avg: 29m 29s | Max: 51m 04s | Hits: 53%/109044

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  2d 11h | Avg: 29m 13s | Max: 51m 04s | Hits:  54%/102236
      🟩 arm64              Pass: 100%/8   | Total:  4h 27m | Avg: 33m 23s | Max: 38m 57s | Hits:  38%/6808  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 54m | Avg: 27m 39s | Max: 51m 04s | Hits:  36%/11554 
      🟩 11.8               Pass: 100%/3   | Total:  2h 12m | Avg: 44m 18s | Max: 47m 19s | Hits:  38%/2553  
      🟩 12.4               Pass: 100%/113 | Total:  2d 07h | Avg: 29m 20s | Max: 49m 25s | Hits:  56%/94937 
    🟩 cudacxx_full
      🟩 clang-cuda17       Pass: 100%/2   | Total: 39m 51s | Avg: 19m 55s | Max: 20m 20s | Hits:  39%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 54m | Avg: 27m 39s | Max: 51m 04s | Hits:  36%/11554 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 12m | Avg: 44m 18s | Max: 47m 19s | Hits:  38%/2553  
      🟩 nvcc12.4           Pass: 100%/111 | Total:  2d 06h | Avg: 29m 30s | Max: 49m 25s | Hits:  56%/93529 
    🟩 cudacxx_name
      🟩 clang-cuda         Pass: 100%/2   | Total: 39m 51s | Avg: 19m 55s | Max: 20m 20s | Hits:  39%/1408  
      🟩 nvcc               Pass: 100%/129 | Total:  2d 15h | Avg: 29m 38s | Max: 51m 04s | Hits:  53%/107636
    🟩 cxx_full
      🟩 clang9             Pass: 100%/6   | Total:  2h 55m | Avg: 29m 12s | Max: 34m 52s | Hits:  37%/4884  
      🟩 clang10            Pass: 100%/3   | Total:  1h 41m | Avg: 33m 50s | Max: 36m 38s | Hits:  38%/2559  
      🟩 clang11            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 24s | Max: 34m 23s | Hits:  38%/3412  
      🟩 clang12            Pass: 100%/4   | Total:  2h 16m | Avg: 34m 12s | Max: 37m 47s | Hits:  38%/3412  
      🟩 clang13            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 44s | Max: 34m 28s | Hits:  38%/3412  
      🟩 clang14            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 32s | Max: 33m 02s | Hits:  38%/3412  
      🟩 clang15            Pass: 100%/4   | Total:  2h 12m | Avg: 33m 05s | Max: 35m 51s | Hits:  38%/3404  
      🟩 clang16            Pass: 100%/4   | Total:  2h 16m | Avg: 34m 01s | Max: 38m 36s | Hits:  38%/3404  
      🟩 clang17            Pass: 100%/26  | Total:  9h 59m | Avg: 23m 02s | Max: 37m 29s | Hits:  76%/21832 
      🟩 gcc6               Pass: 100%/2   | Total: 54m 57s | Avg: 27m 28s | Max: 29m 28s | Hits:  36%/1550  
      🟩 gcc7               Pass: 100%/6   | Total:  2h 54m | Avg: 29m 05s | Max: 33m 32s | Hits:  37%/4887  
      🟩 gcc8               Pass: 100%/6   | Total:  3h 00m | Avg: 30m 08s | Max: 36m 41s | Hits:  37%/4887  
      🟩 gcc9               Pass: 100%/6   | Total:  2h 59m | Avg: 29m 58s | Max: 36m 11s | Hits:  37%/4887  
      🟩 gcc10              Pass: 100%/4   | Total:  2h 21m | Avg: 35m 24s | Max: 37m 05s | Hits:  38%/3412  
      🟩 gcc11              Pass: 100%/7   | Total:  4h 33m | Avg: 39m 05s | Max: 47m 19s | Hits:  38%/5957  
      🟩 gcc12              Pass: 100%/4   | Total:  2h 19m | Avg: 34m 55s | Max: 35m 43s | Hits:  38%/3404  
      🟩 gcc13              Pass: 100%/28  | Total: 10h 48m | Avg: 23m 10s | Max: 38m 57s | Hits:  73%/23828 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 54m | Avg: 38m 11s | Max: 42m 24s | Hits:  36%/2331  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 04s | Avg: 51m 04s | Max: 51m 04s | Hits:  37%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 31m | Avg: 45m 49s | Max: 48m 17s | Hits:  37%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 19m | Avg: 46m 32s | Max: 49m 25s | Hits:  37%/2085  
    🟩 cxx_name
      🟩 clang              Pass: 100%/59  | Total:  1d 03h | Avg: 28m 20s | Max: 38m 36s | Hits:  55%/49731 
      🟩 gcc                Pass: 100%/63  | Total:  1d 05h | Avg: 28m 28s | Max: 47m 19s | Hits:  53%/52812 
      🟩 Intel              Pass: 100%/3   | Total:  1h 54m | Avg: 38m 11s | Max: 42m 24s | Hits:  36%/2331  
      🟩 MSVC               Pass: 100%/6   | Total:  4h 42m | Avg: 47m 03s | Max: 51m 04s | Hits:  37%/4170  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  2d 16h | Avg: 29m 29s | Max: 51m 04s | Hits:  53%/109044
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 06h | Avg: 33m 00s | Max: 51m 04s | Hits:  38%/81812 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 15m | Avg: 16m 54s | Max: 20m 01s | Hits:  99%/6808  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 03m | Avg: 15m 28s | Max: 18m 48s | Hits:  99%/6808  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 14m | Avg: 16m 45s | Max: 19m 37s | Hits:  99%/6808  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 21m | Avg: 25m 12s | Max: 27m 53s | Hits:  99%/6808  
    🟩 os
      🟩 ubuntu18.04        Pass: 100%/14  | Total:  6h 03m | Avg: 25m 59s | Max: 29m 28s | Hits:  36%/10859 
      🟩 ubuntu20.04        Pass: 100%/35  | Total: 19h 32m | Avg: 33m 29s | Max: 37m 47s | Hits:  38%/29855 
      🟩 ubuntu22.04        Pass: 100%/76  | Total:  1d 10h | Avg: 26m 53s | Max: 47m 19s | Hits:  64%/64160 
      🟩 windows2022        Pass: 100%/6   | Total:  4h 42m | Avg: 47m 03s | Max: 51m 04s | Hits:  37%/4170  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 12m | Avg: 44m 18s | Max: 47m 19s | Hits:  38%/2553  
      🟩 90a                Pass: 100%/4   | Total:  1h 14m | Avg: 18m 32s | Max: 19m 29s | Hits:  38%/3404  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 16h 01m | Avg: 28m 17s | Max: 40m 48s | Hits:  52%/28503 
      🟩 14                 Pass: 100%/37  | Total: 18h 54m | Avg: 30m 39s | Max: 51m 04s | Hits:  51%/30588 
      🟩 17                 Pass: 100%/36  | Total: 18h 06m | Avg: 30m 11s | Max: 47m 19s | Hits:  52%/29822 
      🟩 20                 Pass: 100%/24  | Total: 11h 19m | Avg: 28m 19s | Max: 45m 14s | Hits:  59%/20131 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 00h | Avg: 24m 38s | Max: 55m 08s | Hits: 68%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 21h | Avg: 24m 33s | Max: 55m 08s | Hits:  68%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 26m | Avg: 25m 48s | Max: 29m 37s | Hits:  63%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 10m | Avg: 24m 41s | Max: 48m 29s | Hits:  63%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 46m | Avg: 35m 27s | Max: 39m 04s | Hits:  63%/3543  
      🟩 12.4               Pass: 100%/100 | Total:  1d 16h | Avg: 24m 18s | Max: 55m 08s | Hits:  69%/118018
    🟩 cudacxx_full
      🟩 clang-cuda17       Pass: 100%/2   | Total: 47m 08s | Avg: 23m 34s | Max: 24m 20s | Hits:  62%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 10m | Avg: 24m 41s | Max: 48m 29s | Hits:  63%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 46m | Avg: 35m 27s | Max: 39m 04s | Hits:  63%/3543  
      🟩 nvcc12.4           Pass: 100%/98  | Total:  1d 15h | Avg: 24m 19s | Max: 55m 08s | Hits:  69%/115658
    🟩 cudacxx_name
      🟩 clang-cuda         Pass: 100%/2   | Total: 47m 08s | Avg: 23m 34s | Max: 24m 20s | Hits:  62%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 23h | Avg: 24m 39s | Max: 55m 08s | Hits:  68%/136906
    🟩 cxx_full
      🟩 clang9             Pass: 100%/6   | Total:  2h 23m | Avg: 23m 59s | Max: 26m 57s | Hits:  63%/7080  
      🟩 clang10            Pass: 100%/3   | Total:  1h 18m | Avg: 26m 08s | Max: 27m 24s | Hits:  63%/3540  
      🟩 clang11            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 58s | Max: 26m 33s | Hits:  63%/4720  
      🟩 clang12            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 12s | Max: 31m 05s | Hits:  63%/4720  
      🟩 clang13            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 46s | Max: 29m 36s | Hits:  63%/4720  
      🟩 clang14            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 59s | Max: 31m 26s | Hits:  63%/4720  
      🟩 clang15            Pass: 100%/4   | Total:  1h 39m | Avg: 24m 51s | Max: 26m 45s | Hits:  63%/4720  
      🟩 clang16            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 04s | Max: 27m 19s | Hits:  63%/4720  
      🟩 clang17            Pass: 100%/18  | Total:  5h 10m | Avg: 17m 15s | Max: 27m 05s | Hits:  79%/21240 
      🟩 gcc6               Pass: 100%/2   | Total: 43m 22s | Avg: 21m 41s | Max: 23m 37s | Hits:  63%/2360  
      🟩 gcc7               Pass: 100%/6   | Total:  2h 23m | Avg: 23m 59s | Max: 27m 13s | Hits:  63%/7086  
      🟩 gcc8               Pass: 100%/6   | Total:  2h 29m | Avg: 24m 54s | Max: 28m 49s | Hits:  63%/7086  
      🟩 gcc9               Pass: 100%/6   | Total:  2h 35m | Avg: 25m 56s | Max: 29m 36s | Hits:  63%/7086  
      🟩 gcc10              Pass: 100%/4   | Total:  1h 47m | Avg: 26m 52s | Max: 28m 28s | Hits:  63%/4724  
      🟩 gcc11              Pass: 100%/7   | Total:  3h 33m | Avg: 30m 32s | Max: 39m 04s | Hits:  63%/8267  
      🟩 gcc12              Pass: 100%/4   | Total:  1h 48m | Avg: 27m 01s | Max: 29m 23s | Hits:  63%/4724  
      🟩 gcc13              Pass: 100%/20  | Total:  6h 17m | Avg: 18m 51s | Max: 29m 56s | Hits:  74%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 36m | Avg: 32m 19s | Max: 34m 44s | Hits:  63%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 48m 29s | Avg: 48m 29s | Max: 48m 29s | Hits:  61%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 35m | Avg: 47m 52s | Max: 48m 15s | Hits:  61%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 30m | Avg: 35m 06s | Max: 55m 08s | Hits:  80%/7056  
    🟩 cxx_name
      🟩 clang              Pass: 100%/51  | Total: 19h 16m | Avg: 22m 40s | Max: 31m 26s | Hits:  69%/60180 
      🟩 gcc                Pass: 100%/55  | Total: 21h 38m | Avg: 23m 37s | Max: 39m 04s | Hits:  67%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 36m | Avg: 32m 19s | Max: 34m 44s | Hits:  63%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 54m | Avg: 39m 26s | Max: 55m 08s | Hits:  73%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 00h | Avg: 24m 38s | Max: 55m 08s | Hits:  68%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 20h | Avg: 27m 11s | Max: 55m 08s | Hits:  62%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 04m | Avg: 11m 19s | Max: 29m 13s | Hits:  96%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 30m | Avg: 11m 17s | Max: 12m 07s | Hits:  99%/9444  
    🟩 os
      🟩 ubuntu18.04        Pass: 100%/14  | Total:  5h 21m | Avg: 22m 59s | Max: 27m 14s | Hits:  63%/16529 
      🟩 ubuntu20.04        Pass: 100%/35  | Total: 15h 24m | Avg: 26m 24s | Max: 31m 26s | Hits:  63%/41313 
      🟩 ubuntu22.04        Pass: 100%/60  | Total: 21h 46m | Avg: 21m 46s | Max: 39m 04s | Hits:  71%/70840 
      🟩 windows2022        Pass: 100%/9   | Total:  5h 54m | Avg: 39m 26s | Max: 55m 08s | Hits:  73%/10584 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 46m | Avg: 35m 27s | Max: 39m 04s | Hits:  63%/3543  
      🟩 90a                Pass: 100%/4   | Total: 58m 57s | Avg: 14m 44s | Max: 16m 35s | Hits:  63%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 48m | Avg: 21m 36s | Max: 29m 56s | Hits:  67%/35418 
      🟩 14                 Pass: 100%/34  | Total: 14h 41m | Avg: 25m 55s | Max: 51m 20s | Hits:  67%/40122 
      🟩 17                 Pass: 100%/33  | Total: 14h 23m | Avg: 26m 10s | Max: 49m 48s | Hits:  68%/38946 
      🟩 20                 Pass: 100%/21  | Total:  8h 33m | Avg: 24m 27s | Max: 55m 08s | Hits:  71%/24780 
    

👃 Inspect Changes

Modifications in project?

     Project
     CCCL Infrastructure
     libcu++
+/-  CUB
     Thrust
     CUDA Experimental

Modifications in project or dependencies?

     Project
     CCCL Infrastructure
     libcu++
+/-  CUB
+/-  Thrust
     CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

bernhardmgruber enabled auto-merge (squash) June 6, 2024 09:17
bernhardmgruber merged commit d022a20 into NVIDIA:main Jun 6, 2024
554 checks passed
bernhardmgruber deleted the aliases branch June 6, 2024 15:26
Labels: cub (For all items related to CUB)
Projects: Archived in project
2 participants