[STF] Low level interface for the cuda_kernel(_chain) construct #5319
Merged
caugonnet merged 39 commits into NVIDIA:main on Aug 4, 2025
caugonnet (Author): /ok to test 316167f
🟩 CI finished in 32m 04s: Pass: 100%/32 | Total: 7h 03m | Avg: 13m 13s | Max: 29m 27s | Hits: 75%/15978
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | CCCL Packaging |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | CCCL Packaging |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 32)
| # | Runner |
|---|---|
| 17 | linux-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtx2080-latest-1 |
| 4 | linux-arm64-cpu16 |
| 4 | windows-amd64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1 |
andralex reviewed Jul 24, 2025
cudax/include/cuda/experimental/__stf/internal/cuda_kernel_scope.cuh (two comments, outdated and resolved)
andralex reviewed Jul 24, 2025
Comment on lines 124 to 125 (`dim3 gridDim;` and `dim3 blockDim;`):
andralex (Contributor): Do these have a default ctor, and if not, should we add a `= {}`?
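For context on the question above: CUDA's `dim3` declares a constructor whose three arguments all default to 1, so it is default-constructible and yields (1, 1, 1). The sketch below illustrates this with a stand-in type so it compiles without the CUDA toolkit. `dim3_like` and `kernel_config` are hypothetical names and are not part of the PR.

```cpp
// Stand-in for CUDA's dim3 (hypothetical name). The real type is declared in
// <vector_types.h> with the same defaulted-argument constructor shape.
struct dim3_like
{
  unsigned int x, y, z;
  constexpr dim3_like(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1)
      : x(vx)
      , y(vy)
      , z(vz)
  {}
};

// Hypothetical descriptor holding the two members discussed in the review.
struct kernel_config
{
  dim3_like gridDim;       // default-initialized through the constructor: (1, 1, 1)
  dim3_like blockDim = {}; // explicit "= {}" selects the same constructor: (1, 1, 1)
};

// Returns true when a dim3_like holds the default launch extent (1, 1, 1).
constexpr bool is_unit(const dim3_like& d)
{
  return d.x == 1 && d.y == 1 && d.z == 1;
}
```

With a constructor like this, both spellings end up with the same values, so adding `= {}` would be a readability choice rather than a correctness fix.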
Contributor: Made a pass, good stuff @caugonnet. Thanks @bernhardmgruber for taking a look!
caugonnet (Author): /ok to test c3a4c99
Contributor: @caugonnet, there was an error processing your request. See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
caugonnet (Author): /ok to test 9a9cde3
Contributor: @caugonnet, there was an error processing your request. See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
caugonnet (Author): /ok to test dec6b46
🟩 CI finished in 32m 30s: Pass: 100%/32 | Total: 7h 41m | Avg: 14m 24s | Max: 32m 25s | Hits: 72%/15390
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | CCCL Packaging |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | CCCL Packaging |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 32)
| # | Runner |
|---|---|
| 17 | linux-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtx2080-latest-1 |
| 4 | linux-arm64-cpu16 |
| 4 | windows-amd64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1 |
bernhardmgruber approved these changes Aug 4, 2025
davebayer pushed a commit to davebayer/cccl that referenced this pull request Sep 23, 2025
…IA#5319)
* Allow CUfunction (driver API) in the cuda_kernel(_chain) API
* clang-format
* We have a std::tuple not a cuda::std::tuple (yet)
* If CUDASTF_CUDA_KERNEL_DEBUG is set, we display the number of registers used by kernels
* Support CUkernel in addition to CUfunction
* Add a test with CUfunction and CUkernel
* Check whether CUkernel is supported
* use _CCCL_ASSERT instead of assert to avoid an unused variable error
* cudaGetKernel was added in CUDA 12.1
* clang-format
* Extract the start and end phase of the ->* operator
* There is no need to store untyped_t as we now store the task with its type
* Implement the low level interface for cuda_kernel(_chain) with a way to avoid using the ->* operator
* - Add a test to ensure we can put no arguments in the cuda_kernel_desc constructor
  - Implement a low level API to describe cuda_kernel_desc with an array of pointers rather than a variadic interface (and use it in a test)
* simpler code, and do not check for CUDA_VERSION >= 12
* Simpler code
  Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>
* Update cudax/include/cuda/experimental/__stf/internal/cuda_kernel_scope.cuh
  Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>
* clang-format
* Do not test for CUDA_VERSION >= 12
* Add missing const
* add missing template
* Use _CCCL_CTK_AT_LEAST
* replace std::visit by std::get_if in get_num_registers
* use ::std::get_if instead of ::std::visit
* Add some tests for get_num_registers which actually fails (thanks @davebayer for finding this)
* Fix the method to get the number of registers for CUkernel
* Update cudax/include/cuda/experimental/__stf/internal/cuda_kernel_scope.cuh
  Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>
---------
Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>
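Several of the commits above replace `std::visit` with `std::get_if` when querying a kernel handle that may be one of several types. A minimal sketch of that pattern follows. It assumes the handle is stored in a `std::variant`, and every type and function name here (`function_handle`, `kernel_handle`, `runtime_symbol`, `get_num_registers_sketch`) is a hypothetical stand-in rather than the PR's actual code. The real implementation would query the CUDA driver instead of reading a stored integer.

```cpp
#include <variant>

// Hypothetical stand-ins for the handle kinds a kernel description may hold.
struct runtime_symbol
{
  const void* ptr; // e.g. the address of a __global__ function
};
struct function_handle
{
  int regs; // placeholder for what a driver query would return
};
struct kernel_handle
{
  int regs; // placeholder for what a driver query would return
};

using kernel_variant = std::variant<runtime_symbol, function_handle, kernel_handle>;

// Test each alternative with std::get_if, which returns nullptr when the
// variant currently holds a different type, instead of dispatching through std::visit.
inline int get_num_registers_sketch(const kernel_variant& k)
{
  if (const auto* f = std::get_if<function_handle>(&k))
  {
    return f->regs;
  }
  if (const auto* kn = std::get_if<kernel_handle>(&k))
  {
    return kn->regs;
  }
  return -1; // this sketch does not handle the remaining alternative
}
```

One reason to prefer this form is that each alternative gets its own explicit branch, which keeps version-dependent handling (such as a handle type that only exists in newer toolkits) easy to guard.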
Description
Implement a low-level interface for the cuda_kernel(_chain) construct so that it can later be called from C or Python.
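To illustrate why an array-of-pointers description is easier to call from C or Python than a variadic C++ one, here is a minimal sketch. It follows the same convention as the CUDA driver's `cuLaunchKernel`, which takes kernel parameters as an array of pointers. The names `kernel_args_view` and `pack_arg_pointers` are hypothetical and do not come from the PR.

```cpp
#include <array>
#include <cstddef>

// Hypothetical C-friendly view of kernel arguments: only a pointer array and
// a count, so a C or Python binding can fill it in without templates.
struct kernel_args_view
{
  const void* const* args;
  std::size_t count;
};

// Variadic C++ convenience layer: collects the address of each argument into
// a fixed-size array. The caller must keep the arguments alive while the
// pointers are in use. An empty argument list yields an empty array.
template <typename... Args>
std::array<const void*, sizeof...(Args)> pack_arg_pointers(const Args&... args)
{
  return {static_cast<const void*>(&args)...};
}
```

A typical use would build the array with `pack_arg_pointers(n, a)` and then wrap `ptrs.data()` and `ptrs.size()` in a `kernel_args_view`. A variadic front end can then be a thin wrapper over the pointer-based one.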
closes
Checklist