Allow External Code to Use Cooperative Group #240

Merged
gmarkall merged 13 commits into NVIDIA:main from isVoid:fea-extern-cg on May 21, 2025

@isVoid isVoid commented May 5, 2025

Currently, Numba-CUDA detects the use of cudaCGGetIntrinsicHandle in the PTX and automatically launches the kernel with cuLaunchCooperativeKernel. This only works when the Numba kernel itself uses the internal typing extensions, which guarantee that this call is emitted. External device functions that invoke CG intrinsics have no way to tell Numba that the kernel must be launched cooperatively. This PR adds that support.

  • Add tests.
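
To illustrate the detection described above, here is a minimal pure-Python sketch. It is not Numba-CUDA's actual code: the function names and the `external_uses_cg` parameter are hypothetical, and the sketch only returns the name of the driver launch call rather than launching anything. It assumes the intrinsic appears in the PTX under the symbol `cudaCGGetIntrinsicHandle` and that the cooperative driver entry point is `cuLaunchCooperativeKernel`.

```python
# Hypothetical sketch; not Numba-CUDA's actual implementation.

# Symbol assumed to appear in PTX when cooperative groups are used.
CG_INTRINSIC = "cudaCGGetIntrinsicHandle"


def ptx_needs_cooperative_launch(ptx: str) -> bool:
    """Return True if the PTX text references the CG intrinsic."""
    return CG_INTRINSIC in ptx


def choose_launch_api(kernel_ptx: str, external_uses_cg: bool = False) -> str:
    """Pick the name of the driver launch call.

    `external_uses_cg` is a hypothetical opt-in for external device code
    whose CG usage is not visible in the kernel's own PTX.
    """
    if ptx_needs_cooperative_launch(kernel_ptx) or external_uses_cg:
        return "cuLaunchCooperativeKernel"
    return "cuLaunchKernel"


# The kernel PTX alone shows no CG usage, so only the explicit flag
# can trigger a cooperative launch for the external function.
plain_ptx = ".visible .entry kernel() { ret; }"
cg_ptx = "call.uni (retval0), cudaCGGetIntrinsicHandle, (param0);"
```

With these inputs, `choose_launch_api(plain_ptx)` selects the regular launch, while `choose_launch_api(cg_ptx)` and `choose_launch_api(plain_ptx, external_uses_cg=True)` both select the cooperative one. The second case is the gap this PR is meant to close.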

@gmarkall gmarkall added 3 - Ready for Review Ready for review by team 2 - In Progress Currently a work in progress and removed 3 - Ready for Review Ready for review by team labels May 6, 2025

isVoid commented May 8, 2025

Note to self: see #243 (comment). The CUDACodeLibrary.use_cooperative flag should be carried over from one codelib to another.
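
A rough sketch of what "carried over" could mean, using a hypothetical stand-in class rather than the real code library: when one library is linked into another, the cooperative flag is OR-ed into the receiver, so a kernel that links an external CG-using device function ends up flagged as well. The class name `CodeLibSketch`, its fields, and the linking method here are illustrative assumptions, not the API touched by this PR.

```python
from dataclasses import dataclass, field


@dataclass
class CodeLibSketch:
    """Hypothetical stand-in for a code library; not the real class."""

    name: str
    use_cooperative: bool = False
    linked: list = field(default_factory=list)

    def add_linking_library(self, other: "CodeLibSketch") -> None:
        """Link `other` into this library and inherit its cooperative flag."""
        self.linked.append(other)
        # Once any linked library needs a cooperative launch, so does this one.
        self.use_cooperative = self.use_cooperative or other.use_cooperative


# An external device function that uses CG, linked into a plain kernel.
device_lib = CodeLibSketch("external_device_fn", use_cooperative=True)
kernel_lib = CodeLibSketch("kernel")
kernel_lib.add_linking_library(device_lib)
```

After linking, `kernel_lib.use_cooperative` is `True` even though the kernel never set it itself. Without the propagation step the flag would be lost at the first link.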

@isVoid isVoid mentioned this pull request May 14, 2025

copy-pr-bot bot commented May 15, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


isVoid commented May 15, 2025

/ok to test


isVoid commented May 15, 2025

/ok to test

@isVoid isVoid marked this pull request as ready for review May 15, 2025 23:45

copy-pr-bot bot commented May 15, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@gmarkall gmarkall left a comment

The overall picture looks good - some minor comments on the diff.

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 2 - In Progress Currently a work in progress labels May 20, 2025

isVoid commented May 21, 2025

/ok to test

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels May 21, 2025
Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>

isVoid commented May 21, 2025

/ok to test

@gmarkall gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels May 21, 2025
@gmarkall gmarkall merged commit f1c7453 into NVIDIA:main May 21, 2025
37 checks passed
@isVoid isVoid mentioned this pull request May 21, 2025
isVoid added a commit that referenced this pull request May 21, 2025
- Allow External Code to Use Cooperative Group (#240)
- Improve debug info for kernel arguments (#242)
- Allow Numba NVRTC Binding Search Additional Paths (#254)
- Add Bfloat16 High Level API, Documentation (#245)
- add a test to use bf16 bindings inside device functions (#244)
- Change CI to only be manually triggered to save on CI runs (#252)
- Simplify the CI build and test matrix (#249)
