Allow External Code to Use Cooperative Group #240
Merged
gmarkall merged 13 commits into NVIDIA:main on May 21, 2025
Contributor
Author
|
Note to self: see #243 (comment), the |
Contributor
Author
|
/ok to test |
Contributor
Author
|
/ok to test |
gmarkall
requested changes
May 20, 2025
Contributor
gmarkall
left a comment
The overall picture looks good - some minor comments on the diff.
Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>
Contributor
Author
|
/ok to test |
gmarkall
reviewed
May 21, 2025
Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>
Contributor
Author
|
/ok to test |
gmarkall
approved these changes
May 21, 2025
isVoid added a commit that referenced this pull request May 21, 2025
- Allow External Code to Use Cooperative Group (#240)
- Improve debug info for kernel arguments (#242)
- Allow Numba NVRTC Binding Search Additional Paths (#254)
- Add Bfloat16 High Level API, Documentation (#245)
- add a test to use bf16 bindings inside device functions (#244)
- Change CI to only be manually triggered to save on CI runs (#252)
- Simplify the CI build and test matrix (#249)
Currently, Numba-CUDA detects the use of cudaCGGetIntrinsicHandle in the PTX and automatically launches the kernel with cuLaunchCooperativeKernel. This only works when the Numba kernel itself uses the internal typing extensions, which guarantee that this call is emitted. External device functions that invoke CG intrinsics have no way to tell Numba that the kernel needs a cooperative launch. This PR adds that support.
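To illustrate the gap, here is a minimal sketch in plain Python of a PTX-scanning check and why it can miss externally linked code. Everything in it is hypothetical: the function name `needs_cooperative_launch`, the `declared_cooperative` flag, the marker spelling, and the PTX snippets are assumptions made for illustration, not Numba-CUDA's actual implementation or API.

```python
# Hypothetical sketch: none of these names come from Numba-CUDA itself.

# Assumed spelling of the symbol the PTX scan looks for.
CG_MARKER = "cudaCGGetIntrinsicHandle"


def needs_cooperative_launch(kernel_ptx: str, declared_cooperative: bool = False) -> bool:
    """Decide whether a kernel should be launched cooperatively.

    kernel_ptx: PTX generated for the Numba kernel only (external code that
        is linked in later is not part of this text).
    declared_cooperative: an explicit opt-in, standing in for whatever
        mechanism lets external code request a cooperative launch.
    """
    return declared_cooperative or CG_MARKER in kernel_ptx


# Made-up PTX fragment for a kernel that calls the intrinsic directly.
internal_ptx = "call.uni (retval0), cudaCGGetIntrinsicHandle, (param0);"

# Made-up PTX fragment for a kernel that only calls an external device
# function; the intrinsic lives inside the separately linked code.
external_ptx = "call.uni (retval0), external_grid_sync, (param0);"

print(needs_cooperative_launch(internal_ptx))        # scan finds the marker
print(needs_cooperative_launch(external_ptx))        # scan misses it
print(needs_cooperative_launch(external_ptx, True))  # explicit opt-in
```

The scan alone cannot see into code that is linked after the kernel's PTX is generated, which is why some explicit signal from the external function's declaration is needed.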