[CUDA] Implement urKernelSuggestMaxCooperativeGroupCountExp for Cuda #1796
keyradical left a comment:
Nice, LGTM.
@pbalcer Yeah, aware, thanks! The root group barrier is currently not supported correctly for cooperative-group kernels in the CUDA backend, so the corresponding intel/llvm PR will be [...]. It previously passed because the query returned a single group, so the test called a work-group level barrier rather than a device-wide (cross-work-group) one.
…ter backend from the sycl runtime

This change is required in order to implement per-device semantics for the urKernelSuggestMaxCooperativeGroupCountExp query.
After the last rebase, there is a SYCL :: Regression/device_num.cpp e2e failure that seems unrelated.
This commit implements the experimental urKernelSuggestMaxCooperativeGroupCountExp query for the CUDA adapter, which retrieves the maximum number of cooperative groups that can be launched on the device.

Additionally, the changes cache the result of the CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT CUDA driver query. That value is needed to calculate the device-wide maximum number of cooperative groups, because the CUDA occupancy query used has per-SM (multiprocessor) semantics.

Testing and related changes enabling querying this from SYCL: intel/llvm#14333
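
For readers skimming the description, the calculation can be sketched roughly as follows. This is not the code from the PR: `DeviceSketch` and `suggestMaxCooperativeGroupCount` are hypothetical names, and the SM-count query is injected as a callable so the sketch runs without a GPU. In a real adapter the SM count would presumably come from `cuDeviceGetAttribute` with `CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT`, and the per-SM figure from an occupancy query such as `cuOccupancyMaxActiveBlocksPerMultiprocessor`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-in for an adapter's device object. The SM-count query
// is injected so the sketch does not depend on the CUDA driver API.
class DeviceSketch {
public:
  explicit DeviceSketch(std::function<int()> querySmCount)
      : querySmCount_(std::move(querySmCount)) {}

  // Returns the multiprocessor (SM) count, running the query only on the
  // first call and serving the cached value afterwards.
  int multiprocessorCount() {
    if (!cachedSmCount_) {
      cachedSmCount_ = querySmCount_();
    }
    return *cachedSmCount_;
  }

private:
  std::function<int()> querySmCount_;
  std::optional<int> cachedSmCount_;
};

// Scales a per-SM occupancy result to a device-wide group count.
// maxActiveBlocksPerSm stands for the value a per-SM occupancy query
// would report for the kernel at a given work-group size.
inline std::uint32_t suggestMaxCooperativeGroupCount(DeviceSketch &device,
                                                     int maxActiveBlocksPerSm) {
  assert(maxActiveBlocksPerSm >= 0);
  const int smCount = device.multiprocessorCount();
  assert(smCount >= 0);
  return static_cast<std::uint32_t>(maxActiveBlocksPerSm) *
         static_cast<std::uint32_t>(smCount);
}
```

Caching the SM count on the device object keeps repeated suggestions from re-issuing the driver query, since the value does not change for a given device.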