Move CallConv from CUDAContext to FunctionDescriptor
#717
Conversation
Greptile Summary

This PR refactors calling convention management by moving it from the global CUDAContext to the per-function FunctionDescriptor.

Confidence Score: 5/5
4 files reviewed, 4 comments
@skip_under_cuda_memcheck("Hangs cuda-memcheck")
@skip_on_cudasim("disabled for cudasim")
@unittest.skip
Test skipped without explanation - check if the refactoring broke this test
Some of the multithread / multi-GPU tests were suspected of not being exercised properly in the current CI system, since the runners are all single-GPU. They show up as failures in my local environment, but that is not related to this PR.
)
resty = cres.signature.return_type

if resty and not device and resty != types.void:
    raise TypeError("CUDA kernel must have void return type.")
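For context, a hedged usage-level illustration of what that check rejects; this assumes that giving `cuda.jit` an explicit signature compiles eagerly, so the error surfaces at decoration time:

```python
from numba import cuda

# A kernel (not a device function) declared with a non-void return type
# should be rejected with the TypeError shown in the snippet above.
@cuda.jit("int32(int32)")
def bad_kernel(x):
    return x + 1  # kernels cannot return values to the caller
```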
Removed cabi_wrap_function call for device functions with C ABI. Verify this was intentional and that device functions with abi="c" still work correctly without the wrapper.
This is indeed intentional, because functions / kernels can bear their own calling convention rather than using a target-wide one. cabi_wrap_function was meant to patch a function with the Numba ABI into a caller that expects a C-ABI function. The fix in this PR simply switches the kernel to bear a CUDACABICallConv, and it should work as expected.
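As a hedged illustration of the user-facing path under discussion: `numba.cuda.compile_ptx` accepts an `abi` argument in recent releases; whether and how its internals change with this PR is not shown here.

```python
from numba import cuda, float32

def add(x, y):
    return x + y

# Compile a device function with the C ABI. After this PR, the C calling
# convention should come from the function's own descriptor rather than from
# a cabi_wrap_function wrapper around a Numba-ABI function.
ptx, resty = cuda.compile_ptx(add, (float32, float32), device=True, abi="c")
```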
/ok to test

/ok to test 855b7de
numba_cuda/numba/cuda/target.py (Outdated)

 @property
 def call_conv(self):
-    return CUDACallConv(self)
+    return self.fndesc.call_conv
Hopefully we can delete this property now, and all uses of call_conv should come from fndesc directly?
Yes, this is moved in 684ad4f, though in many cases it is still a pass-through via context.fndesc.call_conv.
@gmarkall I believe I addressed most of the review comments above. A few things that still stand out to me:

/ok to test 864a40c
I think for Numba-CUDA I'd just get rid of the error model stuff entirely, but I think it's a bit too much upheaval right now (even for a separate PR).

CCCL is using them. I think it'd be good to eventually move them, but I might leave it as a TODO item here.
gmarkall left a comment
I think this looks good to go - the only issue is that external code (Awkward Array) turns out to use context.call_conv, so we might need to put that back for backward compatibility. Since all our tests pass, we know we have no internal uses of it left.
Perhaps when reinstating it, it could raise a DeprecationWarning if used?
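A minimal sketch of how such a backward-compatible shim could look (the class shape and attribute names are assumptions for illustration, not the actual implementation):

```python
import warnings


class CUDATargetContext:
    """Hypothetical shim; the real context carries much more state."""

    def __init__(self, fndesc):
        self.fndesc = fndesc

    @property
    def call_conv(self):
        # Kept only for external callers (e.g. Awkward Array); internal code
        # should read the calling convention from the function descriptor.
        warnings.warn(
            "context.call_conv is deprecated; use fndesc.call_conv instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.fndesc.call_conv
```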
/ok to test 15dcbe5

Note to self: I need to run this locally with the changes from #770 merged to see how they interact.
/ok to test b46435a
gmarkall left a comment
Looks good. Also, this works correctly in conjunction with #770.
- Add CUDA FP8 type + conversion bindings (E5M2/E4M3/E8M0), HW-accel detection, and comprehensive tests (#686)
- fix: fix boolean return type mismatch in C ABI wrapper (#770)
- Remove unused `rtapi.py` (#773)
- feat: Add documentation for debugging Numba CUDA programs with CUDA GDB and VSCode (#665)
- Move `CallConv` from `CUDAContext` to `FunctionDescriptor` (#717)
- Generate line info for PHI exporters in terminator block (#756)
- Add `cuda-core` to `oldest` tests (#769)
- build(deps): bump actions/setup-python from 6.1.0 to 6.2.0 in the actions-monthly group across 1 directory (#768)
- Enable apt proxy caching; skip hosted Windows builds (#766)
- Disable automatic review trigger for Greptile (#743)
- test(refactor): clean up `run_in_subprocess` (#762)
- remove super args (#763)
Today, calling conventions are defined globally per compilation context. This makes it hard to switch flexibly between the Numba ABI and the C ABI when declaring external functions. It also explains the need for the kernel “fixup” logic: CUDA kernels are fundamentally C-ABI, but have historically been forced through the Numba ABI path.
This PR moves calling-convention selection to a more granular level, removing these limitations and eliminating the kernel fixup workaround. It also lays the groundwork for users to plug in additional calling-convention implementations in the future.
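As a rough sketch of that direction (class and attribute names here are assumptions for illustration, not the PR's actual code):

```python
class CallConv:
    """Placeholder base interface for a calling convention."""


class NumbaCallConv(CallConv):
    """Numba-ABI style: packed arguments plus a status/return slot."""


class CABICallConv(CallConv):
    """C-ABI style: plain C argument passing and return."""


class FunctionDescriptor:
    """Each function carries its own calling convention instead of
    inheriting a single target-wide one from the compilation context."""

    def __init__(self, name, abi="numba"):
        self.name = name
        self.call_conv = CABICallConv() if abi == "c" else NumbaCallConv()


# A kernel is fundamentally C-ABI, while a Numba device function keeps the
# Numba ABI; both can now coexist within one compilation context.
kernel_desc = FunctionDescriptor("my_kernel", abi="c")
device_desc = FunctionDescriptor("my_device_fn", abi="numba")
```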