@isVoid
Contributor

@isVoid isVoid commented Jan 13, 2026

Today, calling conventions are defined globally per compilation context. This makes it hard to switch flexibly between the Numba ABI and the C ABI when declaring external functions. It also explains the need for the kernel “fixup” logic: CUDA kernels are fundamentally C-ABI, but have historically been forced through the Numba ABI path.

This PR moves calling-convention selection to a more granular level, removing these limitations and eliminating the kernel fixup workaround. It also lays the groundwork for users to plug in additional calling-convention implementations in the future.

@copy-pr-bot

copy-pr-bot bot commented Jan 13, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps bot commented Jan 13, 2026

Greptile Overview

Greptile Summary

This PR refactors calling convention management by moving it from the global CUDAContext to individual FunctionDescriptor instances. This enables per-function ABI selection and eliminates the need for kernel fixup workarounds.

Key Changes:

  • Moved CUDACallConv and CUDACABICallConv classes from target.py to callconv.py
  • Added call_conv and abi_info fields to FunctionDescriptor and its subclasses
  • Moved declare_function method from BaseContext to FunctionDescriptor
  • Removed cabi_wrap_function - functions now directly use their specified calling convention
  • Updated all call sites from context.call_conv to fndesc.call_conv (or context.fndesc.call_conv)
  • Added abi parameter to declare_device function enabling C ABI external functions
  • Added comprehensive test coverage for C ABI device functions
  • Deprecated context.call_conv property with warning

Impact:

  • Functions can now independently specify Numba ABI or C ABI
  • External device functions with C ABI are properly supported without wrapper generation
  • Fixed test_device_function_with_debug which previously required expectedFailure decorator
  • 31 files modified with clean migration pattern across the codebase
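To make the "per-function ABI" idea concrete, here is a minimal pure-Python sketch. It is a toy model, not numba-cuda code: `ToyFunctionDescriptor`, `NumbaStyleCallConv`, `CStyleCallConv`, and the mangling scheme are all hypothetical names invented for illustration. The only point is that each descriptor owns its calling convention, so two functions in the same compilation can differ.

```python
from dataclasses import dataclass


class NumbaStyleCallConv:
    """Toy stand-in for a Numba-ABI convention (hypothetical)."""

    name = "numba"

    def mangle(self, qualname, argtypes):
        # Made-up mangling scheme, for illustration only.
        return "_ZN" + qualname.replace(".", "_") + "E" + "".join(argtypes)


class CStyleCallConv:
    """Toy stand-in for a C-ABI convention: the symbol name is used as-is."""

    name = "c"

    def mangle(self, qualname, argtypes):
        return qualname


@dataclass
class ToyFunctionDescriptor:
    qualname: str
    argtypes: tuple
    call_conv: object  # each descriptor carries its own convention

    @property
    def symbol(self):
        # The convention, not a global context, decides the symbol name.
        return self.call_conv.mangle(self.qualname, self.argtypes)


kernel = ToyFunctionDescriptor("saxpy", ("f", "f"), CStyleCallConv())
helper = ToyFunctionDescriptor("mod.helper", ("i",), NumbaStyleCallConv())
```

With a single context-wide convention, `kernel` and `helper` would have to share one mangling and one ABI; here they do not.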

Confidence Score: 5/5

  • This PR is safe to merge - well-structured refactoring with comprehensive test coverage and proper deprecation handling
  • The refactoring follows a clear, consistent pattern across all 31 files. The architecture change is well-motivated and properly tested with new C ABI device function tests. The removal of cabi_wrap_function is intentional and correct. All call sites have been systematically updated from context.call_conv to fndesc.call_conv. The deprecated context.call_conv property provides a clear migration path.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| numba_cuda/numba/cuda/core/funcdesc.py | Added call_conv and abi_info fields to FunctionDescriptor, moved declare_function method from context to descriptor |
| numba_cuda/numba/cuda/core/callconv.py | Moved CUDACallConv and CUDACABICallConv from target.py, added mangler method to BaseCallConv |
| numba_cuda/numba/cuda/target.py | Removed calling convention classes (moved to callconv.py), deprecated context.call_conv property |
| numba_cuda/numba/cuda/compiler.py | Removed cabi_wrap_function, updated compile_cuda and declare_device_function to handle ABI at function level |
| numba_cuda/numba/cuda/flags.py | Added call_conv and abi_info options to CUDAFlags |
| numba_cuda/numba/cuda/core/typed_passes.py | Creates CUDACallConv from flags or defaults, passes to function descriptor during lowering |
| numba_cuda/numba/cuda/core/base.py | Removed declare_function method, updated to use fndesc.call_conv and fndesc.declare_function |
| numba_cuda/numba/cuda/lowering.py | Changed call_conv property to return fndesc.call_conv, updated function declaration to use descriptor method |
| numba_cuda/numba/cuda/core/imputils.py | Updated user_function and user_generator to use fndesc.call_conv, added null check for status |
| numba_cuda/numba/cuda/tests/cudapy/test_device_func.py | Added comprehensive test suite TestDeclareDeviceCABI for C ABI external device functions |

Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 4 comments


@skip_under_cuda_memcheck("Hangs cuda-memcheck")
@skip_on_cudasim("disabled for cudasim")
@unittest.skip
Contributor

Test skipped without explanation - check if the refactoring broke this test

Contributor Author

Some of the multithreaded / multi-GPU tests are suspected of not being exercised properly in the current CI system, since all of its runners are single-GPU. They show up as failures in my local environment, but those failures are not related to this PR.

Comment on lines 1129 to 1133
    )
    resty = cres.signature.return_type

    if resty and not device and resty != types.void:
        raise TypeError("CUDA kernel must have void return type.")
Contributor

Removed cabi_wrap_function call for device functions with C ABI. Verify this was intentional and that device functions with abi="c" still work correctly without the wrapper.

Contributor Author

This is indeed intentional, because a function or kernel can now bear its own calling convention rather than using a target-wide one. cabi_wrap_function was meant to adapt a function with the Numba ABI to a caller that expects a C ABI function. The fix in this PR is simply to switch the kernel to bear a CUDACABICallConv, which should work as expected.
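As a rough illustration of the difference between wrapping and bearing a convention, here is a pure-Python sketch. It assumes, as in Numba's documented calling convention, that a Numba-ABI function returns a status code and delivers its result through an out-parameter, while a C-ABI function returns the value directly. All names and status values below are hypothetical; none of this is numba-cuda code.

```python
OK = 0  # toy status code, not Numba's real value


def add_numba_style(ret_slot, a, b):
    """Status-returning style: the result goes through an out-parameter."""
    ret_slot[0] = a + b
    return OK


def make_c_style_wrapper(fn):
    """Adapt a status-returning function to a 'just return the value' caller.

    This plays the role a wrapper generator would play: an extra layer
    whose only job is to translate between the two conventions.
    """

    def wrapper(*args):
        ret_slot = [None]
        fn(ret_slot, *args)  # the status is dropped: the caller cannot receive it
        return ret_slot[0]

    return wrapper


def add_c_style(a, b):
    """A function that bears the C-style convention itself: no wrapper needed."""
    return a + b


wrapped = make_c_style_wrapper(add_numba_style)
```

Both `wrapped(2, 3)` and `add_c_style(2, 3)` give the caller a plain return value, but only the first needs the extra adapter layer.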

@greptile-apps
Contributor

greptile-apps bot commented Jan 13, 2026

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
Bug: status can be None when using C ABI calling convention (see line 222 check and CUDACABICallConv.call_function which returns status = None), but this line tries to access status.is_none, which will raise AttributeError.

        if status is not None and builder.not_(status.is_none):
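
The guard above relies on short-circuit evaluation. A minimal pure-Python sketch of the same pattern follows; it uses plain booleans and a `SimpleNamespace` stand-in rather than real IR values or status objects, and the function names are hypothetical.

```python
from types import SimpleNamespace


def has_value(status):
    """Toy version of the guarded condition, using plain booleans."""
    # `and` short-circuits: `status.is_none` is only read when status exists.
    return status is not None and not status.is_none


def has_value_unguarded(status):
    """The failure mode: attribute access on None raises AttributeError."""
    return not status.is_none


numba_abi_status = SimpleNamespace(is_none=False)  # stand-in for a status object
c_abi_status = None  # per the comment above, the C ABI path yields no status
```

Calling `has_value_unguarded(c_abi_status)` raises `AttributeError`, while `has_value(c_abi_status)` returns `False`.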

Contributor

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

@greptile-apps
Contributor

greptile-apps bot commented Jan 13, 2026

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
status can be None when using C ABI (see line 222 check and CUDACABICallConv.call_function returns status = None), causing AttributeError when accessing status.is_none

        if status is not None and builder.not_(status.is_none):

@isVoid isVoid changed the title Move CallConv from CUDAContext to FunctionDescriptor Move CallConv from CUDAContext to FunctionDescriptor Jan 13, 2026
Contributor

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

@greptile-apps
Contributor

greptile-apps bot commented Jan 13, 2026

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
status can be None when using C ABI calling convention (see line 222 check and CUDACABICallConv.call_function which returns status = None in callconv.py line 417), causing AttributeError when accessing status.is_none

        if status is not None and builder.not_(status.is_none):

@gmarkall
Contributor

/ok to test

@gmarkall gmarkall added the 3 - Ready for Review Ready for review by team label Jan 15, 2026
@isVoid
Contributor Author

isVoid commented Jan 20, 2026

/ok to test 855b7de

Comment on lines 268 to 270
     @property
     def call_conv(self):
-        return CUDACallConv(self)
+        return self.fndesc.call_conv
Contributor

Hopefully we can delete this property now, and all uses of call_conv should come from fndesc directly?

Contributor Author

@isVoid isVoid Jan 22, 2026

Yes, this is moved in 684ad4f. Though in many cases it's still a pass through via context.fndesc.call_conv.

@isVoid
Contributor Author

isVoid commented Jan 22, 2026

@gmarkall I believe I addressed most of the review comments above. A few things still stand out to me:

  • The handling of error model is still arbitrated to CUDACallConv, not CUDACABICallConv.
  • A few other functions in BaseContext like call_internal and call_internal_no_propagate have similar pattern to declare_function. Should they also get moved to FunctionDescriptor?

@isVoid isVoid requested a review from gmarkall January 22, 2026 21:50
@isVoid
Contributor Author

isVoid commented Jan 22, 2026

/ok to test 864a40c

@gmarkall
Contributor

> The handling of error model is still arbitrated to CUDACallConv, not CUDACABICallConv.

Although this isn't as neat as I'd like, I think it's functionally OK because only the CUDACallConv can raise an exception, and the only difference between error models is whether or not floating point division by zero raises an exception.
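
For readers unfamiliar with the error-model distinction, a toy pure-Python sketch follows. It assumes the two models behave like Numba's "python" and "numpy" settings with respect to float division by zero: one raises, the other follows IEEE 754. The `divide` function is hypothetical and only mimics that difference.

```python
import math


def divide(a, b, error_model="python"):
    """Toy float division under two error-model settings."""
    if b == 0.0:
        if error_model == "python":
            # Python-style model: division by zero is an exception.
            raise ZeroDivisionError("division by zero")
        # NumPy-style model: no exception, IEEE 754 result instead.
        if a == 0.0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b
```

Since a C-ABI function in this design has no status channel for reporting an exception, only the raising branch needs a convention that can carry one.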

I think for Numba-CUDA I'd just get rid of the error model stuff entirely, but I think it's a bit too much upheaval right now (even for a separate PR).

> A few other functions in BaseContext like call_internal and call_internal_no_propagate have similar pattern to declare_function. Should they also get moved to FunctionDescriptor?

CCCL is using call_internal: https://github.com/NVIDIA/cccl/blob/90899d560b599e1faf01412e9bffd31a57d38c31/python/cuda_cccl/cuda/compute/_odr_helpers.py#L118

I think it'd be good to eventually move them, but I think I might leave it as a TODO item here.

Contributor

@gmarkall gmarkall left a comment

I think this looks good to go. The only issue is that external code (Awkward Array) turns out to use context.call_conv, so we might need to put that back for backward compatibility. Since all our tests pass, we know we have no internal uses of it left.

Perhaps when reinstating it, it could raise a DeprecationWarning if used?
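
One way such a backward-compatible property could look is sketched below in plain Python. `ToyContext` and `ToyFunctionDescriptor` are hypothetical stand-ins, not the actual numba-cuda classes, and the warning message is invented for illustration.

```python
import warnings


class ToyFunctionDescriptor:
    def __init__(self, call_conv):
        self.call_conv = call_conv


class ToyContext:
    def __init__(self, fndesc):
        self.fndesc = fndesc

    @property
    def call_conv(self):
        # Keep old call sites working, but point users at the new location.
        warnings.warn(
            "context.call_conv is deprecated; use fndesc.call_conv instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.fndesc.call_conv
```

External code reading `ctx.call_conv` keeps working and sees a `DeprecationWarning`, while internal code reads `fndesc.call_conv` directly and stays silent.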

Contributor

@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments

@isVoid
Contributor Author

isVoid commented Jan 26, 2026

/ok to test 15dcbe5

@isVoid isVoid requested a review from gmarkall January 26, 2026 23:25
@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Feb 3, 2026
@gmarkall
Contributor

gmarkall commented Feb 3, 2026

Note to self: I need to run this locally with the changes from #770 merged to see how they interact.

@greptile-apps
Contributor

greptile-apps bot commented Feb 3, 2026

Automatic reviews are disabled for this repository.

@isVoid
Contributor Author

isVoid commented Feb 3, 2026

/ok to test b46435a

Contributor

@gmarkall gmarkall left a comment

Looks good. Also, this works correctly in conjunction with #770.

@gmarkall gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Feb 3, 2026
@gmarkall gmarkall merged commit 0267ea1 into NVIDIA:main Feb 3, 2026
104 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Feb 5, 2026
- Add CUDA FP8 type + conversion bindings (E5M2/E4M3/E8M0), HW-accel detection, and comprehensive tests (NVIDIA#686)
- fix: fix boolean return type mismatch in C ABI wrapper (NVIDIA#770)
- Remove unused `rtapi.py`  (NVIDIA#773)
- feat: Add documentation for debugging Numba CUDA programs with CUDA GDB and VSCode (NVIDIA#665)
- Move `CallConv` from `CUDAContext` to `FunctionDescriptor` (NVIDIA#717)
- Generate line info for PHI exporters in terminator block (NVIDIA#756)
- Add `cuda-core` to `oldest` tests (NVIDIA#769)
- build(deps): bump actions/setup-python from 6.1.0 to 6.2.0 in the actions-monthly group across 1 directory (NVIDIA#768)
- Enable apt proxy caching; skip hosted Windows builds (NVIDIA#766)
- Disable automatic review trigger for Greptile (NVIDIA#743)
- test(refactor): clean up `run_in_subprocess` (NVIDIA#762)
- remove super args (NVIDIA#763)
@gmarkall gmarkall mentioned this pull request Feb 5, 2026
kkraus14 pushed a commit that referenced this pull request Feb 5, 2026
- Add CUDA FP8 type + conversion bindings (E5M2/E4M3/E8M0), HW-accel detection, and comprehensive tests (#686)
- fix: fix boolean return type mismatch in C ABI wrapper (#770)
- Remove unused `rtapi.py` (#773)
- feat: Add documentation for debugging Numba CUDA programs with CUDA GDB and VSCode (#665)
- Move `CallConv` from `CUDAContext` to `FunctionDescriptor` (#717)
- Generate line info for PHI exporters in terminator block (#756)
- Add `cuda-core` to `oldest` tests (#769)
- build(deps): bump actions/setup-python from 6.1.0 to 6.2.0 in the actions-monthly group across 1 directory (#768)
- Enable apt proxy caching; skip hosted Windows builds (#766)
- Disable automatic review trigger for Greptile (#743)
- test(refactor): clean up `run_in_subprocess` (#762)
- remove super args (#763)
