Move `CallConv` from `CUDAContext` to `FunctionDescriptor` #717

isVoid · 2026-01-13T17:55:11Z

Today, calling conventions are defined globally per compilation context. This makes it hard to switch flexibly between the Numba ABI and the C ABI when declaring external functions. It also explains the need for the kernel “fixup” logic: CUDA kernels are fundamentally C-ABI, but have historically been forced through the Numba ABI path.

This PR moves calling-convention selection to a more granular level, removing these limitations and eliminating the kernel fixup workaround. It also lays the groundwork for users to plug in additional calling-convention implementations in the future.
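The idea can be sketched with a small toy model. This is not the numba-cuda implementation: `ToyNumbaABI`, `ToyCABI`, and `ToyFunctionDescriptor` are hypothetical stand-ins, and the signature strings only approximate the two ABIs (status code plus return pointer versus a plain C-style return). The point is that the calling convention is a property of each function rather than of the whole context.

```python
from dataclasses import dataclass


class ToyNumbaABI:
    """Hypothetical stand-in: status code returned, result written via pointer."""

    def signature(self, ret, args):
        return f"i32 ({ret}*, {', '.join(args)})"


class ToyCABI:
    """Hypothetical stand-in: plain C-style signature, result returned directly."""

    def signature(self, ret, args):
        return f"{ret} ({', '.join(args)})"


@dataclass
class ToyFunctionDescriptor:
    name: str
    ret: str
    args: tuple
    call_conv: object  # chosen per function, not per compilation context

    def declare(self):
        # The descriptor, not the context, decides how the function is declared.
        return f"declare {self.call_conv.signature(self.ret, self.args)} @{self.name}"


device_fn = ToyFunctionDescriptor("add", "float", ("float", "float"), ToyNumbaABI())
kernel = ToyFunctionDescriptor("kern", "void", ("float*",), ToyCABI())
print(device_fn.declare())
print(kernel.declare())
```

In this sketch, a kernel and a device function compiled side by side can use different conventions without any post-hoc fixup.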

copy-pr-bot · 2026-01-13T17:55:15Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


greptile-apps · 2026-01-13T17:59:25Z

Greptile Overview

Greptile Summary

This PR refactors calling convention management by moving it from the global CUDAContext to individual FunctionDescriptor instances. This enables per-function ABI selection and eliminates the need for kernel fixup workarounds.

Key Changes:

- Moved `CUDACallConv` and `CUDACABICallConv` classes from target.py to callconv.py
- Added `call_conv` and `abi_info` fields to `FunctionDescriptor` and its subclasses
- Moved `declare_function` method from `BaseContext` to `FunctionDescriptor`
- Removed `cabi_wrap_function`; functions now directly use their specified calling convention
- Updated all call sites from `context.call_conv` to `fndesc.call_conv` (or `context.fndesc.call_conv`)
- Added `abi` parameter to `declare_device`, enabling C ABI external functions
- Added comprehensive test coverage for C ABI device functions
- Deprecated the `context.call_conv` property with a warning

Impact:

- Functions can now independently specify the Numba ABI or the C ABI
- External device functions with the C ABI are properly supported without wrapper generation
- Fixed `test_device_function_with_debug`, which previously required an `expectedFailure` decorator
- 31 files modified with a clean migration pattern across the codebase

Confidence Score: 5/5

- This PR is safe to merge: a well-structured refactoring with comprehensive test coverage and proper deprecation handling
- The refactoring follows a clear, consistent pattern across all 31 files. The architecture change is well-motivated and properly tested with new C ABI device function tests. The removal of `cabi_wrap_function` is intentional and correct. All call sites have been systematically updated from `context.call_conv` to `fndesc.call_conv`. The deprecated `context.call_conv` property provides a clear migration path.
- No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| numba_cuda/numba/cuda/core/funcdesc.py | Added `call_conv` and `abi_info` fields to `FunctionDescriptor`, moved `declare_function` method from context to descriptor |
| numba_cuda/numba/cuda/core/callconv.py | Moved `CUDACallConv` and `CUDACABICallConv` from target.py, added `mangler` method to `BaseCallConv` |
| numba_cuda/numba/cuda/target.py | Removed calling convention classes (moved to callconv.py), deprecated `context.call_conv` property |
| numba_cuda/numba/cuda/compiler.py | Removed `cabi_wrap_function`, updated `compile_cuda` and `declare_device_function` to handle ABI at function level |
| numba_cuda/numba/cuda/flags.py | Added `call_conv` and `abi_info` options to `CUDAFlags` |
| numba_cuda/numba/cuda/core/typed_passes.py | Creates `CUDACallConv` from flags or defaults, passes to function descriptor during lowering |
| numba_cuda/numba/cuda/core/base.py | Removed `declare_function` method, updated to use `fndesc.call_conv` and `fndesc.declare_function` |
| numba_cuda/numba/cuda/lowering.py | Changed `call_conv` property to return `fndesc.call_conv`, updated function declaration to use descriptor method |
| numba_cuda/numba/cuda/core/imputils.py | Updated `user_function` and `user_generator` to use `fndesc.call_conv`, added null check for status |
| numba_cuda/numba/cuda/tests/cudapy/test_device_func.py | Added comprehensive test suite `TestDeclareDeviceCABI` for C ABI external device functions |

greptile-apps

4 files reviewed, 4 comments


numba_cuda/numba/cuda/tests/cudadrv/test_reset_device.py

greptile-apps · 2026-01-13T17:59:33Z

numba_cuda/numba/cuda/tests/cudapy/test_multithreads.py


 @skip_under_cuda_memcheck("Hangs cuda-memcheck")
 @skip_on_cudasim("disabled for cudasim")
+@unittest.skip


Test skipped without explanation - check if the refactoring broke this test

Some of the multithreaded / multi-GPU tests are suspected of not being exercised properly in the current CI system, since all of its runners are single-GPU. They show up as failures in my local environment, but the failures are not related to this PR.

greptile-apps · 2026-01-13T17:59:34Z

numba_cuda/numba/cuda/compiler.py

    )
    resty = cres.signature.return_type

    if resty and not device and resty != types.void:
        raise TypeError("CUDA kernel must have void return type.")


Removed cabi_wrap_function call for device functions with C ABI. Verify this was intentional and that device functions with abi="c" still work correctly without the wrapper.

This is indeed intentional: a function or kernel can now bear its own calling convention, rather than using a target-wide one. cabi_wrap_function was meant to adapt a function with the Numba ABI to a caller that expects a C ABI function. The fix in this PR simply switches the kernel to bear a CUDACABICallConv, and should work as expected.
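A toy illustration of the difference, in plain Python rather than LLVM IR. All names here (`add_numba_style`, `toy_cabi_wrap`, `add_c_style`) are hypothetical and only model the shape of the two approaches: wrapping a status-returning function so that it looks like a C function, versus emitting the function with a C-style signature in the first place.

```python
def add_numba_style(retptr, a, b):
    """Toy Numba-ABI-style function: result via an out-slot, status as return value."""
    retptr[0] = a + b
    return 0  # 0 stands in for "OK" in this sketch


def toy_cabi_wrap(fn):
    """Toy analogue of a wrapper that adapts a Numba-ABI-style function to C style."""

    def wrapper(*args):
        slot = [None]      # stand-in for a stack slot holding the return value
        fn(slot, *args)    # the status is ignored in this toy wrapper
        return slot[0]

    return wrapper


def add_c_style(a, b):
    """Toy C-ABI-style function: result returned directly, no status, no wrapper."""
    return a + b


wrapped = toy_cabi_wrap(add_numba_style)
print(wrapped(2, 3), add_c_style(2, 3))
```

Both paths produce the same result; the second simply has no adapter layer between caller and callee.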

greptile-apps · 2026-01-13T17:59:37Z

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
Bug: status can be None when using C ABI calling convention (see line 222 check and CUDACABICallConv.call_function which returns status = None), but this line tries to access status.is_none, which will raise AttributeError.

        if status is not None and builder.not_(status.is_none):
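The failure mode and the guard can be reproduced in isolation. The sketch below is not the code in imputils.py: `ToyStatus` is a hypothetical stand-in for the status object, and plain booleans replace the IR builder calls. It only shows why attribute access on a `None` status fails and how checking for `None` first avoids it.

```python
from dataclasses import dataclass


@dataclass
class ToyStatus:
    """Hypothetical stand-in for a call status object."""

    is_none: bool


def unguarded_has_value(status):
    # Fails when status is None: None has no attribute `is_none`.
    return not status.is_none


def guarded_has_value(status):
    # Mirrors the suggested guard: only inspect the status when one exists.
    return status is not None and not status.is_none


try:
    unguarded_has_value(None)
    unguarded_raised = False
except AttributeError:
    unguarded_raised = True

print(unguarded_raised, guarded_has_value(None))
```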

greptile-apps

1 file reviewed, 1 comment


greptile-apps · 2026-01-13T18:13:12Z

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
status can be None when using C ABI (see line 222 check and CUDACABICallConv.call_function returns status = None), causing AttributeError when accessing status.is_none

        if status is not None and builder.not_(status.is_none):

greptile-apps

1 file reviewed, 1 comment


greptile-apps · 2026-01-13T18:17:52Z

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
status can be None when using C ABI calling convention (see line 222 check and CUDACABICallConv.call_function which returns status = None in callconv.py line 417), causing AttributeError when accessing status.is_none

        if status is not None and builder.not_(status.is_none):

gmarkall · 2026-01-15T13:08:07Z

/ok to test


isVoid · 2026-01-20T16:53:14Z

/ok to test 855b7de

numba_cuda/numba/cuda/compiler.py

numba_cuda/numba/cuda/core/base.py

numba_cuda/numba/cuda/core/imputils.py

numba_cuda/numba/cuda/core/typed_passes.py

numba_cuda/numba/cuda/decorators.py

gmarkall · 2026-01-20T21:54:48Z

numba_cuda/numba/cuda/target.py

+    @property
    def call_conv(self):
-        return CUDACallConv(self)
+        return self.fndesc.call_conv


Hopefully we can delete this property now, and all uses of call_conv should come from fndesc directly?

Yes, this was moved in 684ad4f, though in many cases it's still a pass-through via context.fndesc.call_conv.

numba_cuda/numba/cuda/target.py

numba_cuda/numba/cuda/tests/cudapy/test_compiler.py

numba_cuda/numba/cuda/tests/cudapy/test_device_func.py

isVoid · 2026-01-22T21:48:13Z

@gmarkall I believe I addressed most of the review comments above. A few things still stand out to me:

- The handling of the error model is still arbitrated to CUDACallConv, not CUDACABICallConv.
- A few other functions in BaseContext, like call_internal and call_internal_no_propagate, follow a similar pattern to declare_function. Should they also be moved to FunctionDescriptor?

isVoid · 2026-01-22T21:56:32Z

/ok to test 864a40c

numba_cuda/numba/cuda/core/compiler.py

numba_cuda/numba/cuda/core/generators.py

gmarkall · 2026-01-26T13:27:29Z

> The handling of the error model is still arbitrated to CUDACallConv, not CUDACABICallConv.

Although this isn't as neat as I'd like, I think it's functionally OK because only the CUDACallConv can raise an exception, and the only difference between error models is whether or not floating point division by zero raises an exception.

I think for Numba-CUDA I'd just get rid of the error model stuff entirely, but I think it's a bit too much upheaval right now (even for a separate PR).
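As a rough illustration of that difference, here is a plain-Python model, not Numba code. The mode names `"python"` and `"numpy"` follow Numba's `error_model` option; the function `toy_fdiv` itself is hypothetical.

```python
import math


def toy_fdiv(a, b, error_model):
    """Toy float division whose zero-divisor behaviour depends on the error model."""
    if b == 0.0:
        if error_model == "python":
            # Python-style model: division by zero raises.
            raise ZeroDivisionError("division by zero")
        # NumPy-style model: IEEE 754 result, no exception.
        if a == 0.0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


print(toy_fdiv(1.0, 0.0, "numpy"))
try:
    toy_fdiv(1.0, 0.0, "python")
    python_model_raised = False
except ZeroDivisionError:
    python_model_raised = True
```

A calling convention with no status channel has no way to report the exception in the first case, which is why only the exception-capable convention needs to care about the error model.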

> A few other functions in BaseContext like call_internal and call_internal_no_propagate have similar pattern to declare_function. Should they also get moved to FunctionDescriptor?

CCCL is using call_internal: https://github.com/NVIDIA/cccl/blob/90899d560b599e1faf01412e9bffd31a57d38c31/python/cuda_cccl/cuda/compute/_odr_helpers.py#L118

I think it'd be good to eventually move them, but I think I might leave it as a TODO item here.

gmarkall

I think this looks good to go. The only issue is that external code (Awkward Array) turns out to use context.call_conv, so we might need to put that back for backward compatibility. Since all our tests pass, we know we have no internal uses of it left.

Perhaps when reinstating it, it could raise a DeprecationWarning if used?
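A minimal sketch of such a backward-compatible property, using the standard `warnings` module. `ToyContext` and `ToyFunctionDescriptor` are hypothetical stand-ins, not the numba-cuda classes, and the warning text is illustrative only. Note that the warning is emitted with `warnings.warn` rather than raised, so existing callers keep working.

```python
import warnings


class ToyFunctionDescriptor:
    """Hypothetical descriptor that owns the calling convention."""

    def __init__(self, call_conv):
        self.call_conv = call_conv


class ToyContext:
    """Hypothetical context that keeps `call_conv` only for compatibility."""

    def __init__(self, fndesc):
        self.fndesc = fndesc

    @property
    def call_conv(self):
        warnings.warn(
            "context.call_conv is deprecated; use fndesc.call_conv instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.fndesc.call_conv


ctx = ToyContext(ToyFunctionDescriptor(call_conv="numba-abi"))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    conv = ctx.call_conv

print(conv, [w.category.__name__ for w in caught])
```

External callers continue to get a working value, while the recorded warning tells them where to migrate.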

greptile-apps

No files reviewed, no comments


isVoid · 2026-01-26T23:25:19Z

/ok to test 15dcbe5

gmarkall · 2026-02-03T12:48:29Z

Note to self: I need to run this locally with the changes from #770 merged to see how they interact.


greptile-apps · 2026-02-03T19:37:28Z

Automatic reviews are disabled for this repository.

isVoid · 2026-02-03T19:37:49Z

/ok to test b46435a

gmarkall

Looks good. Also, this works correctly in conjunction with #770.

- Add CUDA FP8 type + conversion bindings (E5M2/E4M3/E8M0), HW-accel detection, and comprehensive tests (NVIDIA#686)
- fix: fix boolean return type mismatch in C ABI wrapper (NVIDIA#770)
- Remove unused `rtapi.py` (NVIDIA#773)
- feat: Add documentation for debugging Numba CUDA programs with CUDA GDB and VSCode (NVIDIA#665)
- Move `CallConv` from `CUDAContext` to `FunctionDescriptor` (NVIDIA#717)
- Generate line info for PHI exporters in terminator block (NVIDIA#756)
- Add `cuda-core` to `oldest` tests (NVIDIA#769)
- build(deps): bump actions/setup-python from 6.1.0 to 6.2.0 in the actions-monthly group across 1 directory (NVIDIA#768)
- Enable apt proxy caching; skip hosted Windows builds (NVIDIA#766)
- Disable automatic review trigger for Greptile (NVIDIA#743)
- test(refactor): clean up `run_in_subprocess` (NVIDIA#762)
- remove super args (NVIDIA#763)


isVoid added 4 commits December 18, 2025 09:59

checkpointing 121825

bcee996

checkpointing 010626

a988ca3

initial

1419998

Merge remote-tracking branch 'origin' into experimental-callconv

250886f

greptile-apps bot reviewed Jan 13, 2026


remove stale skip

e60bf0f

greptile-apps bot reviewed Jan 13, 2026


remove cabi_wrap_function

72f8f8e

isVoid changed the title ~~Move CallConv from CUDAContext to FunctionDescriptor~~ Move CallConv from CUDAContext to FunctionDescriptor Jan 13, 2026

greptile-apps bot reviewed Jan 13, 2026


gmarkall added the "3 - Ready for Review" label Jan 15, 2026

Merge branch 'main' of github.com:NVIDIA/numba-cuda into experimental…

855b7de

…-callconv