Remove C extension loading hacks #506

gmarkall · 2025-10-06T09:13:38Z

Summary

The goal is to remove the C extension loading hacks and allow them to be discovered and loaded by the Python interpreter in the normal way. This should resolve issues where the extensions are not found in certain users' setups.

This is also a step in the right direction for testing, because it ensures that we're actually testing code loaded from the built package, and not a hybrid of code from the package and code accidentally discovered in the source repo. Running tests from a subfolder of the repo, testing, prevents the numba_cuda folder in the root of the repo from being found and used at test time in CI.
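As a rough illustration of the "hybrid" problem, the sketch below checks where the import system would load a module from. The helper names (resolved_locations, resolves_inside) are hypothetical and not part of this PR, and the stdlib json package stands in for numba.cuda so that the snippet runs without Numba-CUDA installed.

```python
import importlib.util
import json
import tempfile
from pathlib import Path


def resolved_locations(module_name: str) -> list[Path]:
    """Return the file or directories the import system would use for module_name."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return []
    if spec.has_location and spec.origin:
        return [Path(spec.origin).resolve()]
    # Namespace packages have no single origin, only search locations.
    return [Path(p).resolve() for p in (spec.submodule_search_locations or [])]


def resolves_inside(module_name: str, directory: Path) -> bool:
    """True if module_name would be loaded from somewhere under directory."""
    directory = Path(directory).resolve()
    return any(loc.is_relative_to(directory) for loc in resolved_locations(module_name))


# Stand-in usage: json resolves inside its own directory, but not inside
# an unrelated, freshly created temporary directory.
json_dir = Path(json.__file__).resolve().parent
found_in_own_dir = resolves_inside("json", json_dir)
with tempfile.TemporaryDirectory() as unrelated:
    found_in_unrelated = resolves_inside("json", Path(unrelated))
```

Under these assumptions, a check such as resolves_inside("numba.cuda", repo_root) returning True during a wheel or conda test run would suggest that the source tree, rather than the installed package, is being picked up.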

Changes

A number of distinct changes accomplish this:

Correct the name used in NUMBA_DEVICEARRAY_IMPORT_NAME (subsumes Correct NUMBA_CUDA_DEVICEARRAY_IMPORT_NAME #503).
Delete the import hacks from numba_cuda/numba/cuda/cext/__init__.py.
Move the test binary generation files out of the package and into a subfolder of the repo called testing.
Move the pytest configuration (pytest.ini, conftest.py) to the testing folder. conftest.py is moved verbatim; pytest.ini contains the options previously in pyproject.toml. pytest.ini also adds --pyargs numba.cuda.tests so that running pytest with no further options does the right thing.
Remove the creation of $RAPIDS_TEST_DIR, and the cd into it, from the CI scripts. Whilst not directly related to this PR, it served no purpose and was confusing now that the correct location to run tests from is the testing folder.
README updates to reflect how to set up and run tests following these changes.
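
For orientation, here is a sketch of what testing/pytest.ini might contain. It is reconstructed from the options removed from pyproject.toml plus the --pyargs addition described above; the exact contents of the file in this PR are an assumption. The text is parsed with configparser only so that the sketch can be checked mechanically.

```python
import configparser

# Hypothetical contents of testing/pytest.ini, not copied from the PR.
PYTEST_INI_SKETCH = """\
[pytest]
minversion = 8.0
consider_namespace_packages = true
# loadscope ensures the grouping required by CUDATestCase
addopts = --dist loadscope --pyargs numba.cuda.tests
"""

parser = configparser.ConfigParser()
parser.read_string(PYTEST_INI_SKETCH)

# Split addopts the way a shell would for this simple, unquoted case.
addopts = parser["pytest"]["addopts"].split()
pyargs_target = addopts[addopts.index("--pyargs") + 1]
```

With a file along these lines, running pytest with no arguments from the testing folder would collect numba.cuda.tests from the installed package rather than from a path in the source tree.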

Additional info on the change of `NUMBA_DEVICEARRAY_IMPORT_NAME` (repeated from #503):

The package name is numba.cuda.cext, not numba_cuda. However, fixing this results in a circular import during PyCapsule_Import when running something as simple as:

from numba import cuda

which gives:

AttributeError: cannot access submodule 'cuda' of module 'numba' (most likely due to a circular import)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/__init__.py", line 73, in <module>
    from .device_init import *
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/device_init.py", line 66, in <module>
    from .decorators import jit, declare_device
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/decorators.py", line 9, in <module>
    from numba.cuda.dispatcher import CUDADispatcher
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 50, in <module>
    from numba.cuda.cext import _dispatcher
ImportError: numba.cuda.cext._devicearray failed to import

This is because when import_devicearray() is called, we're partway through importing numba.cuda. Therefore, the PyCapsule_Import() fails because it tries to access packages under numba.cuda during its initialization, which then fails due to this circularity. This was not a problem in upstream Numba because _devicearray was not in the numba.cuda package.

In order to work around this, we can get the _DEVICEARRAY_API attribute of the _devicearray module directly from its module dict, and then use PyCapsule_GetPointer() to set the DeviceArray_API global.
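
The actual fix lives in C, but the failure mode and the shape of the workaround can be sketched in pure Python. The package below (demo_pkg, with a sub package, an _api module and a consumer module) is entirely hypothetical and is generated in a temporary directory. The attribute walk is only similar in spirit to how PyCapsule_Import resolves a dotted name, and the sys.modules lookup is an analogue of reading _DEVICEARRAY_API from the module dict, not the code in this PR.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of a module that is imported while demo_pkg.sub is still initializing.
CONSUMER_SOURCE = textwrap.dedent(
    """
    import sys
    import demo_pkg

    # Attribute walk from the top-level package. demo_pkg.sub is partway
    # through its import, so it is not yet bound as an attribute of demo_pkg.
    try:
        demo_pkg.sub
        attribute_walk_ok = True
    except AttributeError:
        attribute_walk_ok = False

    # Workaround analogue: the already-loaded module is in sys.modules, so
    # read the attribute straight from its module dict.
    api = sys.modules["demo_pkg.sub._api"].__dict__["API"]
    """
)


def run_demo() -> tuple[bool, bool]:
    """Build the throwaway package, import it, and report what happened."""
    root = Path(tempfile.mkdtemp())
    sub = root / "demo_pkg" / "sub"
    sub.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")
    (sub / "_api.py").write_text("API = object()\n")
    (sub / "consumer.py").write_text(CONSUMER_SOURCE)
    (sub / "__init__.py").write_text("from . import _api\nfrom . import consumer\n")

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        pkg_sub = importlib.import_module("demo_pkg.sub")
    finally:
        sys.path.remove(str(root))
    same_object = pkg_sub.consumer.api is pkg_sub._api.API
    return pkg_sub.consumer.attribute_walk_ok, same_object


attribute_walk_ok, direct_lookup_ok = run_demo()
```

The design point is the same as in the C change: the module that holds the API object has already finished loading, so it can be reached without walking attributes through a parent package that is still being initialized.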

This addresses one of the fixups required following the merge of NVIDIA#373.

copy-pr-bot · 2025-10-06T09:13:42Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

gmarkall · 2025-10-06T09:13:46Z

/ok to test

gmarkall · 2025-10-06T09:47:55Z

/ok to test

gmarkall · 2025-10-06T10:18:38Z

/ok to test

gmarkall · 2025-10-06T10:26:07Z

/ok to test

gmarkall · 2025-10-06T11:29:36Z

/ok to test

gmarkall · 2025-10-06T12:21:49Z

/ok to test

gmarkall · 2025-10-06T14:06:00Z

/ok to test

gmarkall · 2025-10-06T14:34:39Z

/ok to test

gmarkall · 2025-10-06T14:35:51Z

/ok to test

gmarkall · 2025-10-06T14:36:43Z

/ok to test

gmarkall · 2025-10-06T14:44:00Z

/ok to test

gmarkall · 2025-10-06T14:54:21Z

/ok to test

copy-pr-bot · 2025-10-07T09:14:06Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

gmarkall · 2025-10-07T09:48:17Z

/ok to test

kkraus14 · 2025-10-07T16:10:19Z

pyproject.toml

-[tool.pytest.ini_options]
-minversion = "8.0"
-testpaths = ["numba_cuda/numba/cuda/tests"]
-consider_namespace_packages = true
-# loadscope ensures the grouping required by CUDATestCase
-addopts = "--dist loadscope"


Why did we pull these out into a separate ini file?

I was trying to get everything related to testing into the testing subfolder, because running from the root of the repository causes the numba_cuda package in the source repo to be discovered when we're trying to test the installed wheel / conda package (which was the root of the problems requiring the C extension loading hacks). I was also concerned that relying on pyproject.toml in the root might lead to the accidental discovery of numba_cuda in the root of the repo.

rparolin

lgtm! Happy to see unit tests getting simpler to run via the terminal.

- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)

gmarkall added 4 commits October 3, 2025 18:10

Move test binary generation out of numba_cuda package

dc1a54f

Try not to run from source repo

fe38f0c

Remove cext loading hacks

a301bd0

Make sure we're in the test_binary_generation subdir before doing anything with Numba-CUDA

2ed8019

gmarkall added 2 commits October 6, 2025 11:11

Move conftest into package

90725c3

test_conda only: try to fixup test discovery

5b22dad

Correct typo

7985062

gmarkall added 2 commits October 6, 2025 12:23

Put conftest.py in test_binary_generation

88cc113

Add pytest ini

7b4fee8

gmarkall added 3 commits October 6, 2025 13:17

Fix docs build

bee23ad

Fix coverage test

9d18e76

Fix wheel ctypes binding tests

9fce105

Fix test suite trying to use test binaries with ctypes binding test

9293a46

gmarkall force-pushed the testing-fixups branch from 4233023 to 5a983ca Compare October 6, 2025 14:35

Try to tidy up and simplify changes

1f084ae

gmarkall force-pushed the testing-fixups branch from 5a983ca to 1f084ae Compare October 6, 2025 14:36

Move everything to pytest.ini

ea8b607

gmarkall added 2 commits October 6, 2025 15:51

Add notes to README on testing setup

cd6aed5

Fix testing folder locations

d46d584

gmarkall changed the title ~~[WIP] Try not to run tests from source dir~~ [WIP] Remove C extension loading hacks Oct 6, 2025

gmarkall marked this pull request as ready for review October 7, 2025 09:14

gmarkall changed the title ~~[WIP] Remove C extension loading hacks~~ Remove C extension loading hacks Oct 7, 2025

gmarkall added the 3 - Ready for Review Ready for review by team label Oct 7, 2025

gmarkall mentioned this pull request Oct 7, 2025

Correct NUMBA_CUDA_DEVICEARRAY_IMPORT_NAME #503

Closed

Merge branch 'main' into testing-fixups

9639860

kkraus14 reviewed Oct 7, 2025

rparolin approved these changes Oct 7, 2025

gmarkall merged commit dcaef7c into NVIDIA:main Oct 8, 2025
76 checks passed

gmarkall mentioned this pull request Nov 20, 2025

Bump version to 0.21.0 #602

Merged
