Conversation

@VijayKandiah
Contributor

@VijayKandiah VijayKandiah commented Aug 5, 2025

This PR vendors in the _dispatcher, _devicearray, mviewbuf C extensions for CUDA-specific changes.

@VijayKandiah VijayKandiah requested a review from gmarkall August 5, 2025 16:51
@VijayKandiah VijayKandiah self-assigned this Aug 5, 2025
@copy-pr-bot

copy-pr-bot bot commented Aug 5, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@VijayKandiah
Contributor Author

Currently, pip install -e . does not seem to build the extension, so we have to use python setup.py build_ext --inplace to build the extension. This will probably fail CI testing until we either update CI or make pip install -e . automatically build the extension.

@VijayKandiah VijayKandiah added the 2 - In Progress Currently a work in progress label Aug 5, 2025
@gmarkall gmarkall added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Aug 6, 2025
@gmarkall
Contributor

gmarkall commented Aug 6, 2025

> Currently, pip install -e . does not seem to build the extension, so we have to use python setup.py build_ext --inplace to build the extension. This will probably fail CI testing until we either update CI or make pip install -e . automatically build the extension.

I just had a vague recollection about this - do you have setuptools > 78 installed? If so, does going back to setuptools 78 or earlier make pip install -e . build the extension?
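For reference, one quick way to check which setuptools version is in the environment (a minimal sketch; it only assumes setuptools is importable):

```python
# Print the installed setuptools version and whether it is newer than 78,
# the threshold mentioned in the comment above.
import setuptools

major = int(setuptools.__version__.split(".")[0])
print(setuptools.__version__, "(> 78)" if major > 78 else "(<= 78)")
```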

@VijayKandiah
Contributor Author

> Currently, pip install -e . does not seem to build the extension, so we have to use python setup.py build_ext --inplace to build the extension. This will probably fail CI testing until we either update CI or make pip install -e . automatically build the extension.

> I just had a vague recollection about this - do you have setuptools > 78 installed? If so, does going back to setuptools 78 or earlier make pip install -e . build the extension?

The issue was not a newer setuptools version. It was simply that the is_building method in setup.py was not checking for editable_wheel as one of the arguments that should trigger building the C extensions. Fixed in my recent commit.
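A minimal sketch of the kind of command check being described (the names here are illustrative, not the actual setup.py code): editable installs via `pip install -e .` invoke the `editable_wheel` command, so it has to be counted as a build command.

```python
import sys

# Illustrative set of setuptools/pip commands that should trigger building
# the C extensions. The fix discussed above amounts to including
# "editable_wheel" (used by `pip install -e .`) in this set.
BUILD_COMMANDS = {"build", "build_ext", "bdist_wheel", "install",
                  "develop", "editable_wheel"}

def is_building(argv=None):
    """Return True if any command-line argument is a build command."""
    argv = sys.argv if argv is None else argv
    return any(cmd in BUILD_COMMANDS for cmd in argv[1:])

print(is_building(["setup.py", "build_ext", "--inplace"]))  # True
print(is_building(["setup.py", "editable_wheel"]))          # True
print(is_building(["setup.py", "clean"]))                   # False
```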

@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah
Contributor Author

I had to include numpy as a build requirement for setuptools.build_meta in pyproject.toml. However, I am seeing build errors in our CI because numpy is not found.

ModuleNotFoundError: No module named 'numpy'
  error: subprocess-exited-with-error
  
  × Building wheel for numba-cuda (pyproject.toml) did not run successfully.
  │ exit code: 1

Is there a way to update CI to add this build requirement?
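For reference, a sketch of what the build-requirement change might look like in pyproject.toml (illustrative only; the actual file may pin versions differently):

```toml
[build-system]
# numpy is required at build time to compile the vendored C extensions.
requires = ["setuptools", "numpy"]
build-backend = "setuptools.build_meta"
```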

@VijayKandiah VijayKandiah added 4 - Waiting on author Waiting for author to respond to review 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review Ready for review by team 4 - Waiting on author Waiting for author to respond to review labels Aug 6, 2025
@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah VijayKandiah changed the title [WIP][Refactor] Vendor in _dispatcher C extension for CUDA-specific customization [WIP][Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization Aug 8, 2025
@VijayKandiah
Contributor Author

My latest commit also vendors in the _devicearray and mviewbuf C extensions. This should cover all the C extensions we use in numba-cuda.

I noticed a circular import issue when the _devicearray cext was placed at numba_cuda.numba.cuda; moving it to the top-level numba_cuda fixed this. For uniformity, I ended up placing all cexts at numba_cuda.cext. @gmarkall please let me know your thoughts on the updated folder structure here.

@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah VijayKandiah reopened this Sep 25, 2025
@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah
Contributor Author

This PR currently contains some import hacks in cext/__init__.py to help with our CI testing. The attempt to remove the import hacks is ongoing in #451. As long as CI passes here and everything else looks good to reviewers, I am in favor of getting this PR in and creating a follow-up PR to remove the import hacks.

@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah VijayKandiah changed the title [WIP][Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization Sep 26, 2025
@VijayKandiah VijayKandiah added 3 - Ready for Review Ready for review by team and removed 4 - Waiting on author Waiting for author to respond to review labels Sep 26, 2025
@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah
Contributor Author

/ok to test

Contributor

@gmarkall gmarkall left a comment


I'm going to merge this, but there are two required fixups to remove the cext hacks that need to be done in a subsequent PR.

  • Change the CI setup so we don't install the package then run tests from the repo. For a combination of reasons involving the redirector, pytest's package discovery and import logic, and the fact that it's weird to try and test an installed package when the source repo is the current directory, the cext hacks are required. Changing the setup so that tests run from the installed package without the source tree in the path should remove the need for the cext hacks.
  • Fix the capsule name for the device array, which is presently numba_cuda._devicearray, instead of numba.cuda.cext._devicearray. Simply changing the name doesn't work, because it then creates a circular import - a change to the import_devicearray() function is needed to avoid this.

A start of a fix for the circular import issue looks like:

diff --git a/numba_cuda/numba/cuda/cext/_devicearray.h b/numba_cuda/numba/cuda/cext/_devicearray.h
index e1672698..2e9df6cb 100644
--- a/numba_cuda/numba/cuda/cext/_devicearray.h
+++ b/numba_cuda/numba/cuda/cext/_devicearray.h
@@ -8,7 +8,7 @@
     extern "C" {
 #endif
 
-#define NUMBA_DEVICEARRAY_IMPORT_NAME "numba_cuda._devicearray"
+#define NUMBA_DEVICEARRAY_IMPORT_NAME "numba.cuda.cext._devicearray"
 /* These definitions should only be used by consumers of the Device Array API.
  * Consumers access the API through the opaque pointer stored in
  * _devicearray._DEVICEARRAY_API.  We don't want these definitions in
diff --git a/numba_cuda/numba/cuda/cext/_dispatcher.cpp b/numba_cuda/numba/cuda/cext/_dispatcher.cpp
index bfd3c651..653421f9 100644
--- a/numba_cuda/numba/cuda/cext/_dispatcher.cpp
+++ b/numba_cuda/numba/cuda/cext/_dispatcher.cpp
@@ -947,14 +947,22 @@ import_devicearray(void)
     if (devicearray == NULL) {
         return -1;
     }
-    Py_DECREF(devicearray);
 
-    DeviceArray_API = (void**)PyCapsule_Import(NUMBA_DEVICEARRAY_IMPORT_NAME "._DEVICEARRAY_API", 0);
-    if (DeviceArray_API == NULL) {
-        return -1;
+    PyObject *d = PyModule_GetDict(devicearray);
+    if (d == NULL) {
+      Py_DECREF(devicearray);
+      return -1;
     }
 
-    return 0;
+    PyObject *c_api = PyDict_GetItemString(d, "_DEVICEARRAY_API");
+    if (PyCapsule_IsValid(c_api, NUMBA_DEVICEARRAY_IMPORT_NAME "._DEVICEARRAY_API")) {
+      DeviceArray_API = (void**)PyCapsule_GetPointer(c_api, NUMBA_DEVICEARRAY_IMPORT_NAME "._DEVICEARRAY_API");
+      Py_DECREF(devicearray);
+      return 0;
+    } else {
+      Py_DECREF(devicearray);
+      return -1;
+    }
 }
 
 static PyMethodDef Dispatcher_methods[] = {

@gmarkall gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Oct 3, 2025
@gmarkall gmarkall merged commit d8ea0d8 into NVIDIA:main Oct 3, 2025
76 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Oct 3, 2025
This addresses one of the fixups required following the merge of NVIDIA#373.

The package name is `numba.cuda.cext`, not `numba_cuda`. However, fixing
this results in a circular import during `PyCapsule_Import` when running
something as simple as:

```
from numba import cuda
```

which gives:

```
AttributeError: cannot access submodule 'cuda' of module 'numba' (most likely due to a circular import)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/__init__.py", line 73, in <module>
    from .device_init import *
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/device_init.py", line 66, in <module>
    from .decorators import jit, declare_device
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/decorators.py", line 9, in <module>
    from numba.cuda.dispatcher import CUDADispatcher
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 50, in <module>
    from numba.cuda.cext import _dispatcher
ImportError: numba.cuda.cext._devicearray failed to import
```

This is because when `import_devicearray()` is called, we're partway
through importing `numba.cuda`. Therefore, the `PyCapsule_Import()`
fails because it tries to access packages under `numba.cuda` during its
initialization, which then fails due to this circularity. This was not a
problem in upstream Numba because `_devicearray` was not in the
`numba.cuda` package.

In order to work around this, we can get the `_DEVICEARRAY_API`
attribute of the `_devicearray` module directly from its module dict,
and then use `PyCapsule_GetPointer()` to set the `DeviceArray_API`
global.
gmarkall added a commit that referenced this pull request Oct 10, 2025
This PR vendors in `numba.core.typeconv` for CUDA-specific
customizations. The `_typeconv` C extension can be vendored in after PR
#373 gets merged in; it already
contains the necessary cpp and hpp files for `_typeconv`.

---------

Co-authored-by: Graham Markall <[email protected]>
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Nov 20, 2025
- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)
@gmarkall gmarkall mentioned this pull request Nov 20, 2025
gmarkall added a commit that referenced this pull request Nov 20, 2025

