Conversation

@brandon-b-miller brandon-b-miller commented Oct 24, 2025

Part of #471

  • Adds a DeprecatedNDArrayAPIWarning emitted from all user-facing functions for moving data around (cuda.to_device, driver.host_to_device, device_to_host, as well as as_cuda_array, is_cuda_array, etc.)
  • Separates the existing, now-deprecated APIs into internal non-warning versions and external warning versions
  • Adds a deprecation warning to the DeviceNDArray ctor
  • Adds DeviceNDArray._create_nowarn
  • Removes as many usages of the deprecated APIs as possible from the test suite in favor of CuPy arrays
  • Catches warnings in tests of the currently exposed, now-deprecated APIs
  • Where absolutely necessary, tests call the internal non-warning versions of the deprecated APIs
  • Reworks tests to avoid these APIs as much as possible


copy-pr-bot bot commented Oct 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@functools.wraps(func)
def wrapper(*args, **kwargs):
    warnings.warn(
        f"{func.__name__} api is deprecated. Please prefer cupy for array functions",
Contributor

cupy arrays are much slower than DeviceNDArray because they require creating an external (i.e., non-numba-cuda-created) stream, so I'm not sure recommending them is what we should do right now.

I was thinking that we can keep the top-level APIs (device_array etc.) and replace their internals with StridedMemoryView or something similar, in an effort to allow folks to as-cheaply-as-possible construct arrays.
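For reference, the warning-emitting wrapper quoted at the top of this thread can be expanded into a runnable sketch. The warning class and decorator names follow the PR description; the to_device stand-in body is purely illustrative:

```python
import functools
import warnings


class DeprecatedDeviceArrayApiWarning(FutureWarning):
    """Warning category used by the deprecated device-array APIs."""


def deprecated_array_api(func):
    """Wrap a public API function so every call emits a deprecation warning."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} is deprecated. Please prefer CuPy for array functions",
            DeprecatedDeviceArrayApiWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)

    return wrapper


@deprecated_array_api
def to_device(ary):
    # Stand-in body; the real to_device copies a host array to the device.
    return ary
```

Using functools.wraps keeps the wrapped function's name and docstring, and stacklevel=2 points the warning at the caller rather than at the wrapper itself.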

Contributor

Here's the current state of the art:

(screenshot: benchmark results)

Contributor Author

I concur that a lightweight device-array-like container should exist; I'm just not sure that numba-cuda should necessarily be the library providing it publicly. I think we should nudge users away from using numba-cuda for such tasks, like moving data from host to device. That said, I'm open to suggestions on what we should recommend.

@gmarkall gmarkall added the 2 - In Progress Currently a work in progress label Oct 24, 2025
@rparolin rparolin added this to the next milestone Oct 24, 2025
@brandon-b-miller
Contributor Author

/ok to test

@brandon-b-miller brandon-b-miller marked this pull request as ready for review January 5, 2026 16:29

copy-pr-bot bot commented Jan 5, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.



greptile-apps bot commented Jan 5, 2026

Greptile Summary

This PR deprecates the DeviceNDArray class and all public APIs for host-side device array operations (to_device, device_array, as_cuda_array, etc.) in favor of CuPy, addressing issue #471.

Key Changes:

  • Introduces DeprecatedDeviceArrayApiWarning with @deprecated_array_api decorator for public APIs
  • Separates deprecated public APIs in api.py from internal non-warning implementations in _api.py
  • Adds DeviceNDArray._create_nowarn() factory method for internal use
  • Updates 17+ test files to use new DeprecatedDeviceArrayApiTest base class that suppresses warnings
  • Adds CuPy as test dependency and updates documentation with deprecation notices
  • Systematically replaces direct DeviceNDArray() constructor calls with _create_nowarn() throughout codebase
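The _create_nowarn() factory listed above can be illustrated with a minimal, self-contained sketch. The class below is a toy stand-in (the real DeviceNDArray takes shape, strides, dtype, and more), and the bypass-__init__ approach is an assumption about how such a factory might avoid the warning:

```python
import warnings


class DeprecatedDeviceArrayApiWarning(FutureWarning):
    pass


class DeviceNDArray:
    """Toy stand-in for the real class, which takes shape/strides/dtype."""

    def __init__(self, shape):
        warnings.warn(
            "DeviceNDArray is deprecated. Please prefer CuPy for array functions",
            DeprecatedDeviceArrayApiWarning,
            stacklevel=2,
        )
        self.shape = shape

    @classmethod
    def _create_nowarn(cls, shape):
        # Internal factory: construct the instance without going through
        # the warning-emitting __init__.
        obj = object.__new__(cls)
        obj.shape = shape
        return obj
```

Internal code paths call DeviceNDArray._create_nowarn(...) and never trigger the warning, while user code going through DeviceNDArray(...) still does.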

Issues Found:

  • reduction.py:262-264 introduces unnecessary complexity by converting already-sliceable device arrays through __cuda_array_interface__

Confidence Score: 4/5

  • Safe to merge with one logic issue in reduction.py that should be addressed
  • The deprecation infrastructure is well-designed with clear separation between public/internal APIs. Test coverage is comprehensive with proper warning suppression. However, the reduction.py change introduces unnecessary complexity that could impact performance and should be simplified before merging.
  • Pay close attention to numba_cuda/numba/cuda/kernels/reduction.py - the result handling logic was unnecessarily complicated

Important Files Changed

Filename Overview

  • numba_cuda/numba/cuda/cudadrv/devicearray.py - Adds deprecation infrastructure: DeprecatedDeviceArrayApiWarning, the deprecated_array_api decorator, the DeviceNDArray._create_nowarn() factory method; marks public methods (split, squeeze, view, get_ipc_handle) as deprecated
  • numba_cuda/numba/cuda/api.py - Wraps all public device array APIs (to_device, device_array, managed_array, pinned_array, mapped_array, as_cuda_array, is_cuda_array) with deprecation warnings, delegating to internal _api module implementations
  • numba_cuda/numba/cuda/_api.py - Introduces internal non-warning implementations (_from_cuda_array_interface, _as_cuda_array, _is_cuda_array, _to_device, _device_array, etc.) for use within the library without triggering deprecation warnings
  • numba_cuda/numba/cuda/testing.py - Adds the DeprecatedDeviceArrayApiTest base class, which automatically suppresses DeprecatedDeviceArrayApiWarning in setUp/tearDown for tests that need to use deprecated APIs
  • numba_cuda/numba/cuda/kernels/reduction.py - Replaces cuda.device_array() with _api._device_array() and modifies result handling to use _from_cuda_array_interface(); a complex change to slicing logic that may need verification
  • docs/source/user/memory.rst - Adds deprecation notes recommending CuPy for all device array operations, including memory transfers and pinned/mapped/managed memory

@greptile-apps greptile-apps bot left a comment

Additional Comments (15)

  1. numba_cuda/numba/cuda/kernels/transpose.py, line 24-25 (link)

    style: The deprecation message mentions 'transpose method' but this function is not a method - it's a standalone function. Consider rewording to 'transpose function and DeviceNDArray class are deprecated.'

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. numba_cuda/numba/cuda/tests/cudadrv/test_profiler.py, line 19 (link)

    style: Array size changed from 100 to 10. Was this intentional, or should it remain 100 to preserve the original test's different allocation sizes?

  3. numba_cuda/numba/cuda/api.py, line 69-70 (link)

    logic: Dead code: line 70 is unreachable after the return statement on line 69

  4. numba_cuda/numba/cuda/vectorizers.py, line 121 (link)

    logic: This line still uses the deprecated cuda.as_cuda_array instead of _api._as_cuda_array like line 186

  5. numba_cuda/numba/cuda/tests/cudapy/test_random.py, line 22 (link)

    style: CuPy is imported but never used. Either remove the unused import or complete the migration to use CuPy arrays instead of deprecated DeviceNDArray APIs. Is this import intended for future work, or should the migration to CuPy arrays be completed in this PR?

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  6. numba_cuda/numba/cuda/testing.py, line 197 (link)

    logic: Using warnings.resetwarnings() clears all warning filters, not just the ones added in setUp. This could affect other tests running in the same process. Should this use a more targeted approach to only reset the specific filter added in setUp?

  7. numba_cuda/numba/cuda/tests/doc_examples/test_reduction.py, line 73 (link)

    style: This line accesses a[0] directly on the GPU array, which works with CuPy but the assertion on line 77 uses a.get()[0]. Consider using a[0].get() here for consistency.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  8. numba_cuda/numba/cuda/_api.py, line 324-330 (link)

    logic: This function calls the public device_array instead of the internal _device_array, which will emit deprecation warnings when used internally

    Should this call _device_array to avoid deprecation warnings when used internally?

  9. numba_cuda/numba/cuda/_api.py, line 340-348 (link)

    logic: This function calls the public mapped_array instead of the internal _mapped_array, which will emit deprecation warnings when used internally

    Should this call _mapped_array to avoid deprecation warnings when used internally?

  10. numba_cuda/numba/cuda/_api.py, line 358-360 (link)

    logic: This function calls the public pinned_array instead of the internal _pinned_array, which will emit deprecation warnings when used internally

    Should this call _pinned_array to avoid deprecation warnings when used internally?

  11. numba_cuda/numba/cuda/tests/cudapy/test_multithreads.py, line 66 (link)

    style: inconsistent with migration goals - still uses deprecated cuda.to_device(). Are these methods intentionally testing the deprecated API, or should they also be migrated to CuPy?

  12. numba_cuda/numba/cuda/tests/cudapy/test_multithreads.py, line 75-76 (link)

    style: inconsistent with migration goals - still uses deprecated cuda.to_device(). Should these test methods also migrate to CuPy arrays for consistency?

  13. numba_cuda/numba/cuda/tests/doc_examples/test_globals.py, line 47 (link)

    logic: cp.asarray(5, dtype=np.float64) creates a scalar array with value 5, not a 5-element array. Should be cp.zeros(5, dtype=np.float64) to match original behavior.

  14. numba_cuda/numba/cuda/tests/cudapy/test_vectorize.py, line 89 (link)

    logic: Line 89 creates a CUDA stream that is never used - it's immediately overridden on line 94

  15. numba_cuda/numba/cuda/tests/cudapy/test_vectorize.py, line 94-96 (link)

    logic: Stream creation moved outside the loop but needs to be inside for proper isolation between test iterations. Should each iteration use a separate stream for proper test isolation?
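Regarding the cp.asarray(5, ...) pitfall in comment 13: CuPy mirrors NumPy's semantics here, so the difference can be demonstrated on the CPU with NumPy alone:

```python
import numpy as np

# asarray(5) wraps the scalar 5 in a 0-d array; it does NOT allocate
# five elements the way zeros(5) does. CuPy's cp.asarray/cp.zeros
# follow the same rule.
scalar = np.asarray(5, dtype=np.float64)
vector = np.zeros(5, dtype=np.float64)

assert scalar.shape == ()      # zero-dimensional, single value
assert vector.shape == (5,)    # five elements, all zero
```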

74 files reviewed, 15 comments


@brandon-b-miller
Copy link
Contributor Author

/ok to test

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. numba_cuda/numba/cuda/cudadrv/devicearray.py, line 773-779 (link)

    logic: type(self)(...) will trigger the deprecation warning during internal operations

  2. numba_cuda/numba/cuda/_api.py, line 205-208 (link)

    logic: Internal _pinned_array should not emit deprecation warnings. Remove this warning since this is the internal implementation

  3. numba_cuda/numba/cuda/_api.py, line 241-244 (link)

    logic: Internal _mapped_array should not emit deprecation warnings. Remove this warning since this is the internal implementation

74 files reviewed, 3 comments


@cpcloud cpcloud left a comment

Couple of minor changes requested, but overall LGTM.

xoroshiro128p_normal_float64,
)

import cupy as cp
Contributor

Why is this import needed here? It doesn't appear to be used and nothing else was changed.

Comment on lines 278 to 279
setattr(driver, "cuMemcpyHtoD", raising_transfer)
setattr(driver, "cuMemcpyDtoH", raising_transfer)
Contributor

Why are we using setattr here? We know the attribute name, so there's no obvious reason to set this attribute dynamically.

Suggested change
- setattr(driver, "cuMemcpyHtoD", raising_transfer)
- setattr(driver, "cuMemcpyDtoH", raising_transfer)
+ driver.cuMemcpyHtoD = driver.cuMemcpyDtoH = raising_transfer

else:
    del driver.cuMemcpyHtoD
if self.old_DtoH is not None:
    setattr(driver, "cuMemcpyDtoH", self.old_DtoH)
Contributor

Suggested change
- setattr(driver, "cuMemcpyDtoH", self.old_DtoH)
+ driver.cuMemcpyDtoH = self.old_DtoH

old_DtoH = getattr(driver, "cuMemcpyDtoH", None)

def tearDown(self):
    if self.old_HtoD is not None:
        setattr(driver, "cuMemcpyHtoD", self.old_HtoD)
Contributor

Suggested change
- setattr(driver, "cuMemcpyHtoD", self.old_HtoD)
+ driver.cuMemcpyHtoD = self.old_HtoD

@greptile-apps greptile-apps bot left a comment

Additional Comments (10)

  1. numba_cuda/numba/cuda/tests/cudapy/test_random.py, line 22 (link)

    style: CuPy import is unused in this file. Is this import intended for future use, or should it be removed until needed?

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. numba_cuda/numba/cuda/tests/doc_examples/test_laplace.py, line 67 (link)

    logic: This still calls copy_to_host() on a CuPy array, but CuPy arrays don't have this method - should use .get() instead

  3. numba_cuda/numba/cuda/_api.py, line 205-208 (link)

    logic: Internal function _pinned_array should not emit deprecation warnings - this defeats the purpose of having separate internal implementations

  4. numba_cuda/numba/cuda/_api.py, line 241-244 (link)

    logic: Internal function _mapped_array should not emit deprecation warnings - this defeats the purpose of having separate internal implementations

  5. numba_cuda/numba/cuda/tests/cudapy/test_gufunc.py, line 130 (link)

    logic: This cuda.device_array call should also be wrapped in the warning context since it's a deprecated API

  6. numba_cuda/numba/cuda/tests/cudapy/test_device_array_capture.py, line 23 (link)

    style: Function name and docstring are now misleading - creates CuPy array, not Numba device array

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  7. numba_cuda/numba/cuda/tests/cudapy/test_device_array_capture.py, line 28 (link)

    style: Docstring refers to 'Numba device array' but now wraps CuPy array

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  8. numba_cuda/numba/cuda/vectorizers.py, line 121 (link)

    style: Inconsistency: this uses the public cuda.as_cuda_array() while line 186 uses the internal _api._as_cuda_array() for the same operation

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  9. numba_cuda/numba/cuda/tests/cudapy/test_gufunc_scalar.py, line 126 (link)

    syntax: Missing import for pytest module which is required for pytest.warns()

  10. numba_cuda/numba/cuda/cudadrv/devicearray.py, line 773-779 (link)

    logic: Using the deprecated constructor will trigger warnings during internal __setitem__ operations. Should use type(self)._create_nowarn() instead

74 files reviewed, 10 comments


@brandon-b-miller
Contributor Author

/ok to test

@brandon-b-miller
Contributor Author

/ok to test

@greptile-apps greptile-apps bot left a comment

74 files reviewed, no comments



greptile-apps bot commented Jan 5, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@brandon-b-miller
Contributor Author

/ok to test

@greptile-apps greptile-apps bot left a comment

Additional Comments (5)

  1. numba_cuda/numba/cuda/tests/cudapy/test_transpose.py, line 9 (link)

    style: CuPy import is unused in this file

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. numba_cuda/numba/cuda/tests/cudadrv/test_cuda_devicerecord.py, line 116 (link)

    style: Should use super().setUp() for consistency with line 42

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  3. numba_cuda/numba/cuda/tests/cudapy/test_gufunc_scalar.py, line 58 (link)

    logic: inconsistent migration - dev_out2 still uses deprecated copy_to_host() method while dev_out1 was migrated to CuPy's .get(). Should be out2 = dev_out2.get()

  4. numba_cuda/numba/cuda/vectorizers.py, line 119 (link)

    style: Line uses public cuda._api._is_cuda_ndarray but should use internal _api._is_cuda_ndarray for consistency

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  5. numba_cuda/numba/cuda/vectorizers.py, line 184 (link)

    style: Still using deprecated public cuda.cudadrv.devicearray.is_cuda_ndarray - should use _api._is_cuda_ndarray

74 files reviewed, 5 comments


@greptile-apps greptile-apps bot left a comment

Additional Comments (2)

  1. numba_cuda/numba/cuda/api.py, line 148-154 (link)

    style: duplicates _api._device_array() - consider importing from _api instead to reduce code duplication

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. numba_cuda/numba/cuda/api.py, line 293-320 (link)

    style: duplicates _api._contiguous_strides_like_array() - consider importing from _api module

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

74 files reviewed, 2 comments


@brandon-b-miller
Contributor Author

/ok to test

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. numba_cuda/numba/cuda/kernels/reduction.py, line 262-264 (link)

    logic: Unnecessary conversion here - res is already a device array with slicing support. The original res[:1].copy_to_device(partials[:1], stream=stream) was simpler and more efficient.

74 files reviewed, 1 comment


@cpcloud cpcloud left a comment

LGTM.



class TestPinned(CUDATestCase):
    # TODO
Contributor

non-blocking: is there a specific todo here?

Labels

2 - In Progress Currently a work in progress


4 participants