
Conversation

@kaeun97 (Contributor) commented Nov 27, 2025

Fixes #483.

As mentioned here, the bug was due to:

  • empty arrays incorrectly identified as broadcast arrays
  • incorrect contiguity flags for empty arrays

When running this:

from numba import cuda, typeof
import numpy as np

# A (10, 0)-shaped array: zero elements, but still 2-D.
h_values = np.random.randint(low=0, high=2, size=(10, 0))
d_values = cuda.to_device(h_values)

print(typeof(h_values))
print(typeof(d_values))

# Host and device arrays should infer to the same Numba type.
assert typeof(h_values) == typeof(d_values)

Below is the output:

array(int64, 2d, C)
array(int64, 2d, C)
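
For reference, NumPy itself reports every 0-size array as both C and F contiguous regardless of shape — the host-side behaviour this fix mirrors on the device. A quick check (illustrative only):

import numpy as np

h = np.empty((10, 0), dtype=np.int64)
# NumPy flags any 0-size array as both C and F contiguous.
print(h.flags['C_CONTIGUOUS'], h.flags['F_CONTIGUOUS'])  # True True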

@copy-pr-bot (bot) commented Nov 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kaeun97 force-pushed the kaeun97/fix-empty-device-array branch from f0ea6c7 to 098fbe0 on November 27, 2025 01:10
@kaeun97 changed the title from "fix: 0-size arrays are contiguous" to "fix: Pass NULL pointer for cffi.FFI().from_buffer(empty array)" on Nov 27, 2025
@kaeun97 changed the title from "fix: Pass NULL pointer for cffi.FFI().from_buffer(empty array)" to "fix: empty array type mismatch between host and device" on Nov 27, 2025
@kaeun97 marked this pull request as ready for review on November 27, 2025 01:22
@greptile-apps (bot) commented Nov 27, 2025

Greptile Summary

Fixed type mismatch between host and device empty arrays by correctly identifying empty arrays as contiguous and preventing them from being misclassified as broadcast arrays.

  • Added an early check in dummyarray._compute_layout() to report all 0-size arrays as both C and F contiguous
  • Modified broadcast array detection in devicearray.__type_name__() to exclude empty arrays (size == 0) from being treated as broadcast (both conditions are sketched just below)
  • Added test coverage across 8 different empty array shapes, with type inference validation
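
Both fixes boil down to two small conditions; a minimal sketch, lifted from the changed files (the full context is in gmarkall's diff further down):

# dummyarray.Array._compute_layout(): all 0-size arrays are
# considered contiguous, even if they are multidimensional.
if self.size == 0:
    return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}

# devicearray: an empty array is never treated as broadcast,
# even though its strides may contain 0.
broadcast = 0 in self.strides and (self.size != 0)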

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are minimal, well-targeted, and follow NumPy's behavior for empty arrays. The fix addresses two specific edge cases with simple boolean conditions, includes comprehensive test coverage across multiple empty array shapes, and aligns with the referenced NumPy implementation.
  • No files require special attention

Important Files Changed

Filename | Score | Overview
numba_cuda/numba/cuda/cudadrv/devicearray.py | 5/5 | Fixed broadcast array detection to exclude empty arrays by adding a (self.size != 0) condition
numba_cuda/numba/cuda/cudadrv/dummyarray.py | 5/5 | Added an early return marking all 0-size arrays as contiguous before the broadcast check
numba_cuda/numba/cuda/tests/nocuda/test_dummyarray.py | 5/5 | Added test coverage for empty array contiguity flags and type inference

Sequence Diagram

sequenceDiagram
    participant User
    participant NumPy
    participant cuda.to_device
    participant DeviceNDArrayBase
    participant Array as dummyarray.Array
    participant typeof

    User->>NumPy: Create empty array (e.g., shape=(10, 0))
    NumPy-->>User: Returns array with size=0, strides may contain 0
    
    User->>cuda.to_device: Transfer array to device
    cuda.to_device->>DeviceNDArrayBase: Create device array
    
    DeviceNDArrayBase->>Array: _compute_layout() to set flags
    Note over Array: Check if self.size == 0
    Array-->>DeviceNDArrayBase: Return {C_CONTIGUOUS: True, F_CONTIGUOUS: True}
    
    User->>typeof: Get type of device array
    typeof->>DeviceNDArrayBase: Call __type_name__()
    Note over DeviceNDArrayBase: broadcast = 0 in strides AND size != 0
    Note over DeviceNDArrayBase: For empty arrays: broadcast = False
    DeviceNDArrayBase->>DeviceNDArrayBase: Check flags["C_CONTIGUOUS"]
    DeviceNDArrayBase-->>typeof: Return Array(dtype, ndim, 'C')
    
    User->>User: Compare host and device types
    Note over User: Types now match: array(dtype, ndim, C)

@greptile-apps (bot) left a comment

3 files reviewed, no comments

@gmarkall (Contributor) commented:

/ok to test e4e1b57

@gmarkall added the "3 - Ready for Review" (Ready for review by team) label on Nov 27, 2025
@gmarkall (Contributor) left a comment
Many thanks for the PR! I think the tests look great, and the fix moves things in the right direction without regressing any functionality.

Whilst examining your changes, I did become quite suspicious of the way we're handling contiguity computation (see comment on the diff). I began to believe that if we can compute contiguity correctly, then we shouldn't need to special-case the computation of _numba_type_ in devicearray.py on whether the array is broadcasted or not.

Furthermore, a zero stride shouldn't preclude contiguity either (based on the implementation and comments in the NumPy version of this functionality). That made me feel we ought to be able to apply:

diff --git a/numba_cuda/numba/cuda/cudadrv/devicearray.py b/numba_cuda/numba/cuda/cudadrv/devicearray.py
index 188b2f5a..cfd25b5a 100644
--- a/numba_cuda/numba/cuda/cudadrv/devicearray.py
+++ b/numba_cuda/numba/cuda/cudadrv/devicearray.py
@@ -178,11 +178,9 @@ class DeviceNDArrayBase(_devicearray.DeviceArray):
         # of which will be 0, will not match those hardcoded in for 'C' or 'F'
         # layouts.
 
-        broadcast = 0 in self.strides and (self.size != 0)
-
-        if self.flags["C_CONTIGUOUS"] and not broadcast:
+        if self.flags["C_CONTIGUOUS"]:
             layout = "C"
-        elif self.flags["F_CONTIGUOUS"] and not broadcast:
+        elif self.flags["F_CONTIGUOUS"]:
             layout = "F"
         else:
             layout = "A"
diff --git a/numba_cuda/numba/cuda/cudadrv/dummyarray.py b/numba_cuda/numba/cuda/cudadrv/dummyarray.py
index 16545954..8b0515c4 100644
--- a/numba_cuda/numba/cuda/cudadrv/dummyarray.py
+++ b/numba_cuda/numba/cuda/cudadrv/dummyarray.py
@@ -275,19 +275,11 @@ class Array(object):
         # 13661ac70).
         # https://github.com/numpy/numpy/blob/maintenance/1.19.x/numpy/core/src/multiarray/flagsobject.c#L123-L191
 
+        flags = {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
+
         # Records have no dims, and we can treat them as contiguous
         if not self.dims:
-            return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
-
-        # All 0-size arrays are considered contiguous, even if they are multidimensional
-        if self.size == 0:
-            return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
-
-        # If this is a broadcast array then it is not contiguous
-        if any([dim.stride == 0 for dim in self.dims]):
-            return {"C_CONTIGUOUS": False, "F_CONTIGUOUS": False}
-
-        flags = {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
+            return flags
 
         # Check C contiguity
         sd = self.itemsize

to allow broadcasted arrays to also be considered contiguous. If we didn't have the check for zero stride then we wouldn't be excluding zero-size arrays from being considered contiguous.
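
As a minimal illustration of the NumPy behaviour referred to here (assuming a 64-bit platform; not part of the proposed diff): a zero stride in a length-1 dimension does not stop NumPy from reporting contiguity.

import numpy as np

# Broadcasting a 1-D array to (1, 3) produces a zero stride in dim 0.
a = np.broadcast_to(np.arange(3), (1, 3))
print(a.strides)                # (0, 8): zero stride in the broadcast dim
print(a.flags['C_CONTIGUOUS'])  # True: length-1 dims may have any stride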

However, this does lead to one failure in test_devicearray_broadcast_host_copy() because the array elements seem to get transposed during the copy over to the device.

I don't want to blow up the scope of this PR and stall a perfectly good fix to jump down a rabbit hole though, so I'd like to instead merge this and potentially follow up on other items later.

return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}

# If this is a broadcast array then it is not contiguous
if any([dim.stride == 0 for dim in self.dims]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted that the NumPy implementation that this is following (from the _UpdateContiguousFlags implementation referenced above) doesn't have this check for zero strides.

It's not directly related to this PR, but I think it is suspicious that we still differ in our implementation.
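
For reference, that NumPy logic amounts to roughly the following — a minimal Python paraphrase of _UpdateContiguousFlags (a sketch; the function and variable names here are mine, not NumPy's):

def compute_contiguity(shape, strides, itemsize):
    # A 0-size array is contiguous by definition.
    if 0 in shape:
        return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}

    # C order: walk dims from innermost to outermost, accumulating the
    # expected stride. Length-1 dims are skipped, so a zero stride there
    # never breaks contiguity — there is no zero-stride special case.
    c_contig = True
    sd = itemsize
    for extent, stride in zip(reversed(shape), reversed(strides)):
        if extent != 1:
            if stride != sd:
                c_contig = False
            sd *= extent

    # F order: the same walk, from outermost to innermost.
    f_contig = True
    sd = itemsize
    for extent, stride in zip(shape, strides):
        if extent != 1:
            if stride != sd:
                f_contig = False
            sd *= extent

    return {"C_CONTIGUOUS": c_contig, "F_CONTIGUOUS": f_contig}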

@gmarkall merged commit 835845c into NVIDIA:main on Nov 27, 2025
71 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request on Nov 27, 2025
Aims to align more closely with NumPy contiguity logic.

The example in the commit message from NVIDIA#612 still runs correctly with
this change.

I think this needs a little more consideration for now.
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request on Nov 27, 2025
- Revert NVIDIA#536 "perf: remove context threading in various pointer abstractions" (NVIDIA#611)
- fix: empty array type mismatch between host and device (NVIDIA#612)
- fix: warp vote operations must use a constant int for the `mode` parameter (NVIDIA#606)
@gmarkall mentioned this pull request on Nov 27, 2025
gmarkall added a commit that referenced this pull request on Nov 27, 2025

- Revert #536 "perf: remove context threading in various pointer abstractions" (#611)
- fix: empty array type mismatch between host and device (#612)
- fix: warp vote operations must use a constant int for the `mode` parameter (#606)

Labels

3 - Ready for Review (Ready for review by team)


Development

Successfully merging this pull request may close these issues.

Pass NULL pointer for cffi.FFI().from_buffer(empty array)
