8 changes: 0 additions & 8 deletions ci/test_conda.sh
@@ -4,14 +4,6 @@

 set -euo pipefail

-DISTRO=`cat /etc/os-release | grep "^ID=" | awk 'BEGIN {FS="="} { print $2 }'`
-
-if [ "$DISTRO" = "ubuntu" ]; then
-  apt-get update
-  apt remove --purge `dpkg --get-selections | grep cuda-nvvm | awk '{print $1}'` -y
-  apt remove --purge `dpkg --get-selections | grep cuda-nvrtc | awk '{print $1}'` -y
-fi
Contributor Author

This is unrelated to the other changes, but it was causing problems in CI. It is no longer necessary because this script now gets all of its dependencies from pixi, so I removed it.

Contributor

I think the reason this remained is that the CUDA pathfinder can mistakenly find system libraries ahead of ones in the environment, so we remove them to make sure we're testing with the right libraries. This was discussed in NVIDIA/cuda-python#839 - note that although the issue is closed, I believe the fix only makes it less likely to find the library from the system install of the toolkit, not impossible.

Contributor

The behavior now should be correct. pip wheels will be found first, then conda packages, then a default system search using dlopen which has its own hierarchy.

As far as I know, the only remaining issue is that when you're trying to find system libraries (as opposed to pip wheels or conda packages), nvvm doesn't end up in ldconfig, LD_LIBRARY_PATH, or any of the standard search paths. We hope to address this in the near future.
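
The search order described in this comment (pip wheels, then conda packages, then the system loader) can be illustrated with a minimal sketch. This is not the CUDA pathfinder's implementation: `resolve_library`, its parameters, and the temporary directories are hypothetical stand-ins that only show how a fixed priority order keeps an environment's library ahead of a system copy.

```python
from pathlib import Path
import tempfile


def resolve_library(filename, wheel_dirs, conda_dirs):
    """Return (source, path) for the first match, in priority order.

    Hypothetical helper: wheel_dirs and conda_dirs are directory lists
    supplied by the caller. A result of ("system", None) means nothing was
    found, so the caller would defer to the dynamic loader's own search.
    """
    for source, dirs in (("pip", wheel_dirs), ("conda", conda_dirs)):
        for directory in dirs:
            candidate = Path(directory) / filename
            if candidate.is_file():
                return source, candidate
    return "system", None


# Temporary directories stand in for real wheel and conda install locations.
with tempfile.TemporaryDirectory() as wheel, tempfile.TemporaryDirectory() as conda:
    (Path(conda) / "libnvvm.so").touch()
    conda_only = resolve_library("libnvvm.so", [wheel], [conda])

    (Path(wheel) / "libnvvm.so").touch()
    both = resolve_library("libnvvm.so", [wheel], [conda])

    missing = resolve_library("libnvrtc.so", [wheel], [conda])
```

In this sketch a library present in both locations resolves to the wheel copy, and a library present in neither falls through to the system search, which is the case the reviewer above was worried about.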


 # Constrain oldest supported dependencies for testing
 if [ "${NUMBA_VERSION:-*}" != "*" ]; then
   # add to the default environment's dependencies
40 changes: 17 additions & 23 deletions numba_cuda/numba/cuda/cudadrv/driver.py
@@ -1329,19 +1329,19 @@ def unload_module(self, module):

     def get_default_stream(self):
         handle = drvapi.cu_stream(int(binding.CUstream(CU_STREAM_DEFAULT)))
-        return Stream(weakref.proxy(self), handle, None)
+        return Stream(handle)

     def get_legacy_default_stream(self):
         handle = drvapi.cu_stream(
             int(binding.CUstream(binding.CU_STREAM_LEGACY))
         )
-        return Stream(weakref.proxy(self), handle, None)
+        return Stream(handle)

     def get_per_thread_default_stream(self):
         handle = drvapi.cu_stream(
             int(binding.CUstream(binding.CU_STREAM_PER_THREAD))
         )
-        return Stream(weakref.proxy(self), handle, None)
+        return Stream(handle)

     def create_stream(self):
         # The default stream creation flag, specifying that the created
@@ -1351,26 +1351,22 @@ def create_stream(self):
         flags = binding.CUstream_flags.CU_STREAM_DEFAULT.value
         handle = drvapi.cu_stream(int(driver.cuStreamCreate(flags)))
         return Stream(
-            weakref.proxy(self),
-            handle,
-            _stream_finalizer(self.deallocations, handle),
+            handle, finalizer=_stream_finalizer(self.deallocations, handle)
         )

     def create_external_stream(self, ptr):
         if not isinstance(ptr, int):
             raise TypeError("ptr for external stream must be an int")
         handle = drvapi.cu_stream(int(binding.CUstream(ptr)))
-        return Stream(weakref.proxy(self), handle, None, external=True)
+        return Stream(handle, external=True)

     def create_event(self, timing=True):
         flags = 0
         if not timing:
             flags |= enums.CU_EVENT_DISABLE_TIMING
         handle = drvapi.cu_event(int(driver.cuEventCreate(flags)))
         return Event(
-            weakref.proxy(self),
-            handle,
-            finalizer=_event_finalizer(self.deallocations, handle),
+            handle, finalizer=_event_finalizer(self.deallocations, handle)
         )

     def synchronize(self):
@@ -1383,7 +1379,7 @@ def defer_cleanup(self):
                 yield

     def __repr__(self):
-        return "<CUDA context %s of device %d>" % (self.handle, self.device.id)
+        return f"<CUDA context {self.handle} of device {self.device.id:d}>"

     def __eq__(self, other):
         if isinstance(other, Context):
@@ -2058,9 +2054,8 @@ class ManagedOwnedPointer(OwnedPointer, mviewbuf.MemAlloc):
     pass


-class Stream(object):
-    def __init__(self, context, handle, finalizer, external=False):
-        self.context = context
+class Stream:
+    def __init__(self, handle, finalizer=None, external=False):
Comment on lines +2057 to +2058
Contributor

There is unfortunately public code using the Stream constructor here with the context argument. It looks like most of it is stale and pinned to old versions of Numba, but we should maybe be a bit mindful.

@gmarkall what are your thoughts?

Contributor Author

We should probably lay out somewhere which constructors are public, since hand-constructing these doesn't appear to be part of any explicit public API, just the one you get from the default Python conventions.

Contributor

Can you highlight the code that's using the constructor explicitly? The way it's presented in the documentation is maybe not ideal, but users are directed to use one of:

  • cuda.stream()
  • cuda.default_stream()
  • cuda.legacy_default_stream()
  • cuda.per_thread_default_stream()
  • cuda.external_stream(ptr)

to construct a Stream object rather than using the constructor directly: https://nvidia.github.io/numba-cuda/reference/host.html#stream-management
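
The point about factory functions can be sketched without a GPU. The classes below are simplified stand-ins, not the numba-cuda implementation: `Context`, its integer handles, and the `released` list are assumptions made for illustration. The sketch shows why callers that go through factory methods are unaffected when the `Stream` constructor signature changes.

```python
import weakref


class Stream:
    """Simplified stand-in for the driver-level Stream (not the real class)."""

    def __init__(self, handle, finalizer=None, external=False):
        self.handle = handle
        self.external = external
        if finalizer is not None:
            # Run the cleanup callback once this object is garbage collected.
            weakref.finalize(self, finalizer)


class Context:
    """Hypothetical stand-in for a CUDA context that hands out streams."""

    def __init__(self):
        self._next_handle = 1
        self.released = []  # handles whose finalizer has run

    def create_stream(self):
        handle = self._next_handle
        self._next_handle += 1
        # Only the context knows how Stream is constructed; callers do not.
        return Stream(handle, finalizer=lambda: self.released.append(handle))

    def create_external_stream(self, ptr):
        if not isinstance(ptr, int):
            raise TypeError("ptr for external stream must be an int")
        return Stream(ptr, external=True)


ctx = Context()
s = ctx.create_stream()
ext = ctx.create_external_stream(0xBEEF)
```

Code written against `ctx.create_stream()` keeps working whether or not `Stream.__init__` takes a context, which is the reviewer's argument for treating the constructor as an internal detail.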

         self.handle = handle
         self.external = external
         if finalizer is not None:
@@ -2077,18 +2072,18 @@ def __cuda_stream__(self):

     def __repr__(self):
         default_streams = {
-            drvapi.CU_STREAM_DEFAULT: "<Default CUDA stream on %s>",
-            drvapi.CU_STREAM_LEGACY: "<Legacy default CUDA stream on %s>",
-            drvapi.CU_STREAM_PER_THREAD: "<Per-thread default CUDA stream on %s>",
+            drvapi.CU_STREAM_DEFAULT: "<Default CUDA stream>",
+            drvapi.CU_STREAM_LEGACY: "<Legacy default CUDA stream>",
+            drvapi.CU_STREAM_PER_THREAD: "<Per-thread default CUDA stream>",
         }
         ptr = self.handle.value or drvapi.CU_STREAM_DEFAULT

         if ptr in default_streams:
-            return default_streams[ptr] % self.context
+            return default_streams[ptr]
         elif self.external:
-            return "<External CUDA stream %d on %s>" % (ptr, self.context)
+            return f"<External CUDA stream {ptr:d}>"
         else:
-            return "<CUDA stream %d on %s>" % (ptr, self.context)
Contributor Author

The context was only being used for repring. The complexity of weak references doesn't seem justifiable for pretty printing, hence this PR :)
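
To make the effect on `repr` concrete, here is a minimal, GPU-free sketch of the context-free `__repr__` logic from this hunk. It is an approximation: the handle is a plain `int` rather than a ctypes handle with a `.value` attribute, and the three module-level constants are assumed to mirror the `drvapi` values.

```python
import weakref

# Assumed to mirror the drvapi constants used in the diff.
CU_STREAM_DEFAULT = 0
CU_STREAM_LEGACY = 1
CU_STREAM_PER_THREAD = 2


class Stream:
    """Simplified stand-in: no context argument, so no weakref.proxy needed."""

    def __init__(self, handle, finalizer=None, external=False):
        self.handle = handle  # plain int here; the real code wraps a ctypes handle
        self.external = external
        if finalizer is not None:
            weakref.finalize(self, finalizer)

    def __repr__(self):
        default_streams = {
            CU_STREAM_DEFAULT: "<Default CUDA stream>",
            CU_STREAM_LEGACY: "<Legacy default CUDA stream>",
            CU_STREAM_PER_THREAD: "<Per-thread default CUDA stream>",
        }
        # A null handle is treated as the default stream.
        ptr = self.handle or CU_STREAM_DEFAULT

        if ptr in default_streams:
            return default_streams[ptr]
        elif self.external:
            return f"<External CUDA stream {ptr:d}>"
        else:
            return f"<CUDA stream {ptr:d}>"


default_repr = repr(Stream(0))
external_repr = repr(Stream(1234, external=True))
plain_repr = repr(Stream(99))
```

The only information lost relative to the old output is the " on <context>" suffix, which is the trade-off the author describes.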

+            return f"<CUDA stream {ptr:d}>"

     def synchronize(self):
         """
@@ -2190,9 +2185,8 @@ def callback(stream, status, future):
         return future


-class Event(object):
-    def __init__(self, context, handle, finalizer=None):
-        self.context = context
Contributor Author

This was just never used in the class.

+class Event:
+    def __init__(self, handle, finalizer=None):
Contributor

+1 on this being something that has been used with the context argument @gmarkall

Contributor Author

It currently does nothing in the class and, to my eye, just wastes space and adds the complexity of a weakref. I'm not sure that preserving backwards compatibility for something that serves no clear purpose is worth the trouble.

Contributor

I agree about it not serving a purpose currently, but there's something to be said about not breaking existing code using it...
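
For readers weighing this trade-off, one conceivable middle ground is a deprecation shim. Nobody in this thread proposes it, and the PR as shown simply removes the argument, so the sketch below is purely hypothetical. It assumes legacy callers used `Event(context, handle[, finalizer])` positionally and that new callers pass `finalizer` by keyword.

```python
import warnings
import weakref


class Event:
    """Hypothetical compatibility sketch, not the class from this PR."""

    def __init__(self, *args, finalizer=None):
        if len(args) == 1:
            # New-style call: Event(handle, finalizer=...)
            (handle,) = args
        elif len(args) in (2, 3):
            # Legacy call: Event(context, handle[, finalizer]); context is dropped.
            # Limitation: a new-style positional Event(handle, finalizer) would be
            # misread as legacy, so new callers must pass finalizer by keyword.
            warnings.warn(
                "passing a context to Event is deprecated and ignored",
                DeprecationWarning,
                stacklevel=2,
            )
            handle = args[1]
            if len(args) == 3:
                finalizer = args[2]
        else:
            raise TypeError("Event() expects a handle")

        self.handle = handle
        if finalizer is not None:
            weakref.finalize(self, finalizer)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = Event(object(), 42)  # old signature still works, with a warning

new = Event(7)  # new signature, no warning
```

The ambiguity noted in the comment is a real cost of this approach, which may be part of why removing the unused argument outright is attractive.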

Contributor Author

There is no use of numba.cuda and Event in the same module anywhere in the searchable rapidsai repositories:

https://github.com/search?q=org%3Arapidsai+numba.cuda+language%3APython++Event&type=code

Same is true for Stream: https://github.com/search?q=org%3Arapidsai+numba.cuda+language%3APython++%22Stream%22&type=code

If there's some usage of it that we can point to that would be useful information.

Contributor

As with streams, users shouldn't be creating Event instances directly: https://nvidia.github.io/numba-cuda/reference/host.html#events

         self.handle = handle
         if finalizer is not None:
             weakref.finalize(self, finalizer)