
Conversation

@cpcloud
Contributor

@cpcloud cpcloud commented Nov 4, 2025

Remove some unnecessary weak reference usage by decoupling Context objects from Stream and Event.

@copy-pr-bot

copy-pr-bot bot commented Nov 4, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@cpcloud cpcloud requested a review from gmarkall November 4, 2025 20:47
-            return "<External CUDA stream %d on %s>" % (ptr, self.context)
+            return f"<External CUDA stream {ptr:d}>"
         else:
-            return "<CUDA stream %d on %s>" % (ptr, self.context)
Contributor Author

The context was only being used in `__repr__`. The complexity of weak references doesn't seem justified for pretty printing, hence this PR :)
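For illustration, here is a minimal sketch of what the decoupled repr looks like once `context` is gone (the class and attribute names are simplified stand-ins, not the real numba-cuda implementation):

```python
class Stream:
    """Simplified stand-in for the real Stream wrapper (illustration only)."""

    def __init__(self, handle, finalizer=None, external=False):
        self.handle = handle
        self.external = external

    def __repr__(self):
        # No context is threaded through just for display anymore.
        ptr = self.handle
        if self.external:
            return f"<External CUDA stream {ptr:d}>"
        return f"<CUDA stream {ptr:d}>"


print(repr(Stream(4660)))  # <CUDA stream 4660>
```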


class Event(object):
    def __init__(self, context, handle, finalizer=None):
        self.context = context
Contributor Author

This was just never used in the class.
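For context, the weakref pattern being removed typically looks like the sketch below: the wrapper holds its context weakly so it never extends the Context's lifetime. Since the attribute was unused here, the PR can drop the machinery entirely (hypothetical classes, not the real ones):

```python
import gc
import weakref


class Context:
    """Stand-in for the driver context object."""


class Event:
    """Old-style sketch: hold the context via weakref so this wrapper
    does not keep the Context alive (illustration only)."""

    def __init__(self, context, handle):
        self._context_ref = weakref.ref(context)
        self.handle = handle


ctx = Context()
evt = Event(ctx, handle=1)
ref = evt._context_ref
del ctx
gc.collect()
# Once the last strong reference is gone, the Context is collectable
# even though the Event still exists.
print(ref() is None)  # True
```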

@cpcloud cpcloud force-pushed the decouple-context-from-cuda-object-wrappers branch from 9e11245 to c2cb7c3 Compare November 4, 2025 20:58
@cpcloud
Contributor Author

cpcloud commented Nov 4, 2025

/ok to test

@cpcloud cpcloud force-pushed the decouple-context-from-cuda-object-wrappers branch from c2cb7c3 to 2309b7d Compare November 4, 2025 20:59
@cpcloud
Contributor Author

cpcloud commented Nov 4, 2025

/ok to test


@cpcloud cpcloud force-pushed the decouple-context-from-cuda-object-wrappers branch 2 times, most recently from e966bc6 to 3d31059 Compare November 4, 2025 21:30
@cpcloud
Contributor Author

cpcloud commented Nov 4, 2025

/ok to test


    apt-get update
    apt remove --purge `dpkg --get-selections | grep cuda-nvvm | awk '{print $1}'` -y
    apt remove --purge `dpkg --get-selections | grep cuda-nvrtc | awk '{print $1}'` -y
    fi
Contributor Author

This is not related to the other changes, but it was causing problems in CI, and isn't necessary anymore since we're getting all dependencies from pixi in this script, so I removed it.

Contributor

I think the reason this remained is that the CUDA pathfinder can mistakenly find system libraries ahead of ones in the environment, so we remove them to make sure we're testing with the right libraries. This was discussed in NVIDIA/cuda-python#839 - note that although the issue is closed, I believe the fix only makes it less likely to find the library from the system install of the toolkit, not impossible.

Contributor

The behavior now should be correct. pip wheels will be found first, then conda packages, then a default system search using dlopen which has its own hierarchy.

As far as I know, the only remaining issue is that nvvm doesn't end up in ldconfig / LD_LIBRARY_PATH or any of the standard search paths when you're looking for system libraries (as opposed to pip wheels or conda packages). We hope to address this in the near future.
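A rough sketch of that search order as a hypothetical resolver (the paths and layout here are assumptions for illustration, not cuda.pathfinder's actual logic):

```python
import os


def find_library(name, wheel_dirs=(), conda_prefix=None):
    """Hypothetical resolver: pip wheel locations first, then the conda
    environment, then fall through to the system dynamic loader."""
    soname = f"lib{name}.so"
    for d in wheel_dirs:  # 1. pip wheels
        candidate = os.path.join(d, soname)
        if os.path.exists(candidate):
            return candidate
    if conda_prefix:  # 2. conda packages
        candidate = os.path.join(conda_prefix, "lib", soname)
        if os.path.exists(candidate):
            return candidate
    # 3. bare soname: dlopen applies its own hierarchy
    # (ldconfig cache, LD_LIBRARY_PATH, default paths)
    return soname


print(find_library("nvvm"))  # libnvvm.so
```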

@cpcloud
Contributor Author

cpcloud commented Nov 4, 2025

Interesting, will have to explore this more tomorrow.

@cpcloud
Contributor Author

cpcloud commented Nov 5, 2025

I'll forego the Context removal from Module for now, and put that up in a separate PR, so that the other removals can remain.

@cpcloud cpcloud force-pushed the decouple-context-from-cuda-object-wrappers branch from 3c21204 to 0f0616a Compare November 5, 2025 14:44
@cpcloud
Contributor Author

cpcloud commented Nov 5, 2025

/ok to test

@cpcloud
Contributor Author

cpcloud commented Nov 5, 2025

Sounds like we can continue with the review then? @gmarkall @kkraus14

@cpcloud cpcloud requested a review from kkraus14 November 5, 2025 17:46
Comment on lines +2057 to +2058
class Stream:
    def __init__(self, handle, finalizer=None, external=False):
Contributor

There is unfortunately public code using the Stream constructor here with the context argument. It looks like most of it is stale and pinned to old versions of Numba, but we should maybe be a bit mindful.

@gmarkall what are your thoughts?

Contributor Author

We should probably document somewhere which constructors are public, since hand-constructing these doesn't appear to be part of any explicit public API, just the one implied by default Python conventions.

Contributor

Can you highlight the code that's using the constructor explicitly? The way it's presented in the documentation is maybe not ideal, but users are directed to use one of:

  • cuda.stream()
  • cuda.default_stream()
  • cuda.legacy_default_stream()
  • cuda.per_thread_default_stream()
  • cuda.external_stream(ptr)

to construct a Stream object rather than using the constructor directly: https://nvidia.github.io/numba-cuda/reference/host.html#stream-management
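The factory-function convention is exactly what insulates users from this kind of signature change; a generic sketch (stand-in classes and bodies, not numba-cuda's code):

```python
class Stream:
    """Stand-in wrapper; only the factories below are 'public'."""

    def __init__(self, handle, finalizer=None, external=False):
        self.handle = handle
        self.external = external


def stream():
    """In the spirit of cuda.stream(): callers never see the constructor,
    so dropping the `context` parameter is invisible to them."""
    handle = 0  # a real factory would ask the driver for a fresh stream
    return Stream(handle)


def external_stream(ptr):
    """In the spirit of cuda.external_stream(ptr)."""
    return Stream(ptr, external=True)


s = external_stream(0xDEAD)
print(s.external, hex(s.handle))  # True 0xdead
```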

 class Event:
-    def __init__(self, context, handle, finalizer=None):
-        self.context = context
+    def __init__(self, handle, finalizer=None):
Contributor

+1 on this being something that has been used with the context argument @gmarkall

Contributor Author

It is currently doing nothing in the class and to my eye wasting space + adding the complexity of a weakref. I'm not sure preserving backwards compatibility for something that is not clearly serving any purpose is worth the trouble.

Contributor

I agree about it not serving a purpose currently, but there's something to be said about not breaking existing code using it...
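If backwards compatibility were deemed worth the trouble, one option is a shim that detects the old two-argument convention, warns, and discards the context. The sketch below is illustrative only; it is not what the PR merged:

```python
import warnings


class Event:
    """Hypothetical compatibility shim: the merged PR opted for a clean
    break instead of this."""

    def __init__(self, *args, finalizer=None):
        if len(args) == 2:
            # Old convention: Event(context, handle); drop the context.
            warnings.warn(
                "passing `context` to Event is deprecated and ignored",
                DeprecationWarning,
                stacklevel=2,
            )
            args = args[1:]
        (self.handle,) = args
        self.finalizer = finalizer


e = Event(42)             # new convention
old = Event(object(), 7)  # old convention: warns, still works
print(e.handle, old.handle)  # 42 7
```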

Contributor Author

No use of numba.cuda and Event in the same module across all of searchable rapidsai:

https://github.com/search?q=org%3Arapidsai+numba.cuda+language%3APython++Event&type=code

Same is true for Stream: https://github.com/search?q=org%3Arapidsai+numba.cuda+language%3APython++%22Stream%22&type=code

If there's some usage of it that we can point to that would be useful information.

Contributor

Similarly to streams, users shouldn't be creating Event instances directly: https://nvidia.github.io/numba-cuda/reference/host.html#events

@gmarkall
Contributor

I'm happy for this to be merged, though I'm mindful that it sounds like there could be code using the Stream and Event constructors directly that I'm unaware of (and can't find with a few minutes of searching and regexes with Sourcegraph).

@gmarkall gmarkall added the 4 - Waiting on reviewer Waiting for reviewer to respond to author label Nov 18, 2025
@cpcloud
Contributor Author

cpcloud commented Nov 18, 2025

Let's merge, and we can revisit these constructors if something comes up.

@cpcloud cpcloud merged commit df8f261 into NVIDIA:main Nov 18, 2025
69 checks passed
@cpcloud cpcloud deleted the decouple-context-from-cuda-object-wrappers branch November 18, 2025 15:47
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Nov 20, 2025
- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)
@gmarkall gmarkall mentioned this pull request Nov 20, 2025
gmarkall added a commit that referenced this pull request Nov 20, 2025