[Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes #565

atmnp · 2025-10-30T19:10:33Z

This change vendors in the global_compiler_lock that is used to protect the core compiler machinery for future CUDA-specific changes.

copy-pr-bot · 2025-10-30T19:10:36Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

atmnp · 2025-10-30T19:10:41Z

/ok to test

gmarkall

We also need to hold the Numba compiler lock, because we'll likely have a route into Numba somewhere through types and / or the data model that doesn't hold or acquire the Numba compiler lock.

…umba

atmnp · 2025-10-31T21:52:38Z

/ok to test

gmarkall · 2025-11-03T10:17:44Z

numba_cuda/numba/cuda/core/compiler_lock.py

+        self._lock = threading.RLock()
+
+    def acquire(self):
+        ev.start_event("numba.cuda:compiler_lock")


We're using numba-cuda: as the prefix for events from Numba-CUDA. See

numba-cuda/numba_cuda/numba/cuda/core/event.py

Lines 58 to 64 in 5aa6b2e

_builtin_kinds = frozenset(

[

"numba-cuda:compiler_lock",

"numba-cuda:compile",

"numba-cuda:run_pass",

]

)

gmarkall · 2025-11-03T10:23:33Z

numba_cuda/numba/cuda/core/compiler_lock.py

+        self._numba_lock = numba_lock
+
+    def acquire(self):
+        if self._numba_lock:


In what circumstance would this ever be False?

gmarkall · 2025-11-03T11:23:42Z

numba_cuda/numba/cuda/core/compiler_lock.py

+
+
+def require_global_compiler_lock():
+    """Sentry that checks the global_compiler_lock is acquired."""
+    # Use assert to allow turning off this check
+    assert global_compiler_lock.is_locked()


We don't use this, and it feels a little bit of a dangerous API to me, in that the name sounds like a function that will block until the global compiler lock is held (I'd probably have named it assert_global_compiler_lock() or similar). Can we remove it?

Suggested change

def require_global_compiler_lock():

"""Sentry that checks the global_compiler_lock is acquired."""

# Use assert to allow turning off this check

assert global_compiler_lock.is_locked()

atmnp · 2025-11-03T11:53:06Z

/ok to test

gmarkall

Thanks for the updates - I think we should simplify things with a few more changes:

Remove require_global_compiler_lock() as suggested on the diff.
This means we can also get rid of the is_locked() and _is_owned() methods on both kinds of lock, as they will no longer be needed.
The _DualCompilerLock seems to have no reason to check the truthiness of self._numba_lock - should it just assume it to be true and remove the conditionals?

gmarkall · 2025-11-03T12:50:05Z

/ok to test

- Add support for cache-hinted load and store operations (NVIDIA#587) - Add more thirdparty tests (NVIDIA#586) - Add sphinx-lint to pre-commit and fix errors (NVIDIA#597) - Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544) - chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598) - chore(docs): format types docs (NVIDIA#596) - refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579) - Fix freezing in of constant arrays with negative strides (NVIDIA#589) - Update tests to accept variants of generated PTX (NVIDIA#585) - refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581) - Move frontend tests to `cudapy` namespace (NVIDIA#558) - Generalize the concurrency group for main merges (NVIDIA#582) - ci: move pre-commit checks to pre commit action (NVIDIA#577) - chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574) - ci: ensure that python version in ci matches matrix (NVIDIA#575) - Fix the `cuda.is_supported_version()` API (NVIDIA#571) - Fix checks on main (NVIDIA#576) - feat: add `math.nextafter` (NVIDIA#543) - ci: replace conda testing with pixi (NVIDIA#554) - [CI] Run PR workflow on merge to main (NVIDIA#572) - Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569) - test: enable fail-on-warn and clean up resulting failures (NVIDIA#529) - [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565) - Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566) - [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561) - test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550) - test: revert back to ipc futures that await each iteration (NVIDIA#564) - chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551) - [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534) - Remove dependencies on target_extension for CUDA target (NVIDIA#555) - Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559) - [WIP] Port numpy reduction tests to CUDA (NVIDIA#523) - ci: add timeout to avoid blocking the job queue (NVIDIA#556) - Handle `cuda.core.Stream` in driver operations (NVIDIA#401) - feat: add support for `math.exp2` (NVIDIA#541) - Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533) - refactor: cleanup device constructor (NVIDIA#548) - bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547) - perf: cache dimension computations (NVIDIA#542) - perf: remove duplicated size computation (NVIDIA#537) - chore(perf): add torch to benchmark (NVIDIA#539) - test: speed up ipc tests by ~6.5x (NVIDIA#527) - perf: speed up kernel launch (NVIDIA#510) - perf: remove context threading in various pointer abstractions (NVIDIA#536) - perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538) - refactor: remove unnecessary custom map and set implementations (NVIDIA#530) - [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513) - test: add benchmarks for kernel launch for reproducibility (NVIDIA#528) - test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522) - refactor: fully remove `USE_NV_BINDING` (NVIDIA#525) - Draft: Vendor in the IR module (NVIDIA#439) - pyproject.toml: add search path for Pyrefly (NVIDIA#524) - Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473) - Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497) - [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479) - Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502) - build: allow parallelization of nvcc testing builds (NVIDIA#521) - chore(dev-deps): add pixi (NVIDIA#505) - Vendor the imputils module for CUDA refactoring (NVIDIA#448) - Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519) - Switch back to stable cuDF release in thirdparty tests (NVIDIA#518) - Updating .gitignore with binaries in the `testing` folder (NVIDIA#516) - Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507) - Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512) - Vendor in typeconv for future CUDA-specific changes (NVIDIA#499) - [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493) - [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494) - Make the CUDA target the default for CUDA overload decorators (NVIDIA#511) - Remove C extension loading hacks (NVIDIA#506) - Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437) - [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433) - Fix Bf16 Test OB Error (NVIDIA#509) - Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498) - [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373) - [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488) - Improve debug value range coverage (NVIDIA#461) - Add `compile_all` API (NVIDIA#484) - Vendor in core.registry for CUDA-specific changes (NVIDIA#485) - [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457) - Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476) - [test] Remove dependency on cpu_target (NVIDIA#490) - Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475) - [test] Use numpy's tolerance for float16 (NVIDIA#491) - [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466) - [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)

- Add support for cache-hinted load and store operations (#587) - Add more thirdparty tests (#586) - Add sphinx-lint to pre-commit and fix errors (#597) - Add DWARF variant part support for polymorphic variables in CUDA debug info (#544) - chore: clean up dead workaround for unavailable `lru_cache` (#598) - chore(docs): format types docs (#596) - refactor: decouple `Context` from `Stream` and `Event` objects (#579) - Fix freezing in of constant arrays with negative strides (#589) - Update tests to accept variants of generated PTX (#585) - refactor: replace device functionality with `cuda.core` APIs (#581) - Move frontend tests to `cudapy` namespace (#558) - Generalize the concurrency group for main merges (#582) - ci: move pre-commit checks to pre commit action (#577) - chore(pixi): set up doc builds; remove most `build-conda` dependencies (#574) - ci: ensure that python version in ci matches matrix (#575) - Fix the `cuda.is_supported_version()` API (#571) - Fix checks on main (#576) - feat: add `math.nextafter` (#543) - ci: replace conda testing with pixi (#554) - [CI] Run PR workflow on merge to main (#572) - Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (#569) - test: enable fail-on-warn and clean up resulting failures (#529) - [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (#565) - Fix registration with Numba, vendor MakeFunctionToJITFunction tests (#566) - [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (#561) - test: refactor process-based tests to use concurrent futures in order to simplify tests (#550) - test: revert back to ipc futures that await each iteration (#564) - chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (#551) - [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (#534) - Remove dependencies on target_extension for CUDA target (#555) - Relax the pinning to `cuda-core` to allow it floating across minor releases (#559) - [WIP] Port numpy reduction tests to CUDA (#523) - ci: add timeout to avoid blocking the job queue (#556) - Handle `cuda.core.Stream` in driver operations (#401) - feat: add support for `math.exp2` (#541) - Vendor in types and datamodel for CUDA-specific changes (#533) - refactor: cleanup device constructor (#548) - bench: add cupy to array constructor kernel launch benchmarks (#547) - perf: cache dimension computations (#542) - perf: remove duplicated size computation (#537) - chore(perf): add torch to benchmark (#539) - test: speed up ipc tests by ~6.5x (#527) - perf: speed up kernel launch (#510) - perf: remove context threading in various pointer abstractions (#536) - perf: reduce the number of `__cuda_array_interface__` accesses (#538) - refactor: remove unnecessary custom map and set implementations (#530) - [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (#513) - test: add benchmarks for kernel launch for reproducibility (#528) - test(pixi): update pixi testing command to work with the new `testing` directory (#522) - refactor: fully remove `USE_NV_BINDING` (#525) - Draft: Vendor in the IR module (#439) - pyproject.toml: add search path for Pyrefly (#524) - Vendor in numba.core.typing for CUDA-specific changes (#473) - Use numba.config when available, otherwise use numba.cuda.config (#497) - [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (#479) - Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (#502) - build: allow parallelization of nvcc testing builds (#521) - chore(dev-deps): add pixi (#505) - Vendor the imputils module for CUDA refactoring (#448) - Don't use `MemoryLeakMixin` for tests that don't use NRT (#519) - Switch back to stable cuDF release in thirdparty tests (#518) - Updating .gitignore with binaries in the `testing` folder (#516) - Remove some unnecessary uses of ContextResettingTestCase (#507) - Vendor in _helperlib cext for CUDA-specific changes (#512) - Vendor in typeconv for future CUDA-specific changes (#499) - [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (#493) - [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (#494) - Make the CUDA target the default for CUDA overload decorators (#511) - Remove C extension loading hacks (#506) - Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (#437) - [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (#433) - Fix Bf16 Test OB Error (#509) - Vendor in components from numba.core.runtime for CUDA-specific changes (#498) - [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (#373) - [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (#488) - Improve debug value range coverage (#461) - Add `compile_all` API (#484) - Vendor in core.registry for CUDA-specific changes (#485) - [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (#457) - Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (#476) - [test] Remove dependency on cpu_target (#490) - Change dangling imports of numba.core.lowering to numba.cuda.lowering (#475) - [test] Use numpy's tolerance for float16 (#491) - [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (#466) - [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (#478)

atmnp self-assigned this Oct 30, 2025

atmnp added the 3 - Ready for Review Ready for review by team label Oct 30, 2025

gmarkall requested changes Oct 31, 2025

View reviewed changes

atmnp added 2 commits October 31, 2025 14:46

[Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes

dc93dac

set the numba cuda global compiler lock to also control the lock of n…

b8e5b64

…umba

atmnp force-pushed the atmn/vendor-in-compiler-lock branch from a8f63df to b8e5b64 Compare October 31, 2025 21:52

gmarkall reviewed Nov 3, 2025

View reviewed changes

remove deadcode, change event logging prefix

46d66bb

gmarkall requested changes Nov 3, 2025

View reviewed changes

remove is_locked, is_owned

83a8a73

gmarkall approved these changes Nov 3, 2025

View reviewed changes

gmarkall added 4 - Waiting on CI Waiting for a CI run to finish successfully 5 - Ready to merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Waiting on CI Waiting for a CI run to finish successfully labels Nov 3, 2025

gmarkall merged commit b04dd44 into NVIDIA:main Nov 3, 2025
70 checks passed

gmarkall mentioned this pull request Nov 20, 2025

Bump version to 0.21.0 #602

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes #565

[Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes #565

Uh oh!

atmnp commented Oct 30, 2025

Uh oh!

copy-pr-bot bot commented Oct 30, 2025

Uh oh!

atmnp commented Oct 30, 2025

Uh oh!

gmarkall left a comment

Uh oh!

atmnp commented Oct 31, 2025

Uh oh!

gmarkall Nov 3, 2025

Uh oh!

gmarkall Nov 3, 2025

Uh oh!

gmarkall Nov 3, 2025

Uh oh!

atmnp commented Nov 3, 2025

Uh oh!

gmarkall left a comment

Uh oh!

gmarkall commented Nov 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	_builtin_kinds = frozenset(
	[
	"numba-cuda:compiler_lock",
	"numba-cuda:compile",
	"numba-cuda:run_pass",
	]
	)

[Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes #565

[Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes #565

Uh oh!

Conversation

atmnp commented Oct 30, 2025

Uh oh!

copy-pr-bot bot commented Oct 30, 2025

Uh oh!

atmnp commented Oct 30, 2025

Uh oh!

gmarkall left a comment

Choose a reason for hiding this comment

Uh oh!

atmnp commented Oct 31, 2025

Uh oh!

gmarkall Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

gmarkall Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

gmarkall Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

atmnp commented Nov 3, 2025

Uh oh!

gmarkall left a comment

Choose a reason for hiding this comment

Uh oh!

gmarkall commented Nov 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants