perf: remove some exception control flow and buffer-exception penalization for arrays #700

cpcloud · 2026-01-07T15:36:16Z

Remove some smaller overheads from kernel launch. The gains are modest, between 5% and 8%, but they are reproducible.

copy-pr-bot · 2026-01-07T15:36:20Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

greptile-apps · 2026-01-07T15:38:41Z

Greptile Summary

This PR optimizes kernel launch overhead through cleaner code patterns that avoid exception handling:

- `numpy_support.py`: Replaces `try/except KeyError` with `dict.get()` and `try/except AttributeError` with `getattr()` using the walrus operator. These changes eliminate exception overhead while maintaining identical functionality.
- `typeof.py`: Reorders type checking to prioritize `__cuda_array_interface__` before the buffer protocol check. This avoids expensive `memoryview` creation and exception handling for CuPy arrays and other CUDA-aware arrays.
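The `numpy_support.py` pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the lookup table `_TYPE_CODES`, the class `WithDtype`, and all four function names are hypothetical, chosen only to show the exception-based form next to the `dict.get()` / `getattr()` form.

```python
# Hypothetical lookup table; the real tables in numpy_support.py differ.
_TYPE_CODES = {"int32": "i4", "float64": "f8"}


def code_with_exception(name):
    """Exception-based lookup: a miss pays for raising and catching KeyError."""
    try:
        return _TYPE_CODES[name]
    except KeyError:
        return None


def code_with_get(name):
    """dict.get() lookup: a miss simply returns the default (None)."""
    return _TYPE_CODES.get(name)


class WithDtype:
    """Hypothetical object carrying a dtype attribute."""

    dtype = "float64"


def dtype_code_with_exception(obj):
    """Exception-based attribute access."""
    try:
        dtype = obj.dtype
    except AttributeError:
        return None
    return code_with_get(dtype)


def dtype_code_with_getattr(obj):
    """getattr() with a default, bound in the condition via the walrus operator."""
    if (dtype := getattr(obj, "dtype", None)) is not None:
        return code_with_get(dtype)
    return None
```

The two forms agree as long as `None` is never a legitimate stored value or attribute value; that is an assumption of this sketch, since `None` doubles as the "missing" sentinel here.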

Both changes follow the principle of avoiding exceptions for control flow, which is a Python performance best practice. The logic remains functionally equivalent with no behavioral changes.
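The reordering in `typeof.py` can be illustrated with a small sketch. Everything here is a stand-in under stated assumptions: `FakeDeviceArray` is a hypothetical class that mimics an array exposing `__cuda_array_interface__` without the buffer protocol, and the two `classify_*` functions are illustrative, not the functions in the PR.

```python
class FakeDeviceArray:
    """Hypothetical CuPy-like object: has the CUDA Array Interface,
    but does not implement the Python buffer protocol."""

    __cuda_array_interface__ = {
        "shape": (4,),
        "typestr": "<f4",
        "data": (0, False),
        "version": 3,
    }


def classify_buffer_first(obj):
    """Old ordering: try the buffer protocol first.

    For a device array, memoryview() raises TypeError on every call
    before the CUDA Array Interface is even consulted.
    """
    try:
        memoryview(obj)
    except TypeError:
        pass
    else:
        return "buffer"
    if hasattr(obj, "__cuda_array_interface__"):
        return "cuda_array"
    return "unknown"


def classify_cai_first(obj):
    """New ordering: a cheap attribute check handles device arrays
    without ever attempting (and failing) memoryview creation."""
    if getattr(obj, "__cuda_array_interface__", None) is not None:
        return "cuda_array"
    try:
        memoryview(obj)
    except TypeError:
        return "unknown"
    return "buffer"
```

In this sketch both orderings give the same answer for objects that implement only one of the two protocols. An object implementing both would be classified differently by the two functions, so the equivalence shown here holds only under that assumption.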

Confidence Score: 5/5

- This PR is safe to merge with no identified risks
- The changes are pure performance optimizations that maintain identical behavior. Exception handling is replaced with standard Python idioms (`.get()`, `getattr()`), and type check reordering preserves logical correctness while improving efficiency for common cases (CuPy arrays). No functional changes or edge case issues identified.
- No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| numba_cuda/numba/cuda/np/numpy_support.py | Replaced exception-based control flow with `dict.get()` and `getattr()` for cleaner, faster lookups. Changes are safe and idiomatic. |
| numba_cuda/numba/cuda/typing/typeof.py | Moved buffer protocol check after `__cuda_array_interface__` check to avoid expensive memoryview creation for CuPy arrays. Maintains correctness while improving performance. |

kkraus14 · 2026-01-07T16:02:02Z

/ok to test

- Add arch specific target support (NVIDIA#549)
- chore: disable `locked` flag to bypass prefix-dev/pixi#5256 (NVIDIA#714)
- ci: relock pixi (NVIDIA#712)
- ci: remove redundant conda build in ci (NVIDIA#711)
- chore(deps): bump numba-cuda version and relock pixi (NVIDIA#707)
- Dropping bits in the old CI & Propagating recent changes from cuda-python (NVIDIA#683)
- Fix `test_wheel_deps_wheels.sh` to actually uninstall `nvvm` and `nvrtc` packages for CUDA 13 (NVIDIA#701)
- perf: remove some exception control flow and buffer-exception penalization for arrays (NVIDIA#700)
- perf: let CAI fall through instead of calling from_cuda_array_interface (NVIDIA#694)
- chore: perf lint (NVIDIA#697)
- chore(deps): bump deps in pixi lockfile (NVIDIA#693)
- fix: use freethreading-supported `_PySet_NextItemRef` where possible (NVIDIA#682)
- Support python `3.14` (NVIDIA#599)
- Remove customized address space tracking and address class emission in debug info (NVIDIA#669)
- Drop `experimental` from cuda.core namespace imports (NVIDIA#676)
- Remove dangling references to NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY (NVIDIA#675)
- Use `rapidsai/sccache` in CI (NVIDIA#674)
- chore(dev-deps): remove ipython and pyinstrument (NVIDIA#670)
- Set up a new VM-based CI infrastructure (NVIDIA#604)

cpcloud added 2 commits January 7, 2026 10:07

perf: avoid incurring the cost of failing to implement the buffer protocol on every type inference of a cupy array

bed4d60

perf: avoid using exceptions for control flow

32a6857

cpcloud changed the title ~~small perf improvements~~ perf: remove some exception control flow and buffer-exception penalization for arrays Jan 7, 2026

cpcloud requested a review from gmarkall January 7, 2026 15:37

kkraus14 approved these changes Jan 7, 2026

kkraus14 enabled auto-merge (squash) January 7, 2026 16:02

kkraus14 merged commit 459b8c0 into NVIDIA:main Jan 7, 2026
118 of 119 checks passed

cpcloud deleted the small-perf-improvements branch January 7, 2026 16:39

gmarkall mentioned this pull request Jan 12, 2026

Bump version to 0.24.0 #716

Merged
