Merged
91 commits
6b66886
begin replacing pynvjitlinker
brandon-b-miller Feb 21, 2025
2c5e69e
Merge branch 'main' into cuda-core-linker
brandon-b-miller Feb 24, 2025
b44f6bf
begin implementing cuda-python linker
brandon-b-miller Feb 24, 2025
5f3eff0
fix misconfigured precommit
brandon-b-miller Feb 24, 2025
fa80a4f
properly handle module image
brandon-b-miller Mar 4, 2025
049ff57
almost pass linker tests
brandon-b-miller Mar 4, 2025
9aeff49
trying to pass more tests
brandon-b-miller Mar 4, 2025
b551d6f
clean
brandon-b-miller Mar 4, 2025
ab9ac6f
pass includes correctly
brandon-b-miller Mar 10, 2025
172e340
pass a few more linker tests
brandon-b-miller Mar 11, 2025
4334b8b
Context.create_module_ptx wraps PTX in an ObjectCode instance
brandon-b-miller Mar 11, 2025
0c8171a
workaround passing lto=False to cuda-python
brandon-b-miller Mar 12, 2025
cc70f1d
drop to ctypes ptr in nrt when using nv binding
brandon-b-miller Mar 14, 2025
1ed3520
pass more tests
brandon-b-miller Mar 15, 2025
badafc6
pass more tests
brandon-b-miller Mar 17, 2025
bdd1f19
partial fixing of more tests
brandon-b-miller Mar 18, 2025
d6e682d
clean
brandon-b-miller Mar 19, 2025
2ac2796
merge/resolve/incorporate memsys fixes from main
brandon-b-miller Mar 19, 2025
e444fb1
fix test_linking_cu_log_warning
brandon-b-miller Mar 19, 2025
dcf4b92
merge/resolve
brandon-b-miller Apr 29, 2025
a2fa420
fix lto dump assembly
brandon-b-miller May 5, 2025
0c84361
fix linker args
brandon-b-miller May 6, 2025
41b2565
Merge branch 'main' into cuda-core-linker
brandon-b-miller May 22, 2025
23f32a4
update for anticipated cuda-python changes
brandon-b-miller May 22, 2025
a5fbf6a
pass more tests
brandon-b-miller May 28, 2025
9e1ecc5
deps
brandon-b-miller May 28, 2025
bd0997d
cleanup
brandon-b-miller May 28, 2025
43bdbbf
small error
brandon-b-miller May 28, 2025
35bcf8c
add_data, extra includes, changes from main
brandon-b-miller May 28, 2025
aa9f6e6
remove deprecated import paths that are producing warnings
brandon-b-miller May 28, 2025
3ad496e
dep updates
brandon-b-miller May 28, 2025
b5b2b03
merge/resolve
brandon-b-miller Jun 2, 2025
c32a850
remove old linkers
brandon-b-miller Jun 3, 2025
568c2fc
update for new nrt location
brandon-b-miller Jun 3, 2025
890dfb4
repurpose pynvjitlink ci scripts
brandon-b-miller Jun 3, 2025
a98ded0
missed env var
brandon-b-miller Jun 3, 2025
4fad9f3
add name
brandon-b-miller Jun 6, 2025
9f0fd74
renaming
brandon-b-miller Jun 6, 2025
1585116
pass name through to all objectcode constructors
brandon-b-miller Jun 6, 2025
72d34d7
prune nvjitlink tests
brandon-b-miller Jun 6, 2025
c284f61
lto by default when using _Linker
brandon-b-miller Jun 6, 2025
b2eb4d0
merge/resolve
brandon-b-miller Jun 9, 2025
7bcbeb6
.code
brandon-b-miller Jun 9, 2025
7383789
Merge branch 'main' into cuda-core-linker
brandon-b-miller Jun 10, 2025
740d772
merge/resolve/update
brandon-b-miller Jun 13, 2025
483b799
add explicit nvjitlink dep
brandon-b-miller Jun 13, 2025
88cb04c
fix f string
brandon-b-miller Jun 13, 2025
155ae81
cuda-cuobjdump
brandon-b-miller Jun 13, 2025
8e5c422
cuda-core to wheel test deps
brandon-b-miller Jun 13, 2025
b36f129
fix 11.8 bug
brandon-b-miller Jun 13, 2025
a0eabdd
fix simulator
brandon-b-miller Jun 13, 2025
1437329
WAR for determining nvjitlink presence
brandon-b-miller Jun 14, 2025
a16947d
use/fix
brandon-b-miller Jun 14, 2025
05afacd
std=c++17 for nvrtc < 12.0
brandon-b-miller Jun 14, 2025
7e9ed30
do not pass 0 max registers to linker
brandon-b-miller Jun 14, 2025
cbabfe4
quick patch to solve bf16 include issue, needs refactor
brandon-b-miller Jun 15, 2025
7ceea50
refactor, pass tests locally
brandon-b-miller Jun 16, 2025
b2ee472
skip nvjitlink tests if unavailable or too old
brandon-b-miller Jun 16, 2025
0bfd849
update CI scripts
brandon-b-miller Jun 16, 2025
29bc722
manage use of NUMBA_CUDA_ENABLE_PYNVJITLINK
brandon-b-miller Jun 16, 2025
56b0f61
hopefully fix simulator
brandon-b-miller Jun 16, 2025
43d7490
update linker selection logic
brandon-b-miller Jun 16, 2025
4f20f90
ctypes linker inherits from _LinkerBase now
brandon-b-miller Jun 16, 2025
3f371e9
missed a use of ENABLE_PYNVJITLINK
brandon-b-miller Jun 16, 2025
ecc1ea9
cudadrv
brandon-b-miller Jun 16, 2025
ab07f3d
move cuda core imports under USE_NV_BINDING
brandon-b-miller Jun 16, 2025
0005ef2
more _LinkerBase
brandon-b-miller Jun 16, 2025
8f022f5
address remaining issues
brandon-b-miller Jun 17, 2025
b975df7
small bugs
brandon-b-miller Jun 17, 2025
52eb90a
Update numba_cuda/numba/cuda/tests/cudadrv/test_linker.py
brandon-b-miller Jun 18, 2025
e5df6d8
meta does not need nvjitlink
brandon-b-miller Jun 18, 2025
605d482
Update numba_cuda/numba/cuda/__init__.py
brandon-b-miller Jun 18, 2025
7f97b27
merge/resolve
brandon-b-miller Jun 18, 2025
caaeedc
Merge branch 'main' into cuda-core-linker
brandon-b-miller Jun 24, 2025
67ec8ee
Apply suggestions from code review
brandon-b-miller Jun 24, 2025
3c0623b
no cuda-core for ctypes binding wheel script
brandon-b-miller Jun 24, 2025
b9b1f27
make tests under nv binding 0
brandon-b-miller Jun 24, 2025
345f70c
use NUMBA_CUDA_USE_NVIDIA_BINDING for sysinfo test as well
brandon-b-miller Jun 25, 2025
500079d
Apply suggestions from code review
brandon-b-miller Jun 25, 2025
06ddad2
guard error import
brandon-b-miller Jun 25, 2025
1d94745
merge/resolve
brandon-b-miller Jun 26, 2025
3e12fc3
use have_nvjitlink in deco
brandon-b-miller Jun 26, 2025
ce00767
update test
brandon-b-miller Jun 26, 2025
d726a7a
address reviews
brandon-b-miller Jun 26, 2025
85f8710
try not forwarding lto=False to None
brandon-b-miller Jun 26, 2025
edf0f0a
address remaining reviews
brandon-b-miller Jun 26, 2025
547dab5
defer cuinit
brandon-b-miller Jun 26, 2025
3bd469d
update _have_nvjitlink
brandon-b-miller Jun 26, 2025
ba5c20a
updates
brandon-b-miller Jun 26, 2025
134f6ee
catch no driver at all
brandon-b-miller Jun 26, 2025
99c87f3
track pynvjitlink env discovery
brandon-b-miller Jun 27, 2025
12 changes: 6 additions & 6 deletions .github/workflows/pr.yaml
Original file line number Diff line number Diff line change
@@ -19,11 +19,11 @@ jobs:
- compute-matrix
- build-conda
- test-conda
- test-conda-pynvjitlink
- test-conda-ctypes-binding
- test-simulator
- build-wheels
- test-wheels
- test-wheels-pynvjitlink
- test-wheels-ctypes-binding
- test-wheels-deps-wheels
- build-docs
secrets: inherit
@@ -76,14 +76,14 @@ jobs:
script: "ci/test_conda.sh"
run_codecov: false
matrix: ${{ needs.compute-matrix.outputs.TEST_MATRIX }}
test-conda-pynvjitlink:
test-conda-ctypes-binding:
needs:
- build-conda
- compute-matrix
uses: ./.github/workflows/conda-python-tests.yaml
with:
build_type: pull-request
script: "ci/test_conda_pynvjitlink.sh"
script: "ci/test_conda_ctypes_binding.sh"
run_codecov: false
# This selects "ARCH=amd64 and CUDA >=12, with the latest supported Python for each CUDA major version".
matrix: ${{ needs.compute-matrix.outputs.TEST_MATRIX }}
@@ -114,14 +114,14 @@ jobs:
build_type: pull-request
script: "ci/test_wheel.sh false"
matrix: ${{ needs.compute-matrix.outputs.TEST_MATRIX }}
test-wheels-pynvjitlink:
test-wheels-ctypes-binding:
needs:
- build-wheels
- compute-matrix
uses: ./.github/workflows/wheels-test.yaml
with:
build_type: pull-request
script: "ci/test_wheel_pynvjitlink.sh"
script: "ci/test_wheel_ctypes_binding.sh"
# This selects "ARCH=amd64 and CUDA >=12, with the latest supported Python for each CUDA major version".
matrix: ${{ needs.compute-matrix.outputs.TEST_MATRIX }}
matrix_filter: map(select(.ARCH == "amd64" and (.CUDA_VER | split(".") | .[0] | tonumber >= 12))) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by([(.PY_VER|split(".")|map(tonumber)), (.CUDA_VER|split(".")|map(tonumber))]))
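
For readers less familiar with jq, the matrix_filter above can be read as three steps: keep amd64 entries with a CUDA major version of at least 12, group them by CUDA major version, then keep the entry with the highest (Python version, CUDA version) pair in each group. A minimal Python sketch of the same selection follows; the function names and the sample matrix are hypothetical, and only the ARCH, PY_VER and CUDA_VER keys come from the filter itself.

```python
from itertools import groupby


def version_key(version):
    """Turn a dotted version string such as "12.8.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def cuda_major(entry):
    return version_key(entry["CUDA_VER"])[0]


def select_latest_per_cuda_major(matrix):
    """Keep amd64 entries with CUDA major >= 12, one entry per CUDA major."""
    eligible = [
        entry
        for entry in matrix
        if entry["ARCH"] == "amd64" and cuda_major(entry) >= 12
    ]
    # groupby only merges adjacent items, so sort by the grouping key first.
    eligible.sort(key=cuda_major)
    selected = []
    for _, group in groupby(eligible, key=cuda_major):
        # Compare Python version first, then the full CUDA version.
        best = max(
            group,
            key=lambda e: (version_key(e["PY_VER"]), version_key(e["CUDA_VER"])),
        )
        selected.append(best)
    return selected


# Hypothetical matrix entries, for illustration only.
sample_matrix = [
    {"ARCH": "amd64", "PY_VER": "3.12", "CUDA_VER": "12.8.0"},
    {"ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "12.0.1"},
    {"ARCH": "arm64", "PY_VER": "3.13", "CUDA_VER": "12.8.0"},
    {"ARCH": "amd64", "PY_VER": "3.11", "CUDA_VER": "11.8.0"},
]
print(select_latest_per_cuda_major(sample_matrix))
```

In this sample the Python 3.13 entry is chosen even though its CUDA patch version is older, because the Python version is compared first.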
2 changes: 1 addition & 1 deletion ci/test_conda.sh
@@ -8,7 +8,7 @@ set -euo pipefail
if [ "${CUDA_VER%.*.*}" = "11" ]; then
CTK_PACKAGES="cudatoolkit=11"
else
CTK_PACKAGES="cuda-cccl cuda-nvcc-impl cuda-nvrtc libcurand-dev"
CTK_PACKAGES="cuda-cccl cuda-nvcc-impl cuda-nvrtc libcurand-dev cuda-cuobjdump"
apt-get update
apt remove --purge `dpkg --get-selections | grep cuda-nvvm | awk '{print $1}'` -y
apt remove --purge `dpkg --get-selections | grep cuda-nvrtc | awk '{print $1}'` -y
Original file line number Diff line number Diff line change
@@ -48,12 +48,6 @@ EXITCODE=0
trap "EXITCODE=1" ERR
set +e


rapids-logger "Install pynvjitlink"
set +u
rapids-mamba-retry install -c rapidsai pynvjitlink
set -u

rapids-logger "Build tests"

PY_SCRIPT="
@@ -70,7 +64,7 @@ popd


rapids-logger "Run Tests"
NUMBA_CUDA_ENABLE_PYNVJITLINK=1 NUMBA_CUDA_TEST_BIN_DIR=$NUMBA_CUDA_TEST_BIN_DIR python -m numba.runtests numba.cuda.tests -v
NUMBA_CUDA_USE_NVIDIA_BINDING=0 NUMBA_CUDA_TEST_BIN_DIR=$NUMBA_CUDA_TEST_BIN_DIR python -m numba.runtests numba.cuda.tests -v

popd

3 changes: 2 additions & 1 deletion ci/test_wheel.sh
@@ -10,7 +10,8 @@ package=$(realpath wheel/numba_cuda*.whl)
echo "Package path: ${package}"
python -m pip install \
"${package}[test]" \
"cuda-python==${CUDA_VER_MAJOR_MINOR%.*}.*"
"cuda-python==${CUDA_VER_MAJOR_MINOR%.*}.*" \
"cuda-core==0.3.*"

GET_TEST_BINARY_DIR="
import numba_cuda
Original file line number Diff line number Diff line change
@@ -11,7 +11,6 @@ echo "Package path: $package"
python -m pip install \
"${package}[test]" \
cuda-python \
"pynvjitlink-cu${CUDA_VER_MAJOR}"

rapids-logger "Build tests"
PY_SCRIPT="
@@ -23,7 +22,7 @@ print(test_dir)

NUMBA_CUDA_TEST_BIN_DIR=$(python -c "$PY_SCRIPT")
pushd $NUMBA_CUDA_TEST_BIN_DIR
make
NUMBA_CUDA_USE_NVIDIA_BINDING=0 make
popd


@@ -35,9 +34,9 @@ mkdir -p "${RAPIDS_TESTS_DIR}"
pushd "${RAPIDS_TESTS_DIR}"

rapids-logger "Show Numba system info"
python -m numba --sysinfo
NUMBA_CUDA_USE_NVIDIA_BINDING=0 python -m numba --sysinfo

rapids-logger "Run Tests"
NUMBA_CUDA_ENABLE_PYNVJITLINK=1 NUMBA_CUDA_TEST_BIN_DIR=$NUMBA_CUDA_TEST_BIN_DIR python -m numba.runtests numba.cuda.tests -v
NUMBA_CUDA_USE_NVIDIA_BINDING=0 NUMBA_CUDA_TEST_BIN_DIR=$NUMBA_CUDA_TEST_BIN_DIR python -m numba.runtests numba.cuda.tests -v

popd
1 change: 1 addition & 0 deletions conda/recipes/numba-cuda/meta.yaml
@@ -29,6 +29,7 @@ requirements:
- python
- numba >=0.59.1
- cuda-bindings
- cuda-core ==0.3.*

about:
home: {{ project_urls["Homepage"] }}
23 changes: 22 additions & 1 deletion numba_cuda/numba/cuda/__init__.py
@@ -2,11 +2,16 @@
from numba import runtests
from numba.core import config
from .utils import _readenv
import warnings


# Enable pynvjitlink based on the following precedence:
# 1. Config setting "CUDA_ENABLE_PYNVJITLINK" (highest priority)
# 2. Environment variable "NUMBA_CUDA_ENABLE_PYNVJITLINK"
# 3. Auto-detection of pynvjitlink module (lowest priority)

pynvjitlink_auto_enabled = False

if getattr(config, "CUDA_ENABLE_PYNVJITLINK", None) is None:
if (
_pynvjitlink_enabled_in_env := _readenv(
@@ -15,9 +20,10 @@
) is not None:
config.CUDA_ENABLE_PYNVJITLINK = _pynvjitlink_enabled_in_env
else:
config.CUDA_ENABLE_PYNVJITLINK = (
pynvjitlink_auto_enabled = (
importlib.util.find_spec("pynvjitlink") is not None
)
config.CUDA_ENABLE_PYNVJITLINK = pynvjitlink_auto_enabled

# Upstream numba sets CUDA_USE_NVIDIA_BINDING to 0 by default, so it always
# exists. Override, but not if explicitly set to 0 in the environment.
@@ -44,6 +50,21 @@
"bindings."
)

if config.CUDA_ENABLE_PYNVJITLINK:
    if USE_NV_BINDING:
        warnings.warn(
            "Explicitly enabling pynvjitlink is no longer necessary. "
            "NVIDIA bindings are enabled. cuda.core will be used "
            "in place of pynvjitlink."
        )
    elif pynvjitlink_auto_enabled:
        # Ignore the fact that pynvjitlink is enabled, because that was an
        # automatic decision based on discovering pynvjitlink was present; the
        # user didn't ask for it
        pass
    else:
        raise RuntimeError("nvJitLink requires the NVIDIA CUDA bindings. ")
Review comment by @gmarkall (Contributor), Jun 27, 2025:

Because config.CUDA_ENABLE_PYNVJITLINK is enabled automatically when pynvjitlink is found in the environment, disabling the NVIDIA bindings while pynvjitlink is installed now leads to this exception being raised.
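
The interaction between the three flags in this hunk can be summarised as a small decision function. This is a minimal sketch rather than the code in the PR: the function name and the string labels are hypothetical, and only the three inputs (the pynvjitlink setting, the NVIDIA binding setting, and whether pynvjitlink was enabled by auto-detection) come from the diff.

```python
def pynvjitlink_action(enabled, use_nv_binding, auto_enabled):
    """Return what the import-time check does for a given combination.

    The labels are illustrative: "none" (nothing to do), "warn",
    "ignore" (auto-detected, so silently skipped) or "error".
    """
    if not enabled:
        return "none"
    if use_nv_binding:
        # pynvjitlink was requested or detected, but cuda.core is used instead.
        return "warn"
    if auto_enabled:
        # The user did not ask for pynvjitlink; it was merely found installed.
        return "ignore"
    # pynvjitlink was explicitly requested without the NVIDIA bindings.
    return "error"


# The case described in the comment: pynvjitlink is installed (auto-detected)
# and the NVIDIA bindings are disabled.
print(pynvjitlink_action(enabled=True, use_nv_binding=False, auto_enabled=True))
```

Without the auto_enabled input, that last case would fall through to the error branch, which is the behaviour the comment reports.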


if config.ENABLE_CUDASIM:
from .simulator_init import *
else:
14 changes: 8 additions & 6 deletions numba_cuda/numba/cuda/codegen.py
@@ -22,7 +22,10 @@ def run_nvdisasm(cubin, flags):
try:
fd, fname = tempfile.mkstemp()
with open(fname, "wb") as f:
f.write(cubin)
if config.CUDA_USE_NVIDIA_BINDING:
f.write(cubin.code)
else:
f.write(cubin)

try:
cp = subprocess.run(
@@ -271,7 +274,7 @@ def get_cubin(self, cc=None):
return cubin

if self._lto and config.DUMP_ASSEMBLY:
linker = driver.Linker.new(
linker = driver._Linker.new(
Contributor

Note / nit: this feels a little funny because new() is a class method of _LinkerBase rather than of _Linker.
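
To illustrate why calling a base-class factory through a subclass can read oddly, here is a small sketch with hypothetical classes. It assumes the factory picks the concrete class from a setting rather than from the class it is called on; whether _LinkerBase.new behaves this way is not shown in this diff.

```python
class LinkerBase:
    """Hypothetical stand-in for a linker base class."""

    def __init__(self, max_registers=None, cc=None):
        self.max_registers = max_registers
        self.cc = cc

    @classmethod
    def new(cls, use_nv_binding=True, **kwargs):
        # Assumption: the factory chooses the concrete class from a setting,
        # not from `cls`, so the class it is called on does not matter.
        concrete = NvBindingLinker if use_nv_binding else CtypesLinker
        return concrete(**kwargs)


class NvBindingLinker(LinkerBase):
    """Hypothetical linker used when the NVIDIA bindings are enabled."""


class CtypesLinker(LinkerBase):
    """Hypothetical linker used with the ctypes bindings."""


# Called on one subclass, yet the result is an instance of the other.
linker = NvBindingLinker.new(use_nv_binding=False, cc=(8, 0))
print(type(linker).__name__)  # CtypesLinker
```

Under that assumption, spelling the call as LinkerBase.new(...) would make it clearer that the caller does not choose the concrete type.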

max_registers=self._max_registers,
cc=cc,
additional_flags=["-ptx"],
@@ -280,14 +283,14 @@ def get_cubin(self, cc=None):
# `-ptx` flag is meant to view the optimized PTX for LTO objects.
# Non-LTO objects are not passed to linker.
self._link_all(linker, cc, ignore_nonlto=True)

ptx = linker.get_linked_ptx().decode("utf-8")
ptx = linker.get_linked_ptx()
ptx = ptx.decode("utf-8")

print(("ASSEMBLY (AFTER LTO) %s" % self._name).center(80, "-"))
print(ptx)
print("=" * 80)

linker = driver.Linker.new(
linker = driver._Linker.new(
max_registers=self._max_registers, cc=cc, lto=self._lto
)
self._link_all(linker, cc, ignore_nonlto=False)
Contributor

I can't comment on line 294 / 300 just below as it's not part of the diff, but I suspect that if it were changed to cubin = linker.complete().code when the NVIDIA binding is in use, the other changes to use <variable>.code (https://github.com/NVIDIA/numba-cuda/pull/133/files#diff-bf7c3a638421914e7463c6a23bd6293b740ed0c3e22267a7469091f70a9ad746R26, https://github.com/NVIDIA/numba-cuda/pull/133/files#diff-626c1086aae9de03cd89533ef1e1b9f7b1279e834b54aab10ab3c271739def53R1690) wouldn't be needed, and the interface would be more uniform across the linkers. (Not a necessary change, as a lot of this will churn anyway as we move more to cuda-core / cuda-bindings, but I noticed the use of .code in other places.)
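
One way to picture the suggestion is a single normalisation step applied where the linker result is produced, so that downstream code always receives raw bytes. The sketch below uses hypothetical names: as_cubin_bytes and FakeObjectCode are not part of numba-cuda or cuda.core, and the only detail taken from the diff is that the object returned under the NVIDIA binding exposes its bytes through a .code attribute.

```python
class FakeObjectCode:
    """Hypothetical stand-in for an object that exposes bytes via `.code`."""

    def __init__(self, code):
        self.code = code


def as_cubin_bytes(linked):
    """Return raw bytes whether `linked` is bytes or wraps them in `.code`."""
    if isinstance(linked, (bytes, bytearray)):
        return bytes(linked)
    return linked.code


# Both linker flavours then look the same to callers such as run_nvdisasm.
print(as_cubin_bytes(b"\x7fELF"))
print(as_cubin_bytes(FakeObjectCode(b"\x7fELF")))
```

With a step like this at the point where linker.complete() is called, callers would not need to branch on config.CUDA_USE_NVIDIA_BINDING before writing the cubin.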

@@ -312,7 +315,6 @@ def get_cufunc(self):
cufunc = self._cufunc_cache.get(device.id, None)
if cufunc:
return cufunc

cubin = self.get_cubin(cc=device.compute_capability)
module = ctx.create_module_image(
cubin, self._setup_functions, self._teardown_functions