Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 17 additions & 14 deletions numba_cuda/numba/cuda/cudadrv/driver.py
Original file line number Diff line number Diff line change
Expand Up @@ -2312,18 +2312,9 @@ def __init__(
self.lineinfo = lineinfo
self.cc = cc
self.arch = arch
if lto is False:
# WAR for apparent nvjitlink issue
lto = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this will be a problem with the 12.9 CUDA Python bindings - there was a version that would only accept None and not False, but I don't recall what the version was.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I could tell from the CI so far, the removal of the None value did not cause an issue - perhaps it is no longer needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the tests that are now failing, it seems that this workaround is probably still needed.

self.lto = lto
self.additional_flags = additional_flags

self.options = LinkerOptions(
max_register_count=self.max_registers,
lineinfo=lineinfo,
arch=arch,
link_time_optimization=lto,
)
self._has_ltoir = False
self._complete = False
self._object_codes = []
self.linker = None # need at least one program
Expand Down Expand Up @@ -2446,6 +2437,8 @@ def add_cu(self, cu, name="<cudapy-cu>"):
print(obj.code)

self._object_codes.append(obj)
if self.lto:
self._has_ltoir = True

def add_cubin(self, cubin, name="<cudapy-cubin>"):
obj = ObjectCode.from_cubin(cubin, name=name)
Expand All @@ -2454,6 +2447,7 @@ def add_cubin(self, cubin, name="<cudapy-cubin>"):
def add_ltoir(self, ltoir, name="<cudapy-ltoir>"):
obj = ObjectCode.from_ltoir(ltoir, name=name)
self._object_codes.append(obj)
self._has_ltoir = True

def add_fatbin(self, fatbin, name="<cudapy-fatbin>"):
obj = ObjectCode.from_fatbin(fatbin, name=name)
Expand Down Expand Up @@ -2499,15 +2493,23 @@ def add_data(self, data, kind, name):

fn(data, name)

def get_linked_ptx(self):
def _get_linker_options(self, ptx):
# Some linker flags are only valid/required if LTOIR object code is present.
# WAR for cuda-core < 0.4.0 where passing False incorrectly appends flags
# (fixed in cuda-python PR #989, released in cuda-core v0.4.0)
lto_flag = True if self._has_ltoir else None
ptx_flag = True if (self._has_ltoir and ptx) else None
options = LinkerOptions(
max_register_count=self.max_registers,
lineinfo=self.lineinfo,
arch=self.arch,
link_time_optimization=True,
ptx=True,
link_time_optimization=lto_flag,
ptx=ptx_flag,
)
return options

def get_linked_ptx(self):
options = self._get_linker_options(ptx=True)
self.linker = Linker(*self._object_codes, options=options)

result = self.linker.link("ptx")
Expand All @@ -2526,7 +2528,8 @@ def complete(self):
cubin is a pointer to a internal buffer of cubin owned by the linker;
thus, it should be loaded before the linker is destroyed.
"""
self.linker = Linker(*self._object_codes, options=self.options)
options = self._get_linker_options(ptx=False)
self.linker = Linker(*self._object_codes, options=options)
result = self.linker.link("cubin")
self.close()
self._complete = True
Expand Down
23 changes: 23 additions & 0 deletions numba_cuda/numba/cuda/tests/cudadrv/test_linker.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
# SPDX-License-Identifier: BSD-2-Clause

import numpy as np
import os
import pytest
from numba.cuda.testing import unittest
from numba.cuda.testing import (
Expand Down Expand Up @@ -109,6 +110,25 @@ def simple_lmem(A, B, dty):
B[i] = C[i]


TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
if TEST_BIN_DIR:
test_device_functions_ltoir = os.path.join(
TEST_BIN_DIR, "test_device_functions.ltoir"
)


add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
Comment on lines +120 to +124
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is defined outside the if TEST_BIN_DIR: block (lines 114-117) but references test_device_functions_ltoir which is only defined inside that block. When TEST_BIN_DIR is not set, importing this test file will raise NameError: name 'test_device_functions_ltoir' is not defined.

This code should be indented to be inside the if TEST_BIN_DIR: block, or the declarations should be conditional:

Suggested change
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
if TEST_BIN_DIR:
test_device_functions_ltoir = os.path.join(
TEST_BIN_DIR, "test_device_functions.ltoir"
)
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
def debuggable_kernel(result):
i = cuda.grid(1)
result[i] = add_from_numba(i, i)

Comment on lines +120 to +124
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will cause a NameError when TEST_BIN_DIR is not set, because test_device_functions_ltoir is only defined inside the if TEST_BIN_DIR: block (lines 114-117). When the module loads, the cuda.declare_device() call will fail trying to access an undefined variable.

This will prevent the entire test module from importing when NUMBA_CUDA_TEST_BIN_DIR is not set in the environment.

The module-level declarations of add_from_numba and debuggable_kernel should be moved inside the if TEST_BIN_DIR: block, or the test method test_debug_kernel_with_lto should include a skip decorator similar to other tests in test_nvjitlink.py:

Suggested change
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
if TEST_BIN_DIR:
test_device_functions_ltoir = os.path.join(
TEST_BIN_DIR, "test_device_functions.ltoir"
)
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
def debuggable_kernel(result):
i = cuda.grid(1)
result[i] = add_from_numba(i, i)

Comment on lines +113 to +124
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable test_device_functions_ltoir is only defined when TEST_BIN_DIR is set (line 114-117), but it's used unconditionally on line 123. This will cause a NameError when the test module is imported if TEST_BIN_DIR is not set in the environment, preventing the entire test module from loading.

This should either:

  1. Move the add_from_numba and debuggable_kernel definitions inside the conditional block, or
  2. Initialize test_device_functions_ltoir to a default value (like None) and add appropriate guards
Suggested change
TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
if TEST_BIN_DIR:
test_device_functions_ltoir = os.path.join(
TEST_BIN_DIR, "test_device_functions.ltoir"
)
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
if TEST_BIN_DIR:
test_device_functions_ltoir = os.path.join(
TEST_BIN_DIR, "test_device_functions.ltoir"
)
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
def debuggable_kernel(result):
i = cuda.grid(1)
result[i] = add_from_numba(i, i)



def debuggable_kernel(result):
i = cuda.grid(1)
result[i] = add_from_numba(i, i)
Comment on lines +120 to +129
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Module will fail to import when NUMBA_CUDA_TEST_BIN_DIR is not set, since test_device_functions_ltoir is only defined inside the conditional block. Move these declarations inside the test method or make them conditional.

Suggested change
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
def debuggable_kernel(result):
i = cuda.grid(1)
result[i] = add_from_numba(i, i)
# Move these inside test_debug_kernel_with_lto method
# or wrap in: if TEST_BIN_DIR:

Comment on lines +120 to +129
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Module will fail to import when NUMBA_CUDA_TEST_BIN_DIR is not set. test_device_functions_ltoir is only defined inside the conditional block (lines 114-117), but is referenced unconditionally here.

Move these declarations inside the if TEST_BIN_DIR: block:

Suggested change
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
def debuggable_kernel(result):
i = cuda.grid(1)
result[i] = add_from_numba(i, i)
if TEST_BIN_DIR:
test_device_functions_ltoir = os.path.join(
TEST_BIN_DIR, "test_device_functions.ltoir"
)
add_from_numba = cuda.declare_device(
"add_from_numba",
"int32(int32, int32)",
link=[test_device_functions_ltoir],
)
def debuggable_kernel(result):
i = cuda.grid(1)
result[i] = add_from_numba(i, i)



@skip_on_cudasim("Linking unsupported in the simulator")
class TestLinker(CUDATestCase):
@require_context
Expand Down Expand Up @@ -330,6 +350,9 @@ def test_get_local_mem_per_specialized(self):
calc_size = np.dtype(np.float64).itemsize * LMEM_SIZE
self.assertGreaterEqual(local_mem_size, calc_size)

def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
Comment on lines +353 to +354
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses debuggable_kernel which references add_from_numba, but that function is only defined when TEST_BIN_DIR is set. The test will fail with NameError when TEST_BIN_DIR is not set.

The test should either:

  1. Be skipped when TEST_BIN_DIR is not set, or
  2. Be moved inside a conditional block
Suggested change
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
@unittest.skipIf(not os.getenv("NUMBA_CUDA_TEST_BIN_DIR"), "TEST_BIN_DIR not set")
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

Comment on lines +353 to +354
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_debug_kernel_with_lto references debuggable_kernel and add_from_numba (lines 127-129 and 120-124), which are only defined when TEST_BIN_DIR is set (line 114). When TEST_BIN_DIR is not set, this test will fail with a NameError.

This test needs a skip decorator like other tests in the codebase that depend on TEST_BIN_DIR:

Suggested change
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
@unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
def test_debug_kernel_with_lto(self):

See test_nrt.py line 182 and test_linkable_code.py line 40 for examples of the correct pattern.

Comment on lines +353 to +354
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_debug_kernel_with_lto will fail when TEST_BIN_DIR environment variable is not set because debuggable_kernel and add_from_numba are only defined when TEST_BIN_DIR is set (as suggested in the comment on lines 120-124).

This test should include a skip decorator similar to tests in test_nvjitlink.py:

Suggested change
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
@unittest.skipIf(
not TEST_BIN_DIR, "TEST_BIN_DIR not set"
)
def test_debug_kernel_with_lto(self):

Comment on lines +353 to +354
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses debuggable_kernel which references test_device_functions_ltoir that only exists when TEST_BIN_DIR is set. The test should have a skip decorator to avoid failures when the required LTOIR file is not available.

Consider adding a skip decorator similar to other tests in this file:

Suggested change
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
@unittest.skipUnless(TEST_BIN_DIR, "TEST_BIN_DIR not set")
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

Comment on lines +353 to +354
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Missing skip decorator for when TEST_BIN_DIR is not set. Test will fail with NameError accessing debuggable_kernel.

Suggested change
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
@unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
def test_debug_kernel_with_lto(self):

Comment on lines +353 to +354
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Missing skip decorator. Test will fail with NameError when TEST_BIN_DIR is not set.

Suggested change
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
@unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
def test_debug_kernel_with_lto(self):
cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)


@skip_if_nvjitlink_missing("nvJitLink not installed or new enough (>12.3)")
def test_link_for_different_cc(self):
linker = _Linker(max_registers=0, cc=(7, 5), lto=True)
Expand Down
Loading