Support consuming NRT functions from external libraries by brandon-b-miller · Pull Request #167 · NVIDIA/numba-cuda

brandon-b-miller · 2025-03-19T02:08:14Z

This PR moves some NRT function declarations to a header that can be found at build time by external libraries. It also fixes a bug where NRT wasn't linked if its declarations were present in the passed user source, but not the half of the PTX numba generates itself.

gmarkall · 2025-03-19T10:23:50Z

numba_cuda/numba/cuda/dispatcher.py


    def maybe_link_nrt(self, link, tgt_ctx, asm):
-        if not tgt_ctx.enable_nrt:
+        if not tgt_ctx.enable_nrt or self.nrt is False:


Do you want the nrt kwarg for the JIT decorator to take precedence over the global NRT setting for enabling it? I think this will never link the NRT if it is globally disabled.

Good observation. I think the best behavior might be

kwarg takes precedence, allowing users definitive control over if the NRT code should be added to the link for a particular kernel

global flag enables the detection mechanism

But I'm not sure if numba's code generation will actually provide increfs/decrefs without enable_nrt set to true in the target context, is that correct? In that case we might be adding the refcounting functions to the link but not actually calling them.

The motivation for this kwarg is twofold: Where it came up was an edge case I observed in cuDF where an NRT managed object is created and immediately returned, such as in lambda string: return string. In this case it works out that NRT functions are used, but they only appear in the separately provided PTX file. The ASM that numba inspects when deciding to link NRT or not contains only the declarations for the parent functions that eventually call the NRT ones, so this defeats numba's NRT detection.

Overall I feel that unless the NRT detection is absolutely definitive somehow it might be more reliable to let the user decide when to link NRT. This has the drawback of creating two switches, one to enable the possiblity of NRT, and then a second one to actually add it to the link. But even that I think its a little better than somewhat shaky auto detection.

The ASM that numba inspects when deciding to link NRT or not contains only the declarations for the parent functions that eventually call the NRT ones, so this defeats numba's NRT detection.

Are you saying that numba-cuda doesn't link the NRT when it should? If so, then I would have seen that as a bug in numba-cuda?

Or, if it adds NRT to the asm when it's not necessary, we should be running the refcount pruning pass to elide those unnecessary NRT calls.

Is there a third situation I'm missing?

re you saying that numba-cuda doesn't link the NRT when it should? If so, then I would have seen that as a bug in numba-cuda?

Yes, this is the case. It is the PTX file that is provided by cuDF that is not checked. The PTX from the numba side is checked, but doesn't contain the actual NRT calls - it only contains a call to a shim function that uses NRT functions within its implementation.

Initially I felt that since cuDF was somewhat burying the NRT calls maybe it should also be the one to signal to numba that the NRT implementations are also needed (or the same for any consuming library that does the same thing).

But I also see that it could be considered a bug in numba-cuda that all the input PTX files are not checked for NRT functions. What do you think?

I think if we support using the NRT functions from external code, then ideally we should be checking for calls to NRT functions in external code rather than asking the user to do additional work - particularly if it requires signalling in the @cuda.jit decorator, because the person writing the jitted function can often be someone different to the person who implemented another library.

We've seen this problem with the link and extensions kwargs already, where libraries want to use these features, but then have to ask the users to remember to use them with no explicit connection between use of the library and the requirement to add the kwargs.

What do you think of the changes in 0b23a09 and dd568e7 ? Here we drop the kwarg and scan the ptx files at the expense of a double read of the same file, something theres probably an easy solution for.

Apologies for being late to the discussion.

particularly if it requires signalling in the @cuda.jit decorator, because the person writing the jitted function can often be someone different to the person who implemented another library.

If it's the library dev duty to specify nrt arg, how about adding it to the LinkableCode interface? The downside is that this cannot be specified if user would want to directly use a path to the link argument. Perhaps we can make that an unsupported case or simply deprecate passing in a raw path (and reintroduce it as a Path subclass to LinkableCode).

What let me to think of this is that 0b23a09 NRT discovery mechanism may not be able to handle binary format input such as LTOIR (disassembly could work but I think that's a too much price to pay).

I increasingly see that Numba as a compiler need to separate the duty from lib dev and lib user (kernel writer). And LinkableCode interface can be the perfect vehicle to separate those needs.

If it's the library dev duty to specify nrt arg, how about adding it to the LinkableCode interface?

I think this sounds like a good idea.

The downside is that this cannot be specified if user would want to directly use a path to the link argument.

I think that's not too significant a downside. I see the provision of a path to link as a bit of a legacy anyway, because it's not very convenient - it's just what happened to be implemented first, a long time ago, perhaps before the API design had a lot of consideration.

Perhaps we can make that an unsupported case or simply deprecate passing in a raw path (and reintroduce it as a Path subclass to LinkableCode).

I think introducing a Path subclass might not fit exactly right, because the linker needs to know what type of object is being passed in. A helper function that determines the type based on the extension, reads it in, and then returns the appropriate LinkableCode subclass instance would probably be OK though. (Perhaps a class method LinkableCode.from_path(), maybe?).

(Perhaps a class method LinkableCode.from_path(), maybe?).

I think that's a good design.

gmarkall · 2025-03-19T10:26:11Z

Would this be better broken into two PRs? One for making the NRT functions available externally, and one for the kwarg?

gmarkall · 2025-03-19T10:27:06Z

Also, if we're going to make nrt.cuh available for external users, I think we need a documentation section explaining it and outlining what we consider the public API and expected behaviour to be.

brandon-b-miller · 2025-03-19T20:22:44Z

Also, if we're going to make nrt.cuh available for external users, I think we need a documentation section explaining it and outlining what we consider the public API and expected behaviour to be.

I agree. I think that given that the numpy API is available now we're getting to the point where a doc update is needed to describe the available NRT features. What would you say to giving that section of the docs a broad overhaul, either as part of this PR or separately, including what nrt.cuh is for and how to use it?

numba_cuda/numba/cuda/cudadrv/driver.py

isVoid · 2025-04-02T05:07:57Z

numba_cuda/numba/cuda/dispatcher.py

+        link_nrt = nrt_in_asm(asm)
+        if not link_nrt:
+            for file in link:
+                if file.endswith('ptx'):


Alternatively, there can also be linkable_code.PTXSource in the input. Could there be input in other format that also contains NRT calls?

Thats a reasonable question. This would only currently catch the case where the user supplied a PTX file that utilized NRT functions. I suppose it could also occur with a .cu file, which it shouldn't be hard to update things for.

I think it should just be {".cu", ".ptx"} since that's all we accept in link=

numba_cuda/numba/cuda/tests/nrt/test_nrt.py

Co-authored-by: Michael Wang <13521008+isVoid@users.noreply.github.com>

gmarkall

The implementation of add_data() (#235) should probably be fixed within this PR in ctypes and cuda-python linkers, because without that fix the additional tests fail, e.g.:

======================================================================
ERROR: test_nrt_detect_linkable_code (numba.cuda.tests.nrt.test_nrt.TestNrtLinking.test_nrt_detect_linkable_code) (code=<numba.cuda.cudadrv.linkable_code.Cubin object at 0x706fbc9763d0>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/tests/nrt/test_nrt.py", line 204, in test_nrt_detect_linkable_code
    kernel[1, 1]()
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 737, in __call__
    return self.dispatcher.call(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 883, in call
    kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 891, in _compile_for_args
    return self.compile(tuple(argtypes))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gmarkall/numbadev/numba/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 1159, in compile
    kernel.bind()
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 375, in bind
    cufunc = self._codelibrary.get_cufunc()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/codegen.py", line 260, in get_cufunc
    cubin = self.get_cubin(cc=device.compute_capability)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/codegen.py", line 237, in get_cubin
    self._link_all(linker, cc, ignore_nonlto=False)
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/codegen.py", line 204, in _link_all
    linker.add_file_guess_ext(path, ignore_nonlto)
  File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/cudadrv/driver.py", line 2886, in add_file_guess_ext
    self.add_data(
    ^^^^^^^^^^^^^
AttributeError: 'CtypesLinker' object has no attribute 'add_data'

I haven't looked into why this is not also failing on CI, but I think it ought to also be checked on CI.

numba_cuda/numba/cuda/dispatcher.py

gmarkall · 2025-05-03T12:10:14Z

I just merged main so this branch can pick up the recent CI fixes.

numba_cuda/numba/cuda/tests/test_binary_generation/generate_raw_ltoir.py

numba_cuda/numba/cuda/tests/test_binary_generation/Makefile

gmarkall · 2025-05-03T12:36:50Z

numba_cuda/numba/cuda/dispatcher.py

+        detect .extern .func decls of NRT functions in any file in the link or
+        any of the passed LinkableCode objects have NRT enabled. In the special
+        case of PTXSource or CUSource objects, the source code will be inspected
+        even if the NRT flag is not set.


Why are we inspecting the source of PTX and CUDA C/C++ source files? Shouldn't the user be expected to explicitly state that they require NRT by setting the LinkableCode nrt property? There's a couple of reasons I think it would be better to require this:

I think a while back we concluded it would be hard to perfectly determine whether arbitrary linkable code used the NRT (potential false positives and false negatives), so we shouldn't try.

A user might pass in an object as PTX or CUDA source, and rely on NRT getting linked in automatically. If they then decide to switch to LTO-IR for better performance, the automatic linking will no longer work, and their kernels will appear to suddenly have unresolved NRT symbols.

I am quite +1 on requiring the user the be explicit about when they need NRT and less autodetection. Does the following seem right then? We'd link NRT if:

it is globally enabled

And either:

NRT is autodetected in the main asm (this is the only thing we try and autodetect NRT symbols in)

a LinkableCode object with NRT specifically enabled is passed

If a user passes a .cu or .ptx file (in any form, either as a path or as a LinkableCode instance) without NRT explicitly set, it may fail to link if the code does contain NRT symbols.

Yes, that sounds good. I'm trying to think of a way to remove the autodetection even for internal code, too (similar to how #240 should be able to remove it for CGs) but it's a little trickier with it coming from Numba core as well.

Should be addressed in ff12e16

I think this is addressed from a technical standpoint (modulo the minor comment below) but the docstring here is now out of date, suggesting that PTX and CU sources will be inspected.

Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>

numba_cuda/numba/cuda/dispatcher.py

gmarkall

I think this is generally looking good - just a couple of minor comments on the diff.

I tried merging in #241 locally and it got the tests passing for me.

Once #241 is in and the comments are addressed, I think this will be good to merge.

brandon-b-miller added 5 commits March 7, 2025 08:38

initial

2f62cb1

pass compute sanitizer on the python side

ec6bf1e

clean

f52a15c

merge/resolve

4df687b

reset files

01be5c2

gmarkall added the 2 - In Progress Currently a work in progress label Mar 19, 2025

gmarkall reviewed Mar 19, 2025

View reviewed changes

gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 2 - In Progress Currently a work in progress labels Mar 19, 2025

brandon-b-miller added 2 commits March 19, 2025 12:57

check all ptx files in maybe_link_nrt

0b23a09

detect nrt in all linked ptx files

dd568e7

brandon-b-miller added 4 commits March 24, 2025 05:58

read_file

af9045d

tests, fill out headerA

f2e9b29

merge/resolve

e4008d0

temp file

e65088f

brandon-b-miller added 3 - Ready for Review Ready for review by team 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Mar 31, 2025

isVoid reviewed Apr 2, 2025

View reviewed changes

brandon-b-miller and others added 6 commits April 2, 2025 07:35

Merge branch 'main' into support-cudf-strings

e1e6cc1

move and rename read_file

5bd9b7a

Update numba_cuda/numba/cuda/tests/nrt/test_nrt.py

e3042a6

Co-authored-by: Michael Wang <13521008+isVoid@users.noreply.github.com>

get_include() in tests

501b4da

handle LinkableCode

ff478f4

convert to LinkableCode

43b647b

brandon-b-miller requested review from gmarkall and isVoid April 7, 2025 15:48

gmarkall requested changes Apr 30, 2025

View reviewed changes

gmarkall reviewed Apr 30, 2025

View reviewed changes

numba_cuda/numba/cuda/dispatcher.py Show resolved Hide resolved

gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Apr 30, 2025

brandon-b-miller and others added 8 commits April 30, 2025 05:51

Merge branch 'main' into support-cudf-strings

47a45b4

partially address reviews

34b710d

go back to regex check

bd3da22

small fix

dc9680b

preserve nrt library through serialization

71c280b

pass full test suite with nrt and nvidia binding

eaebb9a

merge/resolve

d8469de

Merge remote-tracking branch 'NVIDIA/main' into support-cudf-strings

dd35975

gmarkall reviewed May 3, 2025

View reviewed changes

numba_cuda/numba/cuda/tests/test_binary_generation/generate_raw_ltoir.py Outdated Show resolved Hide resolved

gmarkall reviewed May 3, 2025

View reviewed changes

numba_cuda/numba/cuda/tests/test_binary_generation/Makefile Outdated Show resolved Hide resolved

gmarkall reviewed May 3, 2025

View reviewed changes

gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels May 3, 2025

brandon-b-miller requested a review from gmarkall May 6, 2025 12:15

brandon-b-miller and others added 2 commits May 6, 2025 07:16

drop autodetection of NRT in user supplied raw file paths

ff12e16

Apply suggestions from code review

d893d2b

Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>

gmarkall reviewed May 7, 2025

View reviewed changes

numba_cuda/numba/cuda/dispatcher.py Outdated Show resolved Hide resolved

gmarkall requested changes May 7, 2025

View reviewed changes

brandon-b-miller added 4 commits May 7, 2025 10:10

Merge branch 'main' into support-cudf-strings

e8659aa

simplify and update docstring

ad0a0a7

small bugfix

19c6a56

reoder steps in final CI script

50b61ac

gmarkall approved these changes May 8, 2025

View reviewed changes

gmarkall merged commit bbfce31 into NVIDIA:main May 8, 2025
37 checks passed

brandon-b-miller mentioned this pull request May 9, 2025

Bump version to 0.11.0 #250

Merged

Conversation

brandon-b-miller commented Mar 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brandon-b-miller Mar 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

isVoid Apr 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gmarkall commented Mar 19, 2025

Uh oh!

gmarkall commented Mar 19, 2025

Uh oh!

brandon-b-miller commented Mar 19, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gmarkall left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gmarkall commented May 3, 2025

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gmarkall left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

brandon-b-miller commented Mar 19, 2025 •

edited

Loading

brandon-b-miller Mar 19, 2025 •

edited

Loading

isVoid Apr 9, 2025 •

edited

Loading

gmarkall left a comment •

edited

Loading