Support consuming NRT functions from external libraries#167
Support consuming NRT functions from external libraries#167gmarkall merged 44 commits intoNVIDIA:mainfrom
Conversation
numba_cuda/numba/cuda/dispatcher.py
Outdated
|
|
||
| def maybe_link_nrt(self, link, tgt_ctx, asm): | ||
| if not tgt_ctx.enable_nrt: | ||
| if not tgt_ctx.enable_nrt or self.nrt is False: |
There was a problem hiding this comment.
Do you want the nrt kwarg for the JIT decorator to take precedence over the global NRT setting for enabling it? I think this will never link the NRT if it is globally disabled.
There was a problem hiding this comment.
Good observation. I think the best behavior might be
- kwarg takes precedence, allowing users definitive control over if the NRT code should be added to the link for a particular kernel
- global flag enables the detection mechanism
But I'm not sure if numba's code generation will actually provide increfs/decrefs without enable_nrt set to true in the target context, is that correct? In that case we might be adding the refcounting functions to the link but not actually calling them.
The motivation for this kwarg is twofold: Where it came up was an edge case I observed in cuDF where an NRT managed object is created and immediately returned, such as in lambda string: return string. In this case it works out that NRT functions are used, but they only appear in the separately provided PTX file. The ASM that numba inspects when deciding to link NRT or not contains only the declarations for the parent functions that eventually call the NRT ones, so this defeats numba's NRT detection.
Overall I feel that unless the NRT detection is absolutely definitive somehow it might be more reliable to let the user decide when to link NRT. This has the drawback of creating two switches, one to enable the possiblity of NRT, and then a second one to actually add it to the link. But even that I think its a little better than somewhat shaky auto detection.
There was a problem hiding this comment.
The ASM that numba inspects when deciding to link NRT or not contains only the declarations for the parent functions that eventually call the NRT ones, so this defeats numba's NRT detection.
Are you saying that numba-cuda doesn't link the NRT when it should? If so, then I would have seen that as a bug in numba-cuda?
Or, if it adds NRT to the asm when it's not necessary, we should be running the refcount pruning pass to elide those unnecessary NRT calls.
Is there a third situation I'm missing?
There was a problem hiding this comment.
re you saying that numba-cuda doesn't link the NRT when it should? If so, then I would have seen that as a bug in numba-cuda?
Yes, this is the case. It is the PTX file that is provided by cuDF that is not checked. The PTX from the numba side is checked, but doesn't contain the actual NRT calls - it only contains a call to a shim function that uses NRT functions within its implementation.
Initially I felt that since cuDF was somewhat burying the NRT calls maybe it should also be the one to signal to numba that the NRT implementations are also needed (or the same for any consuming library that does the same thing).
But I also see that it could be considered a bug in numba-cuda that all the input PTX files are not checked for NRT functions. What do you think?
There was a problem hiding this comment.
I think if we support using the NRT functions from external code, then ideally we should be checking for calls to NRT functions in external code rather than asking the user to do additional work - particularly if it requires signalling in the @cuda.jit decorator, because the person writing the jitted function can often be someone different to the person who implemented another library.
We've seen this problem with the link and extensions kwargs already, where libraries want to use these features, but then have to ask the users to remember to use them with no explicit connection between use of the library and the requirement to add the kwargs.
There was a problem hiding this comment.
Apologies for being late to the discussion.
particularly if it requires signalling in the @cuda.jit decorator, because the person writing the jitted function can often be someone different to the person who implemented another library.
If it's the library dev duty to specify nrt arg, how about adding it to the LinkableCode interface? The downside is that this cannot be specified if user would want to directly use a path to the link argument. Perhaps we can make that an unsupported case or simply deprecate passing in a raw path (and reintroduce it as a Path subclass to LinkableCode).
What let me to think of this is that 0b23a09 NRT discovery mechanism may not be able to handle binary format input such as LTOIR (disassembly could work but I think that's a too much price to pay).
I increasingly see that Numba as a compiler need to separate the duty from lib dev and lib user (kernel writer). And LinkableCode interface can be the perfect vehicle to separate those needs.
There was a problem hiding this comment.
If it's the library dev duty to specify
nrtarg, how about adding it to theLinkableCodeinterface?
I think this sounds like a good idea.
The downside is that this cannot be specified if user would want to directly use a path to the
linkargument.
I think that's not too significant a downside. I see the provision of a path to link as a bit of a legacy anyway, because it's not very convenient - it's just what happened to be implemented first, a long time ago, perhaps before the API design had a lot of consideration.
Perhaps we can make that an unsupported case or simply deprecate passing in a raw path (and reintroduce it as a
Pathsubclass toLinkableCode).
I think introducing a Path subclass might not fit exactly right, because the linker needs to know what type of object is being passed in. A helper function that determines the type based on the extension, reads it in, and then returns the appropriate LinkableCode subclass instance would probably be OK though. (Perhaps a class method LinkableCode.from_path(), maybe?).
There was a problem hiding this comment.
(Perhaps a class method LinkableCode.from_path(), maybe?).
I think that's a good design.
|
Would this be better broken into two PRs? One for making the NRT functions available externally, and one for the kwarg? |
|
Also, if we're going to make |
I agree. I think that given that the numpy API is available now we're getting to the point where a doc update is needed to describe the available NRT features. What would you say to giving that section of the docs a broad overhaul, either as part of this PR or separately, including what |
numba_cuda/numba/cuda/dispatcher.py
Outdated
| link_nrt = nrt_in_asm(asm) | ||
| if not link_nrt: | ||
| for file in link: | ||
| if file.endswith('ptx'): |
There was a problem hiding this comment.
Alternatively, there can also be linkable_code.PTXSource in the input. Could there be input in other format that also contains NRT calls?
There was a problem hiding this comment.
Thats a reasonable question. This would only currently catch the case where the user supplied a PTX file that utilized NRT functions. I suppose it could also occur with a .cu file, which it shouldn't be hard to update things for.
There was a problem hiding this comment.
I think it should just be {".cu", ".ptx"} since that's all we accept in link=
Co-authored-by: Michael Wang <13521008+isVoid@users.noreply.github.com>
There was a problem hiding this comment.
The implementation of add_data() (#235) should probably be fixed within this PR in ctypes and cuda-python linkers, because without that fix the additional tests fail, e.g.:
======================================================================
ERROR: test_nrt_detect_linkable_code (numba.cuda.tests.nrt.test_nrt.TestNrtLinking.test_nrt_detect_linkable_code) (code=<numba.cuda.cudadrv.linkable_code.Cubin object at 0x706fbc9763d0>)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/tests/nrt/test_nrt.py", line 204, in test_nrt_detect_linkable_code
kernel[1, 1]()
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 737, in __call__
return self.dispatcher.call(
^^^^^^^^^^^^^^^^^^^^^
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 883, in call
kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 891, in _compile_for_args
return self.compile(tuple(argtypes))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gmarkall/numbadev/numba/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 1159, in compile
kernel.bind()
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 375, in bind
cufunc = self._codelibrary.get_cufunc()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/codegen.py", line 260, in get_cufunc
cubin = self.get_cubin(cc=device.compute_capability)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/codegen.py", line 237, in get_cubin
self._link_all(linker, cc, ignore_nonlto=False)
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/codegen.py", line 204, in _link_all
linker.add_file_guess_ext(path, ignore_nonlto)
File "/home/gmarkall/numbadev/numba-cuda/numba_cuda/numba/cuda/cudadrv/driver.py", line 2886, in add_file_guess_ext
self.add_data(
^^^^^^^^^^^^^
AttributeError: 'CtypesLinker' object has no attribute 'add_data'
I haven't looked into why this is not also failing on CI, but I think it ought to also be checked on CI.
|
I just merged |
numba_cuda/numba/cuda/tests/test_binary_generation/generate_raw_ltoir.py
Outdated
Show resolved
Hide resolved
numba_cuda/numba/cuda/dispatcher.py
Outdated
| detect .extern .func decls of NRT functions in any file in the link or | ||
| any of the passed LinkableCode objects have NRT enabled. In the special | ||
| case of PTXSource or CUSource objects, the source code will be inspected | ||
| even if the NRT flag is not set. |
There was a problem hiding this comment.
Why are we inspecting the source of PTX and CUDA C/C++ source files? Shouldn't the user be expected to explicitly state that they require NRT by setting the LinkableCode nrt property? There's a couple of reasons I think it would be better to require this:
- I think a while back we concluded it would be hard to perfectly determine whether arbitrary linkable code used the NRT (potential false positives and false negatives), so we shouldn't try.
- A user might pass in an object as PTX or CUDA source, and rely on NRT getting linked in automatically. If they then decide to switch to LTO-IR for better performance, the automatic linking will no longer work, and their kernels will appear to suddenly have unresolved NRT symbols.
There was a problem hiding this comment.
I am quite +1 on requiring the user the be explicit about when they need NRT and less autodetection. Does the following seem right then? We'd link NRT if:
- it is globally enabled
And either:
- NRT is autodetected in the main asm (this is the only thing we try and autodetect NRT symbols in)
- a LinkableCode object with NRT specifically enabled is passed
If a user passes a .cu or .ptx file (in any form, either as a path or as a LinkableCode instance) without NRT explicitly set, it may fail to link if the code does contain NRT symbols.
There was a problem hiding this comment.
Yes, that sounds good. I'm trying to think of a way to remove the autodetection even for internal code, too (similar to how #240 should be able to remove it for CGs) but it's a little trickier with it coming from Numba core as well.
There was a problem hiding this comment.
I think this is addressed from a technical standpoint (modulo the minor comment below) but the docstring here is now out of date, suggesting that PTX and CU sources will be inspected.
Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>
This PR moves some NRT function declarations to a header that can be found at build time by external libraries. It also fixes a bug where NRT wasn't linked if its declarations were present in the passed user source, but not the half of the PTX numba generates itself.