Ensuring CTK minor version compatibility for cccl.c.parallel #4851
Merged
oleksandr-pavlyk merged 1 commit into NVIDIA:main on May 30, 2025
The `nvJitLink.h` header provides unversioned inline functions whose bodies
call versioned symbols. For example:
```c
static inline nvJitLinkResult nvJitLinkCreate(
    nvJitLinkHandle *handle,
    uint32_t numOptions,
    const char **options)
{
  return __nvJitLinkCreate_12_8(handle, numOptions, options);
}
```
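To make the mechanism concrete, here is a minimal single-file sketch in C. It is an illustration under stated assumptions, not nvJitLink code: `demoCreate`, `demoCreate_12_0`, `demoCreate_12_8`, `demo_caller`, and `demo_last_version` are hypothetical names (the real header uses `__`-prefixed versioned names such as `__nvJitLinkCreate_12_8`). The "library" and "header" halves live in one file only so the example compiles on its own.

```c
/* Hypothetical "demo" API mimicking the inline-wrapper pattern of
   <nvJitLink.h>. None of these names exist in nvJitLink. */
#include <assert.h>
#include <stddef.h>

typedef int demoResult;

/* Records which versioned entry point ran, so the effect is observable. */
static int demo_last_version = 0;

/* "Library" side: one entry point per minor version (in reality these
   are exported by the shared library, not compiled into the caller). */
demoResult demoCreate_12_0(int *handle)
{
  assert(handle != NULL);
  *handle = 1;
  demo_last_version = 120;
  return 0;
}

demoResult demoCreate_12_8(int *handle)
{
  assert(handle != NULL);
  *handle = 1;
  demo_last_version = 128;
  return 0;
}

/* "Header" side: the unversioned name is an inline wrapper whose body
   hard-codes the versioned symbol of the toolkit the header came from. */
static inline demoResult demoCreate(int *handle)
{
  return demoCreate_12_8(handle);
}

/* Caller side: the source only mentions demoCreate, but the compiled
   code calls demoCreate_12_8 directly. */
int demo_caller(void)
{
  int handle = 0;
  demoCreate(&handle);
  return demo_last_version;
}
```

If the "library" half were moved into a separate shared object, `nm -u` on the caller's object file would list `demoCreate_12_8` rather than `demoCreate`: the caller is pinned to the versioned symbol from the header it was compiled against.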
c/parallel calls the unversioned names, but because these wrappers are inlined,
the object file of each TU in c/parallel references the versioned symbols,
so the final shared library depends on the specific CTK
version it was built with.
libnvJitLink.so.12 also exports unversioned symbols, which
map to versioned symbols at run time:
```
(nvbench) opavlyk@ee09c48-lcedt:~/repos/cccl$ nm -D /usr/local/cuda/lib64/libnvJitLink.so.12 | grep nvJitLinkCreate
00000000004ba560 T nvJitLinkCreate@@libnvJitLink.so.12
00000000004ba660 T __nvJitLinkCreate_12_0@@libnvJitLink.so.12
00000000004ba670 T __nvJitLinkCreate_12_1@@libnvJitLink.so.12
00000000004ba680 T __nvJitLinkCreate_12_2@@libnvJitLink.so.12
00000000004ba690 T __nvJitLinkCreate_12_3@@libnvJitLink.so.12
00000000004ba6a0 T __nvJitLinkCreate_12_4@@libnvJitLink.so.12
00000000004ba6b0 T __nvJitLinkCreate_12_5@@libnvJitLink.so.12
00000000004ba6c0 T __nvJitLinkCreate_12_6@@libnvJitLink.so.12
00000000004ba6d0 T __nvJitLinkCreate_12_7@@libnvJitLink.so.12
00000000004ba6e0 T __nvJitLinkCreate_12_8@@libnvJitLink.so.12
```
This change replaces direct uses of `#include <nvJitLink.h>` with
`#include <nvrtc/nvjitlink_helper.h>`, which defines `NVJITLINK_NO_INLINE`
before including `<nvJitLink.h>` and then simply declares the unversioned symbols.
As a result, linking resolves these calls to the unversioned dynamic
symbols provided by libnvJitLink.so.12, guaranteeing CTK minor
version compatibility.
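The helper-header approach can be sketched in the same hypothetical single-file style. Everything here is illustrative, not nvJitLink internals: the `demo` names, the `DEMO_NO_INLINE` switch (standing in for `NVJITLINK_NO_INLINE`), the extra `demoCreate_12_9` entry point (standing in for a newer minor release installed at run time), and the assumption that the unversioned export forwards to the newest implementation the library ships. The real helper is a C++ header and declares the unversioned functions inside `extern "C"`, which plain C does not need.

```c
/* Hypothetical "demo" API; none of these names exist in nvJitLink.
   Layout: helper header, vendor header, caller, then the "library". */
#include <assert.h>
#include <stddef.h>

typedef int demoResult;

/* Records which versioned entry point ran, so the effect is observable. */
static int demo_last_version = 0;

/* Helper header: turn off the inline wrappers before the vendor header. */
#define DEMO_NO_INLINE

/* Vendor header contents (would normally come from #include). */
#ifndef DEMO_NO_INLINE
demoResult demoCreate_12_8(int *handle);
static inline demoResult demoCreate(int *handle)
{
  return demoCreate_12_8(handle);
}
#endif

#undef DEMO_NO_INLINE

/* Helper header: declare the unversioned entry point instead. */
demoResult demoCreate(int *handle);

/* Caller: now references only the unversioned symbol demoCreate. */
int demo_caller(void)
{
  int handle = 0;
  demoCreate(&handle);
  return demo_last_version;
}

/* "Library" side, e.g. a newer minor release found at run time. */
demoResult demoCreate_12_8(int *handle)
{
  assert(handle != NULL);
  *handle = 1;
  demo_last_version = 128;
  return 0;
}

demoResult demoCreate_12_9(int *handle)
{
  assert(handle != NULL);
  *handle = 1;
  demo_last_version = 129;
  return 0;
}

/* Assumption of this sketch: the unversioned export forwards to the
   newest implementation shipped in this build of the library. */
demoResult demoCreate(int *handle)
{
  return demoCreate_12_9(handle);
}
```

In this sketch `demo_caller` depends only on `demoCreate`, so whichever library provides that symbol at run time decides which implementation runs; no specific versioned symbol is baked into the caller.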
I verified that gh-4845 is resolved with this change by installing the
cuda-parallel wheel from this PR after torch built with CTK 12.8 was
installed:
```
(pathfinder-trouble) opavlyk@ee09c48-lcedt:~$ python
Python 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:21:13) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import torch
>>> import cuda.parallel.experimental.algorithms as algorithms
>>>
>>> import ctypes
>>>
>>> lib = ctypes.cdll.LoadLibrary("libnvJitLink.so.12")
>>> lib
<CDLL 'libnvJitLink.so.12', handle 60a782c77790 at 0x76ced9bf9640>
>>> fn = lib.nvJitLinkVersion
>>> fn.restype = ctypes.c_int
>>> fn.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)]
>>> maj = ctypes.c_int(0)
>>> min = ctypes.c_int(0)
>>>
>>>
>>> fn(maj, min)
0
>>> maj, min
(c_int(12), c_int(8))
>>> quit()
```
Contributor
🟩 CI finished in 36m 54s: Pass: 100%/14 | Total: 2h 09m | Avg: 9m 13s | Max: 21m 06s | Hits: 96%/328
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | stdpar |
| | python |
| +/- | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | stdpar |
| +/- | python |
| +/- | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 14)
| # | Runner |
|---|---|
| 7 | linux-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 1 | linux-amd64-gpu-rtx2080-latest-1 |
kkraus14 approved these changes on May 29, 2025
leofang approved these changes on May 29, 2025
rwgk approved these changes on May 29, 2025
rwgk (Contributor) left a comment:
Looks great to me, I'm really glad that we found this simple solution.
My suggestions are minor and optional.
`#define NVJITLINK_NO_INLINE`
`#include <nvJitLink.h>`
`#undef NVJITLINK_NO_INLINE`
Contributor
This `#undef` surprises me slightly. I'd lean towards keeping it defined:
- Assuming `nvJitLink.h` is the only code that references this define, it does not matter.
- Assuming other NVIDIA code also references this define, undefining it here could lead to inconsistencies.
`nvJitLinkResult nvJitLinkGetErrorLog(nvJitLinkHandle, char*);`
`nvJitLinkResult nvJitLinkGetInfoLogSize(nvJitLinkHandle, size_t*);`
`nvJitLinkResult nvJitLinkGetInfoLog(nvJitLinkHandle, char*);`
`}`
Contributor
I looked at what's actually being used (below): 10 of 13 APIs.
I'd probably remove the 3 we're not using, with a comment:
// nvJitLink APIs used in cccl/c/parallel
2 src/nvrtc/command_list.h: nvJitLinkAddData
1 src/nvrtc/command_list.h: nvJitLinkComplete
1 src/nvrtc/command_list.h: nvJitLinkCreate
1 src/nvrtc/command_list.h: nvJitLinkDestroy
1 src/nvrtc/command_list.h: nvJitLinkGetErrorLog
1 src/nvrtc/command_list.h: nvJitLinkGetErrorLogSize
1 src/nvrtc/command_list.h: nvJitLinkGetLinkedCubin
1 src/nvrtc/command_list.h: nvJitLinkGetLinkedCubinSize
1 src/nvrtc/command_list.h: nvJitLinkGetLinkedPtx
1 src/nvrtc/command_list.h: nvJitLinkGetLinkedPtxSize
This was referenced Feb 26, 2026
Description
closes gh-4845
The `<nvJitLink.h>` header provides unversioned inline functions whose bodies call versioned symbols; for example, `nvJitLinkCreate` is an inline wrapper around `__nvJitLinkCreate_12_8`. c/parallel calls the unversioned names, but because these wrappers are inlined, the object file of each TU in c/parallel references the versioned symbols, so the final shared library depends on the specific CTK version it was built with.
libnvJitLink.so.12 also exports unversioned symbols, which map to versioned symbols at run time.
This change replaces direct uses of `#include <nvJitLink.h>` with `#include <nvrtc/nvjitlink_helper.h>`, which defines `NVJITLINK_NO_INLINE` before `#include <nvJitLink.h>` (thanks for the idea @leofang). It then simply declares the unversioned symbols as `extern "C"`. As a result, linking resolves these calls to the unversioned dynamic symbols provided by the `libnvJitLink.so.12` shared library, guaranteeing CTK minor version compatibility.
I verified that gh-4845 is resolved with this change by installing the cuda-parallel wheel from this PR after torch built with CTK 12.8 was installed.