
[Review][Java] Add detailed error message for libcuvs load failure to UnsupportedProvider/UnsupportedOperationExceptions #1316

Merged
rapids-bot[bot] merged 7 commits into rapidsai:branch-25.10 from
ldematte:java/improve-error-messages-3
Sep 15, 2025

@ldematte
Contributor

@ldematte ldematte commented Sep 5, 2025

This PR further extends #1296 and #1314 to give meaningful error messages in case libcuvs fails to load.

The jextract-generated bindings we use in cuvs-java use SymbolLookup#libraryLookup to load the cuvs_c dynamic library; this uses RawNativeLibraries#load (see https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjava/RawNativeLibraries.c#L58), which in turn calls JVM_LoadLibrary.

JVM_LoadLibrary does a good job of putting together a detailed error message (e.g. calling dlerror, trying to locate and inspect the file for a platform mismatch, etc.). Unfortunately, RawNativeLibraries#load calls it with the throwException parameter set to false, which means that these detailed error messages are never surfaced.
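To make the failure mode concrete, here is a toy Java model (not the JDK's code) of a loader with the same (name, throwException) shape as JVM_LoadLibrary. The class ToyNativeLoader and its map of "installed" libraries are hypothetical; the point is only that, with throwException false, the dlerror-style detail is computed and then dropped.

```java
import java.util.List;
import java.util.Map;

// Toy model, NOT the JDK implementation: each "installed" library maps to its dependencies.
final class ToyNativeLoader {
    private final Map<String, List<String>> installed;

    ToyNativeLoader(Map<String, List<String>> installed) {
        this.installed = installed;
    }

    // Same shape as JVM_LoadLibrary(name, throwException).
    Object load(String name, boolean throwException) {
        String error = resolve(name);
        if (error == null) {
            return new Object(); // stand-in for a native library handle
        }
        if (throwException) {
            throw new UnsatisfiedLinkError(error); // detail reaches the caller
        }
        return null; // throwException == false: the detail in `error` is lost
    }

    // Returns null on success, or a dlerror-style message naming the missing file.
    // (No cycle detection; this is only a toy.)
    private String resolve(String name) {
        List<String> deps = installed.get(name);
        if (deps == null) {
            return name + ": cannot open shared object file: No such file or directory";
        }
        for (String dep : deps) {
            String depError = resolve(dep);
            if (depError != null) {
                return name + ": " + depError;
            }
        }
        return null;
    }
}
```

With libcuvs_c.so installed but its librmm.so dependency missing, load("libcuvs_c.so", false) just returns null, while load("libcuvs_c.so", true) throws with "libcuvs_c.so: librmm.so: cannot open shared object file: No such file or directory".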

This PR follows the pattern introduced in #1296 and preloads libcuvs (and its dependencies) by calling JVM_LoadLibrary directly with throwException set to true; preloading it also causes the OS to look for and load all of its dependencies.
In case of error we can see in more detail what is broken; e.g. if libcuvs_c.so is present but librmm.so is missing:

java.lang.UnsupportedOperationException: cannot create JDKProvider: libcuvs_c.so: librmm.so: cannot open shared object file: No such file or directory
        at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.spi.UnsupportedProvider.newCuVSResources(UnsupportedProvider.java:35)
        at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.CuVSResources.create(CuVSResources.java:90)
        at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.CuVSResources.create(CuVSResources.java:79)
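A minimal sketch of how such a message can reach the caller, assuming a provider-selection step that preloads the native libraries and falls back to an UnsupportedProvider carrying the failure reason. Apart from UnsupportedProvider, JDKProvider and newCuVSResources, every name here (CuVSProvider, ProviderSelection, Preloader, select) is hypothetical, and the real cuvs-java signatures differ.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the cuvs-java SPI; the real interface has more methods.
interface CuVSProvider {
    Object newCuVSResources();
}

// Provider used when the native libraries could not be loaded:
// every operation fails with the recorded reason.
final class UnsupportedProvider implements CuVSProvider {
    private final String reason;

    UnsupportedProvider(String reason) {
        this.reason = reason;
    }

    @Override
    public Object newCuVSResources() {
        throw new UnsupportedOperationException(reason);
    }
}

final class ProviderSelection {
    // Preloads the native libraries; throws UnsatisfiedLinkError with the detailed reason on failure.
    interface Preloader {
        void preload();
    }

    static CuVSProvider select(Preloader preloader, Supplier<CuVSProvider> jdkProvider) {
        try {
            preloader.preload();
            return jdkProvider.get();
        } catch (UnsatisfiedLinkError e) {
            // Keep the loader's message so it surfaces later, at the first API call.
            return new UnsupportedProvider("cannot create JDKProvider: " + e.getMessage());
        }
    }
}
```

Deferring the failure to the first API call keeps provider discovery itself from throwing, while still reporting the precise cause.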

Fixes #1321

@copy-pr-bot

copy-pr-bot bot commented Sep 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


Object lib = JVM_LoadLibrary$mh.invoke(name, true);
if (lib != null) {
    // It wasn't a problem with library loading, so undo what we did
    JVM_UnloadLibrary$mh.invoke(lib);
}
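The hunk above probes the library a second time with throwException set to true and, if that unexpectedly succeeds, unloads it again. A sketch of that control flow, with the two native calls hidden behind a hypothetical RawLoader interface (the PR invokes them through the method handles JVM_LoadLibrary$mh and JVM_UnloadLibrary$mh, whose setup is not shown here):

```java
// Hypothetical abstraction over JVM_LoadLibrary / JVM_UnloadLibrary.
interface RawLoader {
    Object load(String name, boolean throwException); // may throw UnsatisfiedLinkError
    void unload(Object handle);
}

final class LoadDiagnostics {
    // Called after the normal (quiet) lookup has already failed: re-load the
    // library with throwException = true to capture the detailed reason.
    static String explainFailure(RawLoader loader, String name) {
        try {
            Object lib = loader.load(name, true);
            if (lib != null) {
                // It wasn't a problem with library loading, so undo what we did
                loader.unload(lib);
                return null; // no loading problem found
            }
            return "cannot load " + name; // no detail available
        } catch (UnsatisfiedLinkError e) {
            return e.getMessage(); // the detailed, dlerror-based message
        }
    }
}
```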
Contributor

@ldematte nice sleuthing. Let's try to use System.loadLibrary earlier so that we can get similarly improved error messages.

Contributor Author

I like the idea! Less reflection :)
A first test suggests that System.loadLibrary gives a different, less precise message. I'll dig into that.
In any case, loading the library up front seems best.
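A sketch of the suggestion to call System.loadLibrary up front and keep its UnsatisfiedLinkError message. EagerLoad and tryLoad are hypothetical names; System.loadLibrary maps the name through System.mapLibraryName (libcuvs_c.so on Linux) and searches java.library.path. As observed in this thread, the resulting message may be less precise than the one JVM_LoadLibrary produces with throwException set to true.

```java
final class EagerLoad {
    // Returns null on success, or the JDK's error message on failure.
    static String tryLoad(String libName) {
        try {
            System.loadLibrary(libName); // e.g. "cuvs_c" -> libcuvs_c.so on Linux
            return null;
        } catch (UnsatisfiedLinkError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String error = tryLoad("cuvs_c");
        System.out.println(error == null ? "loaded cuvs_c" : "load failed: " + error);
    }
}
```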

Contributor

@chatman chatman left a comment

This is very useful, thanks! I was about to propose this exact change while chasing down these librmm.so and rapids_logger.so loading failures (I was trying to wrap ldd in an external process call instead; your reflection approach is much better :-)). +1 to merge!

On a possibly related note, I had to add the following before creating the CuVSResources to make it work. I'll raise another issue for it.

System.loadLibrary("cudart");

@ldematte
Contributor Author

This is marked WIP because I need to rebase it on top of #1315 #1327 for it to merge cleanly. It will be ready for review once I do that.

…e-error-messages-3

# Conflicts:
#	java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/JDKProvider.java
@ldematte ldematte changed the title [WIP][Java] Add detailed error message for libcuvs load failure to UnsupportedProvider/UnsupportedOperationExceptions [Review][Java] Add detailed error message for libcuvs load failure to UnsupportedProvider/UnsupportedOperationExceptions Sep 14, 2025
static void loadLibraries() throws ProviderInitializationException {
    if (!loaded) {
        try {
            LOADER_STRATEGY.loadLibraries();
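A sketch of the one-shot, strategy-based loading this hunk suggests, under stated assumptions: only loadLibraries, loaded, LOADER_STRATEGY and ProviderInitializationException appear in the diff; the LoaderStrategy interface, the exception's constructor and the synchronization are assumptions. LOADER_STRATEGY is left non-final here purely so the sketch can be exercised with different strategies.

```java
// Hypothetical exception type; the real one's constructors are not shown in the PR.
final class ProviderInitializationException extends Exception {
    ProviderInitializationException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical strategy, e.g. "extract from the fat jar" vs. "use system paths".
interface LoaderStrategy {
    void loadLibraries() throws Exception;
}

final class NativeLibs {
    private static boolean loaded = false; // guarded by NativeLibs.class
    // Non-final only so this sketch can be exercised with different strategies.
    static LoaderStrategy LOADER_STRATEGY = () -> { };

    static synchronized void loadLibraries() throws ProviderInitializationException {
        if (!loaded) {
            try {
                LOADER_STRATEGY.loadLibraries();
                loaded = true; // set only on success, so a failed load can be retried
            } catch (Exception | UnsatisfiedLinkError e) {
                // Preserve the detailed loader message for the UnsupportedProvider.
                throw new ProviderInitializationException(e.getMessage(), e);
            }
        }
    }
}
```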
Contributor Author

@chatman I think preloading libraries this way will already solve the problem with cudart, but if it does not, we can add the System.loadLibrary call around here (e.g. in the implementation of loadLibraries), then also call cudaRuntimeGetVersion and compare the result with embeddedLibrariesCudaVersion as an additional check.
Or the other way around: use embeddedLibrariesCudaVersion to load a specific version of cudart.

(All of this in a follow-up I think)
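A hedged sketch of the additional version check discussed above. It assumes both values use the CUDA runtime's integer encoding (1000 * major + 10 * minor, so 12080 means 12.8), which is what cudaRuntimeGetVersion reports; how cuvs-java actually stores embeddedLibrariesCudaVersion is not shown in this thread, and treating a matching major version as compatible is likewise an assumption. CudaVersionCheck is a hypothetical name.

```java
final class CudaVersionCheck {
    // Decode the CUDA runtime's integer encoding: 1000 * major + 10 * minor.
    static int major(int encoded) {
        return encoded / 1000;
    }

    static int minor(int encoded) {
        return (encoded % 1000) / 10;
    }

    // Returns null if the versions are assumed compatible (same major), else an error message.
    static String check(int runtimeVersion, int embeddedVersion) {
        if (major(runtimeVersion) == major(embeddedVersion)) {
            return null;
        }
        return String.format(
            "CUDA runtime %d.%d does not match CUDA %d.%d used to build the embedded libraries",
            major(runtimeVersion), minor(runtimeVersion),
            major(embeddedVersion), minor(embeddedVersion));
    }
}
```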

@ldematte ldematte marked this pull request as ready for review September 14, 2025 07:34
@ldematte ldematte requested a review from a team as a code owner September 14, 2025 07:34
Contributor

@ChrisHegarty ChrisHegarty left a comment

LGTM

@mythrocks
Contributor

/ok to test 9efc1ea

@mythrocks
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 757b02a into rapidsai:branch-25.10 Sep 15, 2025
84 checks passed
@mythrocks
Contributor

Thank you for this change, @ldematte. It's now been merged.

@ldematte ldematte deleted the java/improve-error-messages-3 branch September 16, 2025 07:30

Labels

improvement: Improves an existing functionality
non-breaking: Introduces a non-breaking change

Development

Successfully merging this pull request may close these issues.

[FEA] Expand/customize dependency loading introduced in fat-jars to work with the slim-jar too

5 participants