Skip to content

[REVIEW] [Java] Option to build fat-jars with native dependencies included#1296

Merged
rapids-bot[bot] merged 29 commits intorapidsai:branch-25.10from
mythrocks:fat-jar-fixed-compile
Sep 10, 2025
Merged

[REVIEW] [Java] Option to build fat-jars with native dependencies included#1296
rapids-bot[bot] merged 29 commits intorapidsai:branch-25.10from
mythrocks:fat-jar-fixed-compile

Conversation

@mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Aug 27, 2025

This commit introduces an option to include the native libraries as part of a new Java JAR artifact.

In addition, this commit also adds scripts to build the libraries included in the fat-jars using gcc-toolset, to allow the libraries to be portable across several Linux / libstdc++ versions. (This was earlier attempted in #1264, but will now reside in this commit.)

Note that for the initial cut, the "fat" jars will include only the following libraries:

  1. libcuvs.so
  2. libcuvs_c.so
  3. librmm.so
  4. librapids_logger.so

The resultant JARs will still be dependent on LD_LIBRARY_PATH for other dependencies (cublas, cusparse, cusolver, nccl, etc.).

Two new profiles have been introduced in the pom.xml:

  1. x86_64-cuda12
  2. x86_64-cuda13 (Although this is more of an example than anything.)

The main JAR artifact (cuvs-java-25.x.x.jar) remains unmodified. But when -P x86_64-cuda12 is employed, an additional cuvs-java-25.x.x-x86_64-cuda12.jar is produced, containing libcuvs.so, libcuvs_c.so, and some (minimal) additional dependencies.

The idea is that the JAR artifact build for x86_64 + cuda12 would look something like:

# On an x86 build box, using the cuda-12 conda env, from the project root directory:

# Build `libcuvs.so` first.
LIBCUVS_BUILD_DIR=`pwd`/cpp/build/cuda12 ./build.sh libcuvs

# Now the Java build.
cd java/
CMAKE_PREFIX_PATH=`pwd`/../cpp/build/cuda12 ./build.sh

The java/build.sh detects the CPU platform and the CUDA version to automatically choose the profile (x86-cuda12).

Note that there are tangential changes to the pom.xml:

  1. Fixes [BUG] [Java] Race condition between building src/main/java and src/main/java22 #1293: The Java 22 portion of the build will now follow the Java 21, to prevent races between the builds.
  2. The maven-compiler-plugin version has been dropped to 3.11.0, to prevent build errors regarding a "0-byte module-info.class". This is apparently a known issue in 3.13, to be fixed in 3.14.

Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
@mythrocks mythrocks self-assigned this Aug 27, 2025
@mythrocks mythrocks requested a review from a team as a code owner August 27, 2025 18:49
@mythrocks mythrocks added the feature request New feature or request label Aug 27, 2025
@mythrocks mythrocks requested a review from a team as a code owner August 27, 2025 18:49
@mythrocks mythrocks added the Java label Aug 27, 2025
@mythrocks mythrocks requested a review from msarahan August 27, 2025 18:49
@mythrocks mythrocks marked this pull request as draft August 27, 2025 18:49
@copy-pr-bot
Copy link

copy-pr-bot bot commented Aug 27, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mythrocks mythrocks added the non-breaking Introduces a non-breaking change label Aug 27, 2025
@mythrocks
Copy link
Contributor Author

Note that the work in #1264 will depend on these changes.

<!-- Include native libraries from separate directory -->
<fileSets>
<fileSet>
<directory>${project.build.directory}/native-libs</directory>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the assembly file that packages native-libraries into the native-jar artifact.

final class JDKProvider implements CuVSProvider {

static {
OptionalNativeDependencyLoader.loadLibraries();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the call into the optional native-dependency loader.

If the jar includes native dependency libraries, they will be loaded at startup. If not, the load is skipped, and it runs as before (i.e. depending on $LD_LIBRARY_PATH, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to pre-load specific paths? With the conda-pack option, we'd need to force pre-load the bundled libstdc++ library. I'm mentioning this because this looks like it might be the right place to do that pre-loading. If you just rely on LD_LIBRARY_PATH, you would likely end up with the system libstdc++, which will be too old for the conda packages.

@cjnolet cjnolet moved this from Todo to In Progress in Vector Search, ML, & Data Mining Release Board Sep 5, 2025
Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concentrated my review and comments on the Java part, but I've also tried build-in-docker with success. LGTM

mvn verify "${MAVEN_VERIFY_ARGS[@]}" \
mvn clean verify "${MAVEN_VERIFY_ARGS[@]}" -P "$BUILD_PROFILE" \
&& mvn install:install-file -Dfile=./target/cuvs-java-$VERSION.jar -DgroupId=$GROUP_ID -DartifactId=cuvs-java -Dversion=$VERSION -Dpackaging=jar \
&& mvn install:install-file -Dfile=./target/cuvs-java-$VERSION-"$BUILD_PROFILE".jar -DgroupId=$GROUP_ID -DartifactId=cuvs-java -Dversion=$VERSION -Dclassifier="$BUILD_PROFILE" -Dpackaging=jar \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, and thank you for keeping the slim jar too!

/**
* A class that loads native dependencies if they are available in the jar.
*/
public class OptionalNativeDependencyLoader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reporting what we discussed offline and expanding it:
I like the idea; as you see, I've gone down a very similar route in #1316 to check if libcuvs can be loaded, and if not why, before proceeding.
I think it would be great if we can merge the efforts, and have an OptionalNativeDependencyLoader that works for both cases (either a conditional implementation -- we can introspect the jar to see if it's the slim or fat version, or with 2 different implementations).

I think we should merge this as-is; I have verified that with these changes the slim jar keeps working and nothing breaks. However, if you don't mind, I have opened an issue to keep track of the follow up work, so we can have both this loader (which is necessary for the fat-jar to work) and another similar to/derived from #1316 for the slim jar (which is necessary for the slim-jar to work).

Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
@mythrocks
Copy link
Contributor Author

I'm considering merging this tomorrow, unless @ldematte and @jameslamb have objections. I think I've addressed the concerns we had for the moment.

@jameslamb
Copy link
Member

unless @ldematte and @jameslamb have objections

Sorry, I'm not able to review this soon. Please talk with @robertmaynard about it, and don't wait on me.

Some quick notes on what I see:

  • nothing in the docker-build/ directory added here appears to being run in CI... and I think you'll have a hard time doing docker-in-docker when you try to get that running in CI in future PRs
  • I still feel that it'd be easier to maintain and produce more-portable packages to instead use the existing Rocky Linux 8 images the rest of RAPIDS uses to build Python wheels, as I started prototyping in [DO NOT MERGE] cuvs-java: fat jars with libcuvs.so (proof-of-concept) #1311

@mythrocks
Copy link
Contributor Author

mythrocks commented Sep 8, 2025

Thank you, @jameslamb. I'll wait to see if @robertmaynard has any objections to the contents of docker-build/. (The other stuff, I'll leave in @ldematte's able hands.)

Re docker-in-docker: ACK. That will need resolving, but I agree with your assessment: It would be good to bake this into the CUVS CI container. Devs could then use that. I'll pick that thread up in a couple of weeks.

@ldematte
Copy link
Contributor

ldematte commented Sep 9, 2025

I'm considering merging this tomorrow, unless @ldematte and @jameslamb have objections. I think I've addressed the concerns we had for the moment.

On the strictly Java (non-docker) part: GTG for me. We have plans on how extend it, but we'll address it separately after this and #1314 are merged.

@robertmaynard
Copy link
Contributor

Some follow-up work yall should think about is to update the C++ build flags used once #1317 is merged. That would allow you to build rmm/logger/fmt/spdlog statically and only have to bundle libcuvs.so and libcuvs_c.so

@mythrocks
Copy link
Contributor Author

That would allow you to build rmm/logger/fmt/spdlog statically and only have to bundle libcuvs.so and libcuvs_c.so

Thank you for the heads-up. Agreed. That's would be to C++ libs what the fat-jar is for Java.

Come to that, I'm not quite sure why we have a separate libcuvs.so and a libcuvs_c.so. The latter can't be used without the former.

This is worth investigating further.

Plus, corrected project name.

Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Comment on lines +14 to +15
CUDA_VERSION_FROM_NVCC=$(nvcc --version | grep -oP 'release [0-9]+' | awk '{print $2}')
CUDA_MAJOR_VERSION=${CUDA_VERSION_FROM_NVCC:-12}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CUDA_VERSION_FROM_NVCC=$(nvcc --version | grep -oP 'release [0-9]+' | awk '{print $2}')
CUDA_MAJOR_VERSION=${CUDA_VERSION_FROM_NVCC:-12}
CUDA_MAJOR_VERSION=$(nvcc --version | grep -oP 'release [0-9]+' | awk '{print $2}')

Any reason to have two variables here? My inclination would be to error out if the inner expression fails. The pipefail setting at the top of the script takes care of propagating the status.

Comment on lines +32 to +34
ARG CUDA_VERSION=12.9.1
ARG OS_RELEASE=9
ARG TARGETPLATFORM=linux/amd64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ARG CUDA_VERSION=12.9.1
ARG OS_RELEASE=9
ARG TARGETPLATFORM=linux/amd64
ARG CUDA_VERSION
ARG OS_RELEASE
ARG TARGETPLATFORM

Redeclaring doesn't need default values

final class JDKProvider implements CuVSProvider {

static {
OptionalNativeDependencyLoader.loadLibraries();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to pre-load specific paths? With the conda-pack option, we'd need to force pre-load the bundled libstdc++ library. I'm mentioning this because this looks like it might be the right place to do that pre-loading. If you just rely on LD_LIBRARY_PATH, you would likely end up with the system libstdc++, which will be too old for the conda packages.

ARG CCACHE_VERSION=4.11.2

# Default x86_64 from x86 build, aarch64 cmake for arm build
ARG CMAKE_ARCH=x86_64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to consume docker platform, so that thing are not defined in multiple places and get out of sync. Code generated by cursor.

Suggested change
ARG CMAKE_ARCH=x86_64
# Extract architecture part (remove linux/ prefix and any variant suffix)
local arch="${TARGETPLATFORM#linux/}"
arch="${arch%%/*}" # Remove variant suffix like /v8, /v1, etc.
# Convert to CMake platform naming
case "$arch" in
"amd64"|"x86_64")
echo "x86_64"
;;
"arm64"|"aarch64")
echo "aarch64"
;;
*)
echo "Error: Unsupported architecture '$arch' in platform '$platform'" >&2
echo "Supported architectures: amd64, x86_64, arm64, aarch64" >&2
return 1
;;
esac```

fi

if [ "$(uname -m)" == "aarch64" ]; then
DOCKER_BUILD_EXTRA_ARGS=(--build-arg TARGETPLATFORM=linux/arm64 --build-arg CMAKE_ARCH=aarch64 "${DOCKER_BUILD_EXTRA_ARGS[@]}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be cleaned up to one line with the code above to translate TARGETPLATFORM to CMAKE_ARCH. I would not base the TARGETPLATFORM on the host CPU type. Better to pass it in as a parameter/env var so you don't need specific hardware to build a docker image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your point. Cross-compilers would break this.

I had to start somewhere.

@mythrocks
Copy link
Contributor Author

@msarahan Would it be alright if we left further changes to this in a separate follow up PR? I'm afraid I'm away, and will be unavailable for the next 3 weeks. There are PRs queued up behind this one.

@mythrocks
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 60244ca into rapidsai:branch-25.10 Sep 10, 2025
156 of 164 checks passed
rapids-bot bot pushed a commit that referenced this pull request Sep 15, 2025
…to UnsupportedProvider/UnsupportedOperationExceptions (#1316)

This PR further extends #1296 and #1314 to give meaningful error messages in case libcuvs fails to load.

The `jextract` generated bindings we use in cuvs-java use `SymbolLookup#libraryLookup` to load the `cuvs_c` dynamic library; this uses `RawNativeLibraries#load` (see https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjava/RawNativeLibraries.c#L58); `RawNativeLibraries#load` in turn calls `JVM_LoadLibrary`. 

`JVM_LoadLibrary` does a good job to put together a good error message (e.g. calling `dlerror`, trying to locate and inspect the file for platform mismatch, etc. Unfortunately, `RawNativeLibraries#load` calls it passing false to the `throwException` parameter, which means that the detailed error messages are not surfaced.

This PR follows the pattern introduced in #1296 and preloads libcuvs (and dependencies) using `JVM_LoadLibrary` directly with `throwException` true; preloading it will also cause the OS to look for and load all dependencies. 
In case of error we can see what's broken in better detail; e.g. if `libcuvs_c.so` is present, but `librmm.so` is missing:

```
java.lang.UnsupportedOperationException: cannot create JDKProvider: libcuvs_c.so: librmm.so: cannot open shared object file: No such file or directory
        at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.spi.UnsupportedProvider.newCuVSResources(UnsupportedProvider.java:35)
        at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.CuVSResources.create(CuVSResources.java:90)
        at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.CuVSResources.create(CuVSResources.java:79)
```

Fixes #1321

Authors:
  - Lorenzo Dematté (https://github.com/ldematte)

Approvers:
  - Ishan Chattopadhyaya (https://github.com/chatman)
  - Chris Hegarty (https://github.com/ChrisHegarty)
  - MithunR (https://github.com/mythrocks)

URL: #1316
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Build feature request New feature or request Java non-breaking Introduces a non-breaking change

Development

Successfully merging this pull request may close these issues.

[BUG] [Java] Race condition between building src/main/java and src/main/java22

6 participants