[REVIEW] [Java] Option to build fat-jars with native dependencies included #1296
Conversation
Signed-off-by: MithunR <mithunr@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Note that the work in #1264 will depend on these changes.
…libs. Signed-off-by: MithunR <mithunr@nvidia.com>
```xml
<!-- Include native libraries from separate directory -->
<fileSets>
  <fileSet>
    <directory>${project.build.directory}/native-libs</directory>
```
This is the assembly file that packages native-libraries into the native-jar artifact.
```java
final class JDKProvider implements CuVSProvider {

  static {
    OptionalNativeDependencyLoader.loadLibraries();
```
This is the call into the optional native-dependency loader.
If the jar includes native dependency libraries, they will be loaded at startup. If not, the load is skipped, and it runs as before (i.e. depending on $LD_LIBRARY_PATH, etc.)
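As a rough illustration of what such an optional loader can do (a sketch only; the class name, resource path, and library list here are assumptions, not the PR's actual code): check whether the bundled `.so` files exist as jar resources, and if so extract them to a temporary directory and `System.load` them in dependency order; if not, skip the step entirely so the slim jar behaves as before.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class OptionalLoaderSketch {

    // Libraries a fat-jar profile might bundle, in dependency order
    // (dependencies first, so each System.load finds what it needs).
    static final String[] BUNDLED = {
        "librapids_logger.so", "librmm.so", "libcuvs.so", "libcuvs_c.so"
    };

    /** Jar-resource path a bundled library is assumed to live at. */
    static String resourcePath(String lib) {
        return "/native-libs/" + lib;
    }

    /**
     * Extract each bundled library to a temp directory and load it by
     * absolute path. Returns false without loading anything when the
     * resources are absent (i.e. this is the slim jar), so the usual
     * LD_LIBRARY_PATH-based lookup applies instead.
     */
    public static boolean loadLibraries() {
        if (OptionalLoaderSketch.class.getResource(resourcePath(BUNDLED[0])) == null) {
            return false; // slim jar: nothing bundled, fall back to default lookup
        }
        try {
            Path dir = Files.createTempDirectory("cuvs-native");
            for (String lib : BUNDLED) {
                Path out = dir.resolve(lib);
                try (InputStream in =
                         OptionalLoaderSketch.class.getResourceAsStream(resourcePath(lib))) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                // Absolute path: no System.loadLibrary name mangling or search path.
                System.load(out.toAbsolutePath().toString());
            }
            return true;
        } catch (IOException | UnsatisfiedLinkError e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        // Outside the fat jar the resources are absent, so the load is skipped.
        System.out.println("bundled libs present: " + loadLibraries());
    }
}
```

A real implementation would also need to clean up the temp directory and cope with repeated loads, but the shape above is enough to show why the slim jar is unaffected.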
Is there a way to pre-load specific paths? With the conda-pack option, we'd need to force pre-load the bundled libstdc++ library. I'm mentioning this because this looks like it might be the right place to do that pre-loading. If you just rely on LD_LIBRARY_PATH, you would likely end up with the system libstdc++, which will be too old for the conda packages.
Signed-off-by: MithunR <mithunr@nvidia.com>
ldematte left a comment:
I have concentrated my review and comments on the Java part, but I've also tried build-in-docker with success. LGTM
```diff
-mvn verify "${MAVEN_VERIFY_ARGS[@]}" \
+mvn clean verify "${MAVEN_VERIFY_ARGS[@]}" -P "$BUILD_PROFILE" \
   && mvn install:install-file -Dfile=./target/cuvs-java-$VERSION.jar -DgroupId=$GROUP_ID -DartifactId=cuvs-java -Dversion=$VERSION -Dpackaging=jar \
+  && mvn install:install-file -Dfile=./target/cuvs-java-$VERSION-"$BUILD_PROFILE".jar -DgroupId=$GROUP_ID -DartifactId=cuvs-java -Dversion=$VERSION -Dclassifier="$BUILD_PROFILE" -Dpackaging=jar \
```
Nice, and thank you for keeping the slim jar too!
```java
/**
 * A class that loads native dependencies if they are available in the jar.
 */
public class OptionalNativeDependencyLoader {
```
Reporting what we discussed offline and expanding it:
I like the idea; as you see, I've gone down a very similar route in #1316 to check if libcuvs can be loaded, and if not why, before proceeding.
I think it would be great if we can merge the efforts, and have an OptionalNativeDependencyLoader that works for both cases (either a conditional implementation -- we can introspect the jar to see if it's the slim or fat version, or with 2 different implementations).
I think we should merge this as-is; I have verified that with these changes the slim jar keeps working and nothing breaks. However, if you don't mind, I have opened an issue to keep track of the follow up work, so we can have both this loader (which is necessary for the fat-jar to work) and another similar to/derived from #1316 for the slim jar (which is necessary for the slim-jar to work).
java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/OptionalNativeDependencyLoader.java
Signed-off-by: MithunR <mithunr@nvidia.com>
I'm considering merging this tomorrow, unless @ldematte and @jameslamb have objections. I think I've addressed the concerns we had for the moment.
Sorry, I'm not able to review this soon. Please talk with @robertmaynard about it, and don't wait on me. Some quick notes on what I see:
Thank you, @jameslamb. I'll wait to see if @robertmaynard has any objections to the contents. Re docker-in-docker: ACK. That will need resolving, but I agree with your assessment: it would be good to bake this into the CUVS CI container. Devs could then use that. I'll pick that thread up in a couple of weeks.
On the strictly Java (non-docker) part: GTG for me. We have plans on how to extend it, but we'll address that separately after this and #1314 are merged.
Some follow-up work y'all should think about is to update the C++ build flags used once #1317 is merged. That would allow you to build rmm/logger/fmt/spdlog statically and only have to bundle
Thank you for the heads-up. Agreed. That would be to the C++ libs what the fat-jar is for Java. Come to that, I'm not quite sure why we have a separate one; this is worth investigating further.
Plus, corrected project name. Signed-off-by: MithunR <mithunr@nvidia.com>
```shell
CUDA_VERSION_FROM_NVCC=$(nvcc --version | grep -oP 'release [0-9]+' | awk '{print $2}')
CUDA_MAJOR_VERSION=${CUDA_VERSION_FROM_NVCC:-12}
```

Suggested change:

```diff
-CUDA_VERSION_FROM_NVCC=$(nvcc --version | grep -oP 'release [0-9]+' | awk '{print $2}')
-CUDA_MAJOR_VERSION=${CUDA_VERSION_FROM_NVCC:-12}
+CUDA_MAJOR_VERSION=$(nvcc --version | grep -oP 'release [0-9]+' | awk '{print $2}')
```
Any reason to have two variables here? My inclination would be to error out if the inner expression fails. The pipefail setting at the top of the script takes care of propagating the status.
```dockerfile
ARG CUDA_VERSION=12.9.1
ARG OS_RELEASE=9
ARG TARGETPLATFORM=linux/amd64
```

Suggested change:

```diff
-ARG CUDA_VERSION=12.9.1
-ARG OS_RELEASE=9
-ARG TARGETPLATFORM=linux/amd64
+ARG CUDA_VERSION
+ARG OS_RELEASE
+ARG TARGETPLATFORM
```

Redeclared ARGs don't need default values.
```dockerfile
ARG CCACHE_VERSION=4.11.2

# Default x86_64 from x86 build, aarch64 cmake for arm build
ARG CMAKE_ARCH=x86_64
```

Would be better to consume the Docker platform, so that things are not defined in multiple places and get out of sync. Code generated by Cursor:

```shell
# Extract architecture part (remove linux/ prefix and any variant suffix)
local arch="${TARGETPLATFORM#linux/}"
arch="${arch%%/*}"  # Remove variant suffix like /v8, /v1, etc.
# Convert to CMake platform naming
case "$arch" in
  "amd64"|"x86_64")
    echo "x86_64"
    ;;
  "arm64"|"aarch64")
    echo "aarch64"
    ;;
  *)
    echo "Error: Unsupported architecture '$arch' in platform '$TARGETPLATFORM'" >&2
    echo "Supported architectures: amd64, x86_64, arm64, aarch64" >&2
    return 1
    ;;
esac
```
```shell
fi

if [ "$(uname -m)" == "aarch64" ]; then
  DOCKER_BUILD_EXTRA_ARGS=(--build-arg TARGETPLATFORM=linux/arm64 --build-arg CMAKE_ARCH=aarch64 "${DOCKER_BUILD_EXTRA_ARGS[@]}")
```
this could be cleaned up to one line with the code above to translate TARGETPLATFORM to CMAKE_ARCH. I would not base the TARGETPLATFORM on the host CPU type. Better to pass it in as a parameter/env var so you don't need specific hardware to build a docker image.
I agree with your point. Cross-compilers would break this.
I had to start somewhere.
@msarahan Would it be alright if we left further changes to this in a separate follow-up PR? I'm afraid I'm away, and will be unavailable for the next 3 weeks. There are PRs queued up behind this one.
/merge |
Merged 60244ca into rapidsai:branch-25.10.
…to UnsupportedProvider/UnsupportedOperationExceptions (#1316)

This PR further extends #1296 and #1314 to give meaningful error messages in case libcuvs fails to load.

The `jextract`-generated bindings we use in cuvs-java use `SymbolLookup#libraryLookup` to load the `cuvs_c` dynamic library; this uses `RawNativeLibraries#load` (see https://github.com/openjdk/jdk/blob/master/src/java.base/share/native/libjava/RawNativeLibraries.c#L58); `RawNativeLibraries#load` in turn calls `JVM_LoadLibrary`. `JVM_LoadLibrary` does a good job of putting together a good error message (e.g. calling `dlerror`, trying to locate and inspect the file for platform mismatch, etc.). Unfortunately, `RawNativeLibraries#load` calls it passing false for the `throwException` parameter, which means that the detailed error messages are not surfaced.

This PR follows the pattern introduced in #1296 and preloads libcuvs (and dependencies) using `JVM_LoadLibrary` directly with `throwException` true; preloading it will also cause the OS to look for and load all dependencies. In case of error we can see what's broken in better detail; e.g. if `libcuvs_c.so` is present, but `librmm.so` is missing:

```
java.lang.UnsupportedOperationException: cannot create JDKProvider: libcuvs_c.so: librmm.so: cannot open shared object file: No such file or directory
	at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.spi.UnsupportedProvider.newCuVSResources(UnsupportedProvider.java:35)
	at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.CuVSResources.create(CuVSResources.java:90)
	at com.nvidia.cuvs@25.10.0/com.nvidia.cuvs.CuVSResources.create(CuVSResources.java:79)
```

Fixes #1321

Authors:
- Lorenzo Dematté (https://github.com/ldematte)

Approvers:
- Ishan Chattopadhyaya (https://github.com/chatman)
- Chris Hegarty (https://github.com/ChrisHegarty)
- MithunR (https://github.com/mythrocks)

URL: #1316
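The preloading trick can be sketched as follows (a minimal illustration under assumed names, not #1316's actual code): loading a library by absolute path with `System.load` throws an `UnsatisfiedLinkError` whose message carries the detailed failure reason, instead of that detail being swallowed as it is in the `SymbolLookup#libraryLookup` path.

```java
public class PreloadSketch {

    /**
     * Try to preload a native library from an explicit absolute path.
     * Returns null on success, or the detailed error message on failure
     * (e.g. a missing transitive dependency such as librmm.so).
     */
    static String tryPreload(String absolutePath) {
        try {
            System.load(absolutePath); // throws with a descriptive message on failure
            return null;
        } catch (UnsatisfiedLinkError e) {
            // The message typically carries the underlying dlerror detail.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String err = tryPreload("/nonexistent/libcuvs_c.so");
        System.out.println(err != null ? "load failed: " + err : "loaded");
    }
}
```

The caller can then wrap that message into the `UnsupportedOperationException` shown in the stack trace above, so users see why the provider could not be created.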
This commit introduces an option to include the native libraries as part of a new Java JAR artifact.

In addition, this commit also adds scripts to build the libraries included in the fat-jars using gcc-toolset, to allow the libraries to be portable across several Linux / `libstdc++` versions. (This was earlier attempted in #1264, but will now reside in this commit.)

Note that for the initial cut, the "fat" jars will include only the following libraries:
- `libcuvs.so`
- `libcuvs_c.so`
- `librmm.so`
- `librapids_logger.so`

The resultant JARs will still be dependent on `LD_LIBRARY_PATH` for other dependencies (`cublas`, `cusparse`, `cusolver`, `nccl`, etc.).

Two new profiles have been introduced in the `pom.xml`:
- `x86_64-cuda12`
- `x86_64-cuda13` (Although this is more of an example than anything.)

The main JAR artifact (`cuvs-java-25.x.x.jar`) remains unmodified. But when `-P x86_64-cuda12` is employed, an additional `cuvs-java-25.x.x-x86_64-cuda12.jar` is produced, containing `libcuvs.so`, `libcuvs_c.so`, and some (minimal) additional dependencies.

The idea is that the JAR artifact built for `x86_64` + `cuda12` would look something like:

The `java/build.sh` detects the CPU platform and the CUDA version to automatically choose the profile (`x86_64-cuda12`).

Note that there are tangential changes to the `pom.xml`:
- `src/main/java` and `src/main/java22` (#1293): The Java 22 portion of the build will now follow the Java 21 build, to prevent races between the builds.
- The `maven-compiler-plugin` version has been dropped to `3.11.0`, to prevent build errors regarding a "0-byte module-info.class". This is apparently a known issue in `3.13`, to be fixed in `3.14`.