
[WIP] Use gcc-toolset to build libcuvs for portability across libstdc++.so minor versions#1264

Closed
mythrocks wants to merge 6 commits into rapidsai:branch-25.10 from mythrocks:portable-gcc-toolset-build-wip

@mythrocks (Contributor) commented Aug 15, 2025

Fixes #1248.
Depends on #1296.

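This commit adds scripts that use gcc-toolset to build libcuvs (along with librmm.so, libcuvs_c.so, etc.), so that the libraries are immune to minor ABI differences in libstdc++.so between the build host and the deployment hosts. In other words, the built libraries should not require libstdc++ symbol versions that an older deployment host's libstdc++.so.6 lacks.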

Prior to this commit, the minor ABI differences would manifest as follows:

```
$ ldd ./libcuvs.so
./libcuvs.so: /lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by ./libcuvs.so)
        linux-vdso.so.1 (0x00007ef0ff6a1000)
        libnccl.so.2 => /home/mithunr/workspace/dev/cuvs/5/cpp/build/consolidated/./libnccl.so.2 (0x00007ef0a1000000)
         ...
```

With this change, the libraries should be portable across Linux distros, without also having to ship libstdc++ (as is done with the Conda-based releases).
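One way to spot this kind of mismatch before deploying is to compare the `CXXABI_*`/`GLIBCXX_*` symbol versions a library references against the version tags the target host's `libstdc++.so.6` defines. The following is a minimal sketch, assuming GNU binutils (`objdump`, `strings`) are available; the function names are hypothetical helpers, not part of the cuVS scripts.

```bash
# Sketch: find libstdc++ symbol versions that a library needs but that a
# given libstdc++.so does not provide. All function names are hypothetical.

# Print the CXXABI_*/GLIBCXX_* versions referenced by a shared library.
required_versions() {
  objdump -T "$1" | grep -oE '(CXXABI|GLIBCXX)_[0-9]+(\.[0-9]+)*' | sort -u
}

# Print the CXXABI_*/GLIBCXX_* version tags found in a libstdc++.so.
provided_versions() {
  strings "$1" | grep -xE '(CXXABI|GLIBCXX)_[0-9]+(\.[0-9]+)*' | sort -u
}

# Given two newline-separated lists, print the entries of the first list
# that do not appear (as whole lines) in the second.
missing_versions() {
  printf '%s\n' "$1" | sort -u | while IFS= read -r v; do
    [ -n "$v" ] || continue
    printf '%s\n' "$2" | grep -qxF -- "$v" || printf '%s\n' "$v"
  done
}
```

For example, `missing_versions "$(required_versions ./libcuvs.so)" "$(provided_versions /lib/x86_64-linux-gnu/libstdc++.so.6)"` would print `CXXABI_1.3.15` for the failing library above, and nothing for a library built portably.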

Libraries built this way should be shippable via tarballs or packaged inside a fat jar.

Rather than requiring gcc-toolset to be installed on the host machine, this commit introduces a new Dockerfile containing the packages required to build the portable libraries.
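Inside such a container, gcc-toolset is usually activated by sourcing its `enable` script (for example `/opt/rh/gcc-toolset-14/enable`, following the Software Collections layout used on RHEL-family distributions), which puts the toolset's compilers first on `PATH`. A build script could guard against silently falling back to the system compiler with a check like the one below. This is a hypothetical sketch: the function names and the toolset version are assumptions, not taken from this PR's scripts.

```bash
# Hypothetical guard: succeed only if the given compiler path points into a
# gcc-toolset installation (/opt/rh/gcc-toolset-<N>/root/usr/bin/...).
is_gcc_toolset_compiler() {
  case "$1" in
    /opt/rh/gcc-toolset-[0-9]*/root/usr/bin/gcc) return 0 ;;
    /opt/rh/gcc-toolset-[0-9]*/root/usr/bin/g++) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the toolset version number (e.g. "14") embedded in such a path.
gcc_toolset_version() {
  printf '%s\n' "$1" | sed -n 's|^/opt/rh/gcc-toolset-\([0-9][0-9]*\)/.*|\1|p'
}
```

For example, after `. /opt/rh/gcc-toolset-14/enable`, a script might run `is_gcc_toolset_compiler "$(command -v g++)" || exit 1` before invoking CMake.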

This is work-in-progress for getting libcuvs to build so that it's portable across minor versions of libstdc++.

This currently works for the libcuvs and Java builds.

This change doesn't yet tackle the fat-jar build.

Signed-off-by: MithunR <mithunr@nvidia.com>
Also corrected the copyright dates.

Signed-off-by: MithunR <mithunr@nvidia.com>
@copy-pr-bot commented Aug 15, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@mythrocks added the `improvement` (Improves an existing functionality) and `non-breaking` (Introduces a non-breaking change) labels Aug 15, 2025
@mythrocks (Contributor, Author)

Tagging @jameslamb for his expertise. (Thank you for explaining the CI end of this to me.)

It looks like the CI scripts already use a pre-built container that covers a fair bit of what we have here.

Per @jameslamb, we should be able to reuse the CI docker container, use the gcc-toolset-14 there to build the Java artifacts, and store them in GitHub's artifact storage.

We'll do a fresh draft PR that leverages the changes here to try that approach.

cc @cjnolet, @mmccarty.

@jameslamb (Member) commented Sep 2, 2025

> Per @jameslamb, we should be able to reuse the CI docker container, and use the gcc-toolset-14 there to build the Java artifacts, and store them in GitHub's artifact storage.

Specifically, I think we can use the rapidsai/ci-wheel images which are based on Rocky Linux 8 and already use gcc-toolset-14: https://github.com/rapidsai/ci-imgs/blob/8a2f191c7605297ac0018307ffce9e1d0a2ade7b/ci-wheel.Dockerfile#L157

Those come with most of what's already needed, and by definition support the same range of GLIBC versions (and therefore Linux distributions) that the rest of RAPIDS does.
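To check that a library built in such an image stays within that GLIBC range, one could find the highest `GLIBC_*` symbol version it references and compare it with the oldest glibc to be supported (for instance 2.28, the glibc shipped with RHEL/Rocky Linux 8). A minimal sketch, assuming `objdump -T` output as input and GNU `sort -V`; the helper names and the baseline are illustrative.

```bash
# Print the highest GLIBC_x.y[.z] version mentioned in `objdump -T` output.
# GLIBCXX_* entries are not matched because "GLIBC_" must be followed by a digit.
max_glibc_version() {
  printf '%s\n' "$1" | grep -oE 'GLIBC_[0-9]+(\.[0-9]+)*' |
    sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Succeed if version $1 <= version $2 in version-sort order.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

Usage might look like `max=$(max_glibc_version "$(objdump -T ./libcuvs.so)"); version_le "$max" 2.28 || echo "needs glibc $max"`.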

mythrocks added a commit to mythrocks/cuvs that referenced this pull request Sep 3, 2025
Note that this has to assume a docker build (a la rapidsai#1264).

Signed-off-by: MithunR <mithunr@nvidia.com>
mythrocks added a commit to mythrocks/cuvs that referenced this pull request Sep 4, 2025
This commit adds the scripts from rapidsai#1264 to this PR.

This change was precipitated by the inclusion of librapids_logger.so in the fat-jar. With the docker-based build, the rapids-logger library is compiled and made available in cpp/build/cudaxx/. The pom.xml can then pick up the library for inclusion.

Signed-off-by: MithunR <mithunr@nvidia.com>
@mythrocks (Contributor, Author)

At this point, #1296 and #1264 are intertwined and co-dependent. I've moved these changes to #1296.

Closing, in favour of #1296.

@rapids-bot pushed a commit that referenced this pull request Sep 10, 2025
…luded (#1296)

This commit introduces an _option_ to include the native libraries as part of a new Java JAR artifact.

In addition, this commit also adds scripts to build the libraries included in the fat-jars using `gcc-toolset`, to allow the libraries to be portable across several Linux / `libstdc++` versions.  (This was earlier attempted in #1264, but will now reside in this commit.)

Note that for the initial cut, the "fat" jars will include only the following libraries:
1. `libcuvs.so`
2. `libcuvs_c.so`
3. `librmm.so`
4. `librapids_logger.so`

The resultant JARs will still be dependent on `LD_LIBRARY_PATH` for other dependencies (`cublas`, `cusparse`, `cusolver`, `nccl`, etc.).  
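Because these remaining dependencies must come from `LD_LIBRARY_PATH`, it can help to list which of them the dynamic loader cannot resolve on a given host. The sketch below parses `ldd` output, in which unresolved entries appear as `<soname> => not found`; the function name is hypothetical.

```bash
# Print the sonames that ldd reports as "not found", one per line.
unresolved_libs() {
  printf '%s\n' "$1" | awk '/=> not found/ { print $1 }'
}
```

For example, `unresolved_libs "$(ldd ./libcuvs.so)"` run against an extracted copy of the library would list entries such as `libcublas.so.12` until the corresponding directories are added to `LD_LIBRARY_PATH`.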

Two new profiles have been introduced in the `pom.xml`:
1. `x86_64-cuda12`
2. `x86_64-cuda13` (Although this is more of an example than anything.)

The main JAR artifact (`cuvs-java-25.x.x.jar`) remains unmodified.  But when `-P x86_64-cuda12` is employed, an additional `cuvs-java-25.x.x-x86_64-cuda12.jar` is produced, containing `libcuvs.so`, `libcuvs_c.so`, and some (minimal) additional dependencies.
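To confirm that a classifier JAR such as `cuvs-java-25.x.x-x86_64-cuda12.jar` really carries the four native libraries listed earlier, one could inspect its file listing (for example the output of `unzip -Z1 <jar>`, which prints one entry per line). A sketch; the helper name is hypothetical, and it compares basenames only because the directory layout inside the JAR is not specified here.

```bash
# Print each expected native library that is absent from a JAR listing
# (one path per line). Only the final path component is compared.
missing_native_libs() {
  for lib in libcuvs.so libcuvs_c.so librmm.so librapids_logger.so; do
    printf '%s\n' "$1" |
      awk -F/ -v want="$lib" '$NF == want { found = 1 } END { exit !found }' ||
      printf '%s\n' "$lib"
  done
}
```

An empty result from `missing_native_libs "$(unzip -Z1 cuvs-java-25.x.x-x86_64-cuda12.jar)"` would indicate that all four libraries were packaged.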

The idea is that the JAR artifact build for `x86_64` + `cuda12` would look something like:
```bash
# On an x86 build box, using the cuda-12 conda env, from the project root directory:

# Build `libcuvs.so` first.
LIBCUVS_BUILD_DIR="$(pwd)/cpp/build/cuda12" ./build.sh libcuvs

# Now the Java build.
cd java/
CMAKE_PREFIX_PATH="$(pwd)/../cpp/build/cuda12" ./build.sh
```

The `java/build.sh` script detects the CPU platform and the CUDA version, and uses them to choose the profile automatically (`x86_64-cuda12`).
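The detection step might look roughly like the following. This is not the actual `java/build.sh` logic; it is a hypothetical sketch that assumes the CUDA version is read from `nvcc --version` (whose output includes a line such as `Cuda compilation tools, release 12.4, V12.4.131`) and that only the two profiles above exist.

```bash
# Extract the CUDA major version (e.g. "12") from `nvcc --version` output.
cuda_major_from_nvcc() {
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Map a CPU architecture (as printed by `uname -m`) and a CUDA major
# version to a Maven profile name such as x86_64-cuda12.
choose_profile() {
  case "$1" in
    x86_64) ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
  case "$2" in
    12 | 13) printf '%s-cuda%s\n' "$1" "$2" ;;
    *) echo "unsupported CUDA major version: $2" >&2; return 1 ;;
  esac
}
```

A caller could then run something like `mvn package -P "$(choose_profile "$(uname -m)" "$(cuda_major_from_nvcc "$(nvcc --version)")")"`.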

Note that there are tangential changes to the `pom.xml`:
1. Fixes #1293: The Java 22 portion of the build will now run after the Java 21 portion, to prevent races between the two builds.
2. The `maven-compiler-plugin` version has been pinned back to `3.11.0`, to avoid build errors about a "0-byte module-info.class". This is reportedly a known issue in `3.13`, expected to be fixed in `3.14`.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Lorenzo Dematté (https://github.com/ldematte)
  - Mike Sarahan (https://github.com/msarahan)
  - Ben Frederickson (https://github.com/benfred)

URL: #1296
@mythrocks closed this Sep 16, 2025

Labels: Build, C++, improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

Linked issue (may be closed by merging this pull request): [TASK][Java] Use gcc-toolset to build libcuvs

2 participants