Split RAPIDS C++ conda libraries into standardized components #46

vyasr · 2024-04-19T00:03:16Z

Currently most RAPIDS C++ libraries produce a single lib*.so that represents the complete output of the C++ library. Additional conda packages are produced for things like tests, benchmarks, and examples, but the core library is contained in a single conda package. While this has historically been fine, we are now seeing increased usage of RAPIDS libraries as dependencies of other libraries, both internally (e.g. cugraph-ops and cumlprims_mg are primarily consumed as dependencies of cugraph and cuml, respectively) and externally (e.g. raft is increasingly used by vector databases). We are also seeing potential demand for static RAPIDS libraries.

Our current package structure is not well suited to handle all of these uses. The lack of separation between runtime and build-time packages means that build-time dependencies are often propagated unnecessarily, bloating runtime environments and making conda solves more complex than they need to be. Additionally, not having standardized package delineations puts a greater onus on downstream developers to know which packages to include in what parts of the recipes, which in turn often leads to misconfigured recipes down the line that cause additional issues. As our conda environments become more and more complex, having packages configured correctly is critical to reducing the number of issues we run into. Some packages (especially raft) have started to address some of these concerns piecemeal to fix specific use cases, but I think now would be a good time for us to consider adopting a more holistic strategy here.

To better address these diverse use cases, we should migrate all RAPIDS C++ libraries to a standardized set of conda packages.
The most common case will be a RAPIDS library that produces two conda packages:

${lib}: The base package would contain only the shared library, i.e. the minimal runtime requirement for any other package that depends on this library. For example, libcuml would have a runtime dependency on libraft because it calls libraft functions at runtime.
${lib}-dev: *-dev packages should include everything required to build against the library. ${lib}-dev should have a runtime dependency on ${lib} so that the library is available to link against. It should also have a runtime dependency on anything else required to build against the library, since this package will only be installed with the intent of building against it. In addition, the dev package should include the header files required to compile code that uses the library, as well as packaging files such as CMake config files (we don't currently produce e.g. pkg-config files, but if we did they would also go in this package). The ${lib}-dev package should include a run export of ${lib}, which ensures that any package built against ${lib}-dev automatically gets ${lib} added to its runtime dependencies. Typically, Python packages will consume the dev version of the C++ package.
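To make the ${lib}/${lib}-dev split concrete, here is a minimal Python sketch that models the two packages as plain data. It is not conda-build's API: the `Package` class, the `DEV_PATTERNS` install-tree globs, and the helper functions are hypothetical, and the libraft file list and librmm-dev dependency in the usage lines are illustrative only. It shows which files land in each package and how the run export on ${lib}-dev becomes a runtime dependency of anything built against it.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


# Hypothetical, simplified model of a conda package; the field names mirror
# recipe concepts, but this is not the conda-build API.
@dataclass
class Package:
    name: str
    files: list[str] = field(default_factory=list)
    run: list[str] = field(default_factory=list)          # runtime dependencies
    run_exports: list[str] = field(default_factory=list)  # added to consumers' run deps


# Assumed install-tree patterns that belong in ${lib}-dev.
DEV_PATTERNS = ["include/*", "lib/cmake/*", "lib/pkgconfig/*"]


def split_library(lib: str, installed: list[str], build_deps: list[str]) -> tuple[Package, Package]:
    """Split one install tree into ${lib} (runtime) and ${lib}-dev (build) packages."""
    base = Package(lib)
    dev = Package(f"{lib}-dev", run=[lib, *build_deps], run_exports=[lib])
    for path in installed:
        if any(fnmatchcase(path, pat) for pat in DEV_PATTERNS):
            dev.files.append(path)  # headers and CMake/pkg-config files
        elif fnmatchcase(path, f"lib/{lib}.so*"):
            base.files.append(path)  # the shared library is all ${lib} ships
        # Anything else (e.g. lib/${lib}.a) is left for other packages such as ${lib}-static.
    return base, dev


def downstream_run_deps(host: list[Package], explicit_run: list[str]) -> list[str]:
    """Runtime deps of a package built against `host`: its own run deps plus run_exports."""
    deps = list(explicit_run)
    for pkg in host:
        for dep in pkg.run_exports:
            if dep not in deps:
                deps.append(dep)
    return deps


installed = ["lib/libraft.so", "include/raft/core/mdspan.hpp", "lib/cmake/raft/raft-config.cmake"]
base, dev = split_library("libraft", installed, build_deps=["librmm-dev"])
print(base.files)  # ['lib/libraft.so']
print(dev.run)     # ['libraft', 'librmm-dev']
print(downstream_run_deps([dev], []))  # ['libraft']
```

The design point is that the run export lives on ${lib}-dev, so a consumer only lists the dev package at build time and still ends up with ${lib}, and nothing build-only, at runtime.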

For most libraries, the above two will be sufficient. In cases where RAPIDS libraries also want to offer a static component, we will also want to produce

${lib}-static: The static package would contain the static library. If a static package exists for a given library, the corresponding dev package should include a run_constrained entry so that, whenever both the dev and static packages are installed, their versions are consistent.
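A rough model of the run_constrained behavior, assuming exact-version pinning (a real recipe may use a looser version spec). `DevPackage`, `make_dev`, `violations`, and the libfoo name are all hypothetical. What it illustrates is that the constraint only applies when ${lib}-static is actually present; it never pulls the static package in by itself.

```python
from dataclasses import dataclass, field


@dataclass
class DevPackage:
    name: str
    version: str
    # Hypothetical model of run_constrained: name -> exact version required
    # *if* that package is installed. Unlike a run dependency, a constraint
    # never causes the package to be installed on its own.
    run_constrained: dict[str, str] = field(default_factory=dict)


def make_dev(lib: str, version: str, has_static: bool) -> DevPackage:
    """Build a ${lib}-dev model; pin ${lib}-static to the same version if it exists."""
    constraints = {f"{lib}-static": version} if has_static else {}
    return DevPackage(f"{lib}-dev", version, constraints)


def violations(env: dict[str, str], dev: DevPackage) -> list[str]:
    """Return constraint violations in an environment mapping name -> version."""
    problems = []
    for name, required in dev.run_constrained.items():
        installed = env.get(name)
        if installed is not None and installed != required:
            problems.append(f"{name}=={installed} conflicts with {dev.name}=={dev.version}")
    return problems


dev = make_dev("libfoo", "24.06.00", has_static=True)
print(violations({"libfoo-dev": "24.06.00"}, dev))  # [] (static not installed is fine)
print(violations({"libfoo-dev": "24.06.00", "libfoo-static": "24.04.00"}, dev))
# ['libfoo-static==24.04.00 conflicts with libfoo-dev==24.06.00']
```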

Header-only libraries

Some RAPIDS libraries are header-only (rmm) or offer a header-only component (raft). This introduces an additional layer of complexity. I do not know if there is a standard for this, so please comment if there is one that we should follow. If not, I would propose the following layout:

${lib}-headers: This package should exist only for libraries that support header-only usage. It should include all header files and have a runtime dependency on every other package that is required to build against these headers. It should also include CMake config files so that the headers can be found by CMake. If a headers package exists, the dev package should depend on it. In most cases, ${lib}-dev will likely just be a metapackage that pulls in ${lib} and ${lib}-headers. Some additional CMake files may be needed to stitch together the headers with the runtime libs. The headers package should not include a run export of the corresponding lib, since the presumption is that this package is only pulled in for header-only usage.
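A sketch of the proposed header-only layout, again as plain data rather than real recipes: libfoo and libbar-headers are placeholder names, and `header_layout` and `install_closure` are illustrative helpers, not conda tooling. It shows that installing ${lib}-headers never brings in ${lib}, while ${lib}-dev, as a metapackage, brings in both.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    run: list[str] = field(default_factory=list)
    run_exports: list[str] = field(default_factory=list)


def header_layout(lib: str, header_deps: list[str]) -> dict[str, Package]:
    """Proposed packages for a library that also supports header-only usage."""
    return {
        lib: Package(lib),  # shared library only
        # Headers + CMake config; depends on whatever building against the
        # headers needs, and deliberately has no run export of ${lib}.
        f"{lib}-headers": Package(f"{lib}-headers", run=list(header_deps)),
        # Metapackage: runtime library + headers, with the usual run export.
        f"{lib}-dev": Package(f"{lib}-dev", run=[lib, f"{lib}-headers"], run_exports=[lib]),
    }


def install_closure(root: str, pkgs: dict[str, Package]) -> set[str]:
    """Every package installed when `root` is requested (run deps, transitively)."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        pkg = pkgs.get(name)
        if pkg is not None:  # external deps are treated as leaves here
            stack.extend(pkg.run)
    return seen


pkgs = header_layout("libfoo", header_deps=["libbar-headers"])
print(sorted(install_closure("libfoo-headers", pkgs)))
# ['libbar-headers', 'libfoo-headers']
print(sorted(install_closure("libfoo-dev", pkgs)))
# ['libbar-headers', 'libfoo', 'libfoo-dev', 'libfoo-headers']
```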

Additional considerations

raft currently produces an additional package, libraft-headers-only. The purpose of this package is to allow consumers of the raft headers to include and use a limited subset of raft that does not require CUDA math libraries. I do not think this is a standard use case that we'll need to support more generally. However, if we were to support this kind of usage, I would probably argue for modifying the package so that libraft-headers-only contained only the headers that are actually usable without CUDA math libraries. I believe it currently includes all headers, so it is the user's responsibility to use only the headers that don't require CUDA libraries (or to install the CUDA libraries manually).
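If we did want libraft-headers-only to contain only usable headers, one possible way to compute that subset is to walk each header's transitive includes and drop any header that reaches a CUDA math library header. The sketch below assumes the include graph is already available; the graph, the foo/ header names, and the helper functions are hypothetical, and cublas_v2.h simply stands in for any CUDA math library header.

```python
from collections import deque


def reachable(start: str, includes: dict[str, list[str]]) -> set[str]:
    """Transitive closure of #include edges from `start` (BFS, safe with cycles)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for dep in includes.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def headers_only_subset(includes: dict[str, list[str]], math_headers: set[str]) -> list[str]:
    """Library headers whose include closure never reaches a CUDA math library header."""
    return sorted(h for h in includes if not (reachable(h, includes) & math_headers))


# Hypothetical include graph; keys are the library's own headers.
includes = {
    "foo/core/span.hpp": ["foo/core/macros.hpp"],
    "foo/core/macros.hpp": [],
    "foo/linalg/gemm.cuh": ["foo/core/span.hpp", "cublas_v2.h"],
    "foo/linalg/solve.cuh": ["foo/linalg/gemm.cuh"],
}
print(headers_only_subset(includes, {"cublas_v2.h"}))
# ['foo/core/macros.hpp', 'foo/core/span.hpp']
```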


bdice mentioned this issue on Sep 9, 2024: Remove run-export of librmm. rapidsai/rmm#1673 (closed)
