Fix Tests/CI, refactor merge to call CAGRA's merge(), implement CAGRA prefiltering #14

Merged
rapids-bot[bot] merged 33 commits into rapidsai:branch-25.10 from SearchScale:searchscale/merge-and-prefiltering
Sep 24, 2025

Conversation

@chatman
Collaborator

@chatman chatman commented Aug 11, 2025

Refactoring, CI fixes (pulling libcuvs from PyPI if not found), and prefiltering support.

Added tests:

  • TestCuVSGaps (for missing vectors in documents)
  • TestCuVSDeletedDocuments (for deleted vectors, which will leverage prefiltering)
  • TestMerge (dedicated test for testing merges)
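
To make the idea behind the deleted-documents test concrete, here is a minimal sketch of prefiltering, assuming a brute-force search over in-memory vectors. It is not the CAGRA or Lucene API: the class `PrefilterSketch` and its methods are hypothetical names used only for illustration. The live-documents bitset is applied before any distance is computed, so deleted ordinals and documents without a vector can never appear in the results.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: brute-force k-NN with a prefilter; not the cuVS or Lucene API. */
final class PrefilterSketch {

  /** Squared Euclidean distance between two vectors of equal length. */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /**
   * Returns the ordinals of the k nearest vectors to the query, considering only
   * ordinals whose bit is set in liveDocs. Null entries (documents that have no
   * vector) are skipped as well.
   */
  static List<Integer> search(float[][] vectors, BitSet liveDocs, float[] query, int k) {
    List<Integer> candidates = new ArrayList<>();
    // The filter is applied first: only live ordinals become candidates.
    for (int ord = liveDocs.nextSetBit(0);
        ord >= 0 && ord < vectors.length;
        ord = liveDocs.nextSetBit(ord + 1)) {
      if (vectors[ord] != null) {
        candidates.add(ord);
      }
    }
    // Rank the surviving candidates by distance to the query, nearest first.
    candidates.sort(
        Comparator.comparingDouble((Integer ord) -> squaredDistance(vectors[ord], query)));
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }
}
```

With four documents where ordinal 0 is deleted and ordinal 3 has no vector, a query at the origin with k = 2 can only return ordinals 1 and 2, even though ordinal 0 would otherwise be the closest match.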

> Co-authored-by: Vivek Narang <vivek@searchscale.com>
@copy-pr-bot

copy-pr-bot bot commented Aug 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@chatman chatman marked this pull request as ready for review August 14, 2025 19:04
@chatman chatman requested a review from a team as a code owner August 14, 2025 19:04
@narangvivek10 narangvivek10 requested a review from cjnolet August 14, 2025 19:11
@chatman
Collaborator Author

chatman commented Aug 14, 2025

I think this is ready for review now. I feel this has many tests that will make sure we catch regressions in future PRs.

@narangvivek10
Collaborator

narangvivek10 commented Aug 14, 2025

I have skimmed through the PR, and the changes look okay with tests passing. Will revisit this to see if any more improvements are possible.

[Image: Screenshot from 2025-08-14 15-18-59]

@chatman
Collaborator Author

chatman commented Aug 28, 2025

/ok to test 433c485

@chatman
Collaborator Author

chatman commented Aug 28, 2025

@narangvivek10 @dantegd @cjnolet The tests were skipped in CI: the assumeTrue() guard in the tests skips them when the .so file isn't found.

2025-08-28T13:42:36.0803095Z [INFO] Running com.nvidia.cuvs.lucene.TestMerge
2025-08-28T13:42:36.6527960Z Aug 28, 2025 1:42:36 PM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
2025-08-28T13:42:36.6529387Z WARNING: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
2025-08-28T13:42:36.6877687Z WARNING: A restricted method in java.lang.foreign.SymbolLookup has been called
2025-08-28T13:42:36.6917748Z WARNING: java.lang.foreign.SymbolLookup::libraryLookup has been called by com.nvidia.cuvs.internal.panama.headers_h_1 in an unnamed module
2025-08-28T13:42:36.6919192Z WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for callers in this module
2025-08-28T13:42:36.6920265Z WARNING: Restricted methods will be blocked in a future release unless native access is enabled
2025-08-28T13:42:36.6920913Z 
2025-08-28T13:42:36.6921969Z B??r 28, 2025 9:42:36 ?NKAK?NY? com.nvidia.cuvs.lucene.CuVSVectorsFormat cuVSResourcesOrNull
2025-08-28T13:42:36.6923192Z WARNING: Exception occurred during creation of cuvs resources. java.lang.IllegalArgumentException: Cannot open library: libcuvs_c.so
2025-08-28T13:42:36.7335659Z [WARNING] Tests run: 8, Failures: 0, Errors: 0, Skipped: 8, Time elapsed: 0.635 s -- in com.nvidia.cuvs.lucene.TestMerge
2025-08-28T13:42:36.7336665Z [INFO] Running com.nvidia.cuvs.lucene.TestCuVSGaps
2025-08-28T13:42:36.7424262Z ????? ??, ???? ??:??:?? ? com.nvidia.cuvs.lucene.TestCuVSGaps afterClass
2025-08-28T13:42:36.7427095Z INFO: Test finished

  1. We need to fail the tests when the .so file is not found.
  2. We need to investigate why the .so file wasn't found. Maybe the cuvs-java project's CI build can offer some clues (I remember working with James Rong on this there).
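
One way the first item could look is sketched below, assuming the library is resolved through `java.library.path` with `System.loadLibrary`. The log above shows cuvs-java opening the library through `SymbolLookup.libraryLookup`, so this is only an approximation of the real check. `NativeLibraryGuard` and its methods are hypothetical names, not part of cuvs-lucene.

```java
/** Illustrative only: a guard that fails loudly instead of skipping when native code is absent. */
final class NativeLibraryGuard {

  /** Returns true if the named library can be loaded via java.library.path. */
  static boolean isLoadable(String libraryName) {
    try {
      System.loadLibrary(libraryName);
      return true;
    } catch (UnsatisfiedLinkError | SecurityException e) {
      return false;
    }
  }

  /**
   * Throws instead of returning quietly, so a missing library surfaces as a test
   * error rather than as a skipped test.
   */
  static void requireLoadable(String libraryName) {
    if (!isLoadable(libraryName)) {
      throw new IllegalStateException(
          "Native library '" + libraryName + "' not found; java.library.path="
              + System.getProperty("java.library.path"));
    }
  }
}
```

Calling something like `NativeLibraryGuard.requireLoadable("cuvs_c")` from the test setup, in place of the `assumeTrue()` guard, would turn the eight skipped `TestMerge` cases into errors that CI cannot overlook.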

@mythrocks
Contributor

mythrocks commented Aug 28, 2025

I'm hoping the inability to find the jars will reduce somewhat, after rapidsai/cuvs#1296.

I've yet to take cuvs-lucene out for a spin with this change. I'll try to get to that tonight.

@chatman chatman requested a review from a team as a code owner September 4, 2025 13:01
@chatman chatman requested a review from jameslamb September 4, 2025 13:01
@chatman
Collaborator Author

chatman commented Sep 4, 2025

/ok to test 9498678

@narangvivek10
Collaborator

/ok to test 4b09035

@narangvivek10
Collaborator

/ok to test 8c8e009

@narangvivek10
Collaborator

/ok to test c4f1abb

@narangvivek10
Collaborator

/ok to test fa7cdb1

@narangvivek10
Collaborator

/ok to test f4bc2e6

@chatman
Collaborator Author

chatman commented Sep 21, 2025

@narangvivek10 I would like to avoid building libcuvs from source as that will be too slow. My preference would be to:

  1. In the short term, use the python wheels to get the .so files
  2. In the long term, use the fat jar support (rapidsai/cuvs#1296, "[REVIEW] [Java] Option to build fat-jars with native dependencies included").

For 2), I've just pushed the fat jars to searchscale maven temporarily: https://maven.searchscale.com/snapshots/com/nvidia/cuvs/cuvs-java/25.10.0-07e42-SNAPSHOT/

Can you please revert your recent changes so that we can move forward fast with the python wheels for now? If possible, please add the 12.9.1 support to the pip thing as well, based on https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libcuvs-cu12/

@mythrocks
Contributor

Actually, yes. That would be quickest. I'm supportive of @chatman's suggestion to use pip-wheels for the short term. It is less than ideal in the long run, but it will allow faster iteration for cuvs-lucene.

@narangvivek10
Collaborator

/ok to test 06e39dd

@cjnolet
Member

cjnolet commented Sep 22, 2025

/ok to test 76010c0

@narangvivek10 narangvivek10 requested a review from a team as a code owner September 22, 2025 22:59
@narangvivek10
Collaborator

/ok to test d8bb38d

@narangvivek10
Collaborator

I have switched to a cleaner approach that uses libcuvs from conda instead, and the cuvs-lucene tests ran and passed in the latest run.

FYI @cjnolet @chatman

ci/build_java.sh Outdated
rapids-print-env

# Locates the libcuvs.so file path and appends it to LD_LIBRARY_PATH
rapids-logger "Find libcuvs so file and append paths to LD_LIBRARY_PATH"
Member

I don’t think we need this anymore, do we? The conda environment should already be part of the system path, right?

Collaborator

@narangvivek10 narangvivek10 Sep 24, 2025

I think we need this because when the job starts, I saw the LD_LIBRARY_PATH set to the value seen here. After this step the LD_LIBRARY_PATH is updated to this value, where the libcuvs.so and libcuvs_c.so exist.
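
A small probe like the following could confirm which `LD_LIBRARY_PATH` entries actually contain the two libraries before and after the CI step. It is a sketch under stated assumptions: `LibraryPathProbe` is a hypothetical helper, and it only inspects the directories listed in the variable. It does not reproduce the dynamic linker's full search order (RPATH/RUNPATH, the ld.so cache, and so on).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: reports which entries of a path list contain a given library file. */
final class LibraryPathProbe {

  /**
   * Returns, in search order, the directories from pathList that contain fileName.
   * Empty entries are ignored in this sketch.
   */
  static List<Path> directoriesContaining(String pathList, String fileName) {
    List<Path> hits = new ArrayList<>();
    if (pathList == null || pathList.isEmpty()) {
      return hits;
    }
    for (String entry : pathList.split(File.pathSeparator)) {
      if (entry.isEmpty()) {
        continue;
      }
      Path dir = Path.of(entry);
      if (Files.isRegularFile(dir.resolve(fileName))) {
        hits.add(dir);
      }
    }
    return hits;
  }
}
```

For example, `LibraryPathProbe.directoriesContaining(System.getenv("LD_LIBRARY_PATH"), "libcuvs_c.so")` returns an empty list when no listed directory holds the library, which would match the "Cannot open library" failure seen earlier in CI.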

# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
channels:
- conda-forge
- rapidsai-nightly
Member

We need to be careful because this could tie our release packaging to nightly dependencies. Will let the build team comment on the best way to do this

In other RAPIDS packages, these environment files are used to create the test environment. They do not go into built packages. I think the same is true for you here, but you know better than me.

Collaborator

My reasoning for setting this to rapidsai-nightly was to make sure cuvs-lucene always builds against the latest cuvs changes, so that any updates required at the cuvs-lucene level (especially those caused by breaking changes in cuvs) are addressed in time, if they have not been taken care of already.

It's a mix. There are benefits and drawbacks to each method. With nightlies, you will catch new issues. You may spend more time fixing breaking changes in smaller chunks. You might update your software in ways that are incompatible with the previous release. This last issue is not a concern for us, because we do not support mixing different release versions of different packages.

I think your setting here is fine and safe. If you find the breaks come too often and you want to sort them out in larger batches, then remove it.

Member

I agree w/ @msarahan. I think this is fine.


@msarahan msarahan left a comment

Minor suggestions. Main thing is to try to avoid LD_LIBRARY_PATH if you can. Just remember, the library search path matters more for what your library searches for (its dependencies) than it does for finding the .so's you're expecting to work with. You must ensure that libstdc++ from conda is always loaded first, before the system libstdc++ gets a chance to be otherwise found.

private static CuVSResources cuVSResourcesOrNull() {
try {
System.loadLibrary(

@msarahan msarahan Sep 24, 2025

If you can pass arguments to loadLibrary in your Java code and not rely on LD_LIBRARY_PATH, things will probably be more robust. The hard part is that it must be recursive. Loading a conda library should load all conda dependencies, or else you get weird runtime errors. This is especially problematic with libstdc++.

One way to ensure that libstdc++ does not accidentally come from the system is to explicitly load it from the conda environment before loading any other (conda-based) libraries.

I don't know much at all about Java, but this site seemed to have some helpful alternatives to LD_LIBRARY_PATH. The more you can change just your process and not the environment, the fewer strange issues you'll have. LD_LIBRARY_PATH is a foot gun. Use it if you must, but only after exhausting other options.
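
The "load libstdc++ from conda first" suggestion could be sketched in Java roughly as follows. The assumptions are that the conda environment keeps its shared libraries under `<prefix>/lib` and that the C++ runtime is named `libstdc++.so.6` there. `CondaNativeLoader` is a hypothetical helper, not existing cuvs-lucene code. `System.load` takes an absolute file path, so the file it is given is not looked up through `LD_LIBRARY_PATH`. The dependencies of each loaded library are still resolved by the dynamic linker, which is why the C++ runtime is loaded first.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: loads conda-provided libraries by absolute path, C++ runtime first. */
final class CondaNativeLoader {

  /** Absolute paths to load, in order: libstdc++ first, then the requested libraries. */
  static List<Path> loadOrder(Path condaPrefix, List<String> libraryFileNames) {
    Path libDir = condaPrefix.resolve("lib"); // assumed conda layout: <prefix>/lib
    List<Path> order = new ArrayList<>();
    order.add(libDir.resolve("libstdc++.so.6")); // assumed file name of the C++ runtime
    for (String name : libraryFileNames) {
      order.add(libDir.resolve(name));
    }
    return order;
  }

  /** Loads every library in loadOrder; throws UnsatisfiedLinkError if one cannot be loaded. */
  static void loadAll(Path condaPrefix, List<String> libraryFileNames) {
    for (Path lib : loadOrder(condaPrefix, libraryFileNames)) {
      System.load(lib.toAbsolutePath().toString());
    }
  }
}
```

A caller might use `CondaNativeLoader.loadAll(Path.of(System.getenv("CONDA_PREFIX")), List.of("libcuvs.so", "libcuvs_c.so"))`. This changes only the current process rather than the environment, which is the direction the review comment recommends. Whether it fully removes the need for `LD_LIBRARY_PATH` depends on how the libraries' own dependencies are resolved.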

Co-authored-by: Mike Sarahan <msarahan@gmail.com>
@cjnolet
Member

cjnolet commented Sep 24, 2025

/ok to test f27f907

@cjnolet
Member

cjnolet commented Sep 24, 2025

I'm going to go ahead and merge this since it fixes CI. If we need further updates to the way cuvs is installed, we can open follow-up PRs

@cjnolet
Member

cjnolet commented Sep 24, 2025

/merge

@rapids-bot rapids-bot bot merged commit fee9c4f into rapidsai:branch-25.10 Sep 24, 2025
12 checks passed
rapids-bot bot pushed a commit that referenced this pull request Sep 26, 2025
Introducing a new Codec that uses CAGRA for building the index on GPU and serializing to Lucene-compatible HNSW index segments. The Lucene-compatible segments are searchable via the `Lucene99HnswVectorsReader` (which is the default in Lucene 10.x). 

Note: This is based on top of #14 and should be rebased once that is merged.

TODO:
- Benchmarks and more tests
- Further refactoring to split the `CuVSVectorsFormat` into GPU and CPU-specific formats. 

Fixes #13

Authors:
  - Vivek Narang (https://github.com/narangvivek10)
  - Puneet Ahuja (https://github.com/punAhuja)
  - Ishan Chattopadhyaya (https://github.com/chatman)

Approvers:
  - Ishan Chattopadhyaya (https://github.com/chatman)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #16

Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

7 participants