[CuVS-Java] Automate panama bindings generation, Include IVF_PQ parameters in CAGRA index parameters and other changes #831

Merged
rapids-bot[bot] merged 30 commits into NVIDIA:branch-25.06 from SearchScale:vivek/automate-panama-bindings
May 1, 2025

Conversation

@narangvivek10

Contributor

The PR includes code changes for the following:

  • Automation of Panama bindings generation using jextract.
  • Adding the ability to configure IVF_PQ index and search parameters via the cuvs-java API (to adapt with the following underlying changes).
  • Updating HNSW example to show the above.
  • Updating the readme files.
  • Simplifying logging in examples.
  • Bumping up the maven-javadoc-plugin version.
  • Updating and consolidating gitignore file.
  • Removing unused imports etc.

Please note that the existing Panama classes are being deleted because they were manually created and managed. With the new, cleaner approach, that is no longer needed: the binding classes are now generated at build time, so they do not need to be in the codebase.

@copy-pr-bot

copy-pr-bot Bot commented Apr 21, 2025


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rhdong rhdong self-requested a review April 22, 2025 16:31
@rhdong rhdong added the bug, non-breaking, and Java labels Apr 22, 2025
@narangvivek10 narangvivek10 marked this pull request as ready for review April 22, 2025 16:38
@narangvivek10 narangvivek10 requested a review from a team as a code owner April 22, 2025 16:38
@narangvivek10 narangvivek10 requested a review from jameslamb April 22, 2025 16:38

@rhdong rhdong left a comment

Contributor

LGTM

@rhdong

rhdong commented Apr 23, 2025

Contributor

/ok to test 806af86

1 similar comment
@cjnolet

cjnolet commented Apr 23, 2025

Contributor

/ok to test 806af86

@chatman

chatman commented Apr 28, 2025

Contributor

@rhdong I think this is ready to be merged. Please review and merge. Thanks!

Please note that I've observed a few things that are wrong with the Javadocs and the Maven Javadoc plugin; they can be tackled in a separate PR.

# Use Jextract utility to generate panama bindings
$JEXTRACT_COMMAND \
--include-dir ${REPODIR}/java/internal/_deps/dlpack-src/include/ \
--include-dir ${CUDA_HOME}/include \

Contributor

@narangvivek10 may need to change:
--include-dir ${CUDA_HOME}/targets/x86_64-linux/include \ <<---CUDA include path needs to be adjusted

# Use Jextract utility to generate panama bindings
$JEXTRACT_COMMAND \
--include-dir ${REPODIR}/java/internal/_deps/dlpack-src/include/ \
--include-dir ${CUDA_HOME}/include \

Contributor

Suggested change
--include-dir ${CUDA_HOME}/include \
--include-dir ${CUDA_HOME}/targets/x86_64-linux/include \
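
The two include paths discussed above differ only by install layout, so one option is to probe for whichever exists. The following is a minimal sketch, not what the PR does: `find_cuda_include_dir` is a hypothetical helper, and the two candidate paths are simply the ones mentioned in this thread.

```shell
# Hypothetical helper: print the first CUDA include directory that exists
# under the given CUDA home. The candidates are the two layouts from this thread.
find_cuda_include_dir() {
  local cuda_home="$1"
  local candidate
  for candidate in \
    "${cuda_home}/targets/x86_64-linux/include" \
    "${cuda_home}/include"
  do
    if [ -d "${candidate}" ]; then
      echo "${candidate}"
      return 0
    fi
  done
  echo "no CUDA include directory found under ${cuda_home}" >&2
  return 1
}
```

The result could then be passed along as `--include-dir "$(find_cuda_include_dir "${CUDA_HOME}")"`.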

@rhdong rhdong left a comment

Contributor

LGTM

@rhdong

rhdong commented Apr 28, 2025

Contributor

/ok to test 5bf8961

@cjnolet

cjnolet commented Apr 28, 2025

Contributor

@rhdong is this running all the way through in your Java CI PR?

@rhdong

rhdong commented Apr 28, 2025

Contributor

> @rhdong is this running all the way through in your Java CI PR?

Yes, there's a slight difference in the CUDA_HOME include directory, which I mentioned in a previous comment.

@jameslamb jameslamb left a comment

Member

I reviewed this narrowly from a packaging perspective (I don't know much Java or much about cuVS APIs).

Please see my suggestions on simplifying some of the shell scripts and making them stricter.

Also @rhdong you approved this, but there are open suggestions from you that haven't yet been addressed:

Those should be addressed before this is merged.

@@ -0,0 +1,63 @@
#!/bin/bash

Member

Suggested change
set -e -u -o pipefail

Can you please make this stricter, so that we'll be notified via loud failures for things like undefined variables or commands that fail?

Member

This would also mean you can remove custom handling of return codes, like this stuff:

# Did Jextract complete normally? If not, stop and return
JEXTRACT_RETURN_VALUE=$?
if [ $JEXTRACT_RETURN_VALUE == 0 ]
then
  echo "Jextract SUCCESS"
else
  echo "Jextract encountered issues (returned value ${JEXTRACT_RETURN_VALUE})"
  exit $JEXTRACT_RETURN_VALUE
fi
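
To illustrate why strict mode makes that kind of return-code check redundant, here is a small sketch. `strict_demo` is a hypothetical name, and `false` stands in for a failing jextract call; the child script stops at the first failing command and passes that command's status to the caller.

```shell
# Runs a tiny strict-mode script in a child bash and returns its exit status.
# `false` is a stand-in for a command that fails, such as a broken jextract call.
strict_demo() {
  bash -c 'set -e -u -o pipefail; echo "before"; false; echo "after"'
}
```

Calling `strict_demo` prints only `before` and returns a non-zero status, with no explicit `$?` handling in the script itself.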

Contributor

Hi @jameslamb, thank you for pointing that out. I completely agree with you. Just to clarify, I originally believed I could make the same change in my PR following this (so from my perspective, it wasn’t a matter of principle).

Comment thread java/build.sh Outdated
&& cd ..

# Generate Panama FFM API bindings and update (if any of them changed)
/bin/bash panama-bindings/generate-bindings.sh

Member

Suggested change
/bin/bash panama-bindings/generate-bindings.sh
./panama-bindings/generate-bindings.sh

panama-bindings/generate-bindings.sh already has /bin/bash in its shebang... let's please not duplicate that here.

If you haven't already, this change should be accompanied by making generate-bindings.sh executable.

chmod +x ./panama-bindings/generate-bindings.sh

Comment thread java/build.sh Outdated
Comment on lines +15 to +22
BINDINGS_GENERATION_RETURN_VALUE=$?
if [ $BINDINGS_GENERATION_RETURN_VALUE != 0 ]
then
echo "Bindings generation did not complete normally (returned value ${BINDINGS_GENERATION_RETURN_VALUE})"
echo "Forcing this build process to abort"
exit 1
fi

Member

Suggested change
BINDINGS_GENERATION_RETURN_VALUE=$?
if [ $BINDINGS_GENERATION_RETURN_VALUE != 0 ]
then
echo "Bindings generation did not complete normally (returned value ${BINDINGS_GENERATION_RETURN_VALUE})"
echo "Forcing this build process to abort"
exit 1
fi

Instead of having this custom return-code handling, would you please consider:

  1. adding #!/bin/bash to the first line
  2. adding set -e -u -o pipefail somewhere after that, near the top, to ensure any command exiting with a non-0 exit code causes the entire script to exit

That'd simplify the code here a bit.
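
For readers less familiar with the other two flags in `set -e -u -o pipefail`, the sketch below contrasts the default and strict behaviors. All four function names are hypothetical; each runs a one-line script in a child bash so the effect stays isolated from the calling shell.

```shell
# Without `-u`, an unset variable silently expands to an empty string.
unset_lenient() { bash -c 'unset DEMO_DIR; echo "dir=${DEMO_DIR}"'; }

# With `-u`, the same expansion is an error and the script exits non-zero.
unset_strict() { bash -c 'set -u; unset DEMO_DIR; echo "dir=${DEMO_DIR}"' 2>/dev/null; }

# Without `pipefail`, a pipeline reports only the status of its last command.
pipe_lenient() { bash -c 'false | cat'; }

# With `pipefail`, a failure anywhere in the pipeline makes the pipeline fail.
pipe_strict() { bash -c 'set -o pipefail; false | cat'; }
```

`unset_lenient` and `pipe_lenient` succeed even though something went wrong, which is the kind of quiet failure the review comments are trying to rule out.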

Comment thread java/build.sh Outdated
cd internal && cmake . && cmake --build . \
&& cd .. \
&& mvn install:install-file -DgroupId=$GROUP_ID -DartifactId=cuvs-java-internal -Dversion=$VERSION -Dpackaging=so -Dfile=$SO_FILE_PATH/libcuvs_java.so \
&& cd ..

Member

This could be re-written without needing to keep track of what the working directory is with these calls to cd, would you please consider that?

cmake -B ./internal/build -S ./internal
cmake --build ./internal/build
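
A slightly fuller sketch of the same idea follows, under stated assumptions: `JAVA_DIR`, `run`, `build_internal`, and the `DRY_RUN` switch are all hypothetical and not part of the PR. Paths are built from one root variable, so no step depends on the current working directory.

```shell
# Hypothetical root of the java/ directory; defaults to the current directory.
JAVA_DIR="${JAVA_DIR:-$(pwd)}"

# Print each command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Configure and build the internal project without any `cd` calls.
build_internal() {
  run cmake -S "${JAVA_DIR}/internal" -B "${JAVA_DIR}/internal/build"
  run cmake --build "${JAVA_DIR}/internal/build"
}
```

With `DRY_RUN=1`, `build_internal` only prints the two cmake commands, which makes it easy to check the resolved paths before running a real build. The `mvn install:install-file` step from the original snippet is left out of this sketch.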

Comment thread java/build.sh Outdated

if [ -z "$CMAKE_PREFIX_PATH" ]; then
export CMAKE_PREFIX_PATH=`pwd`/../cpp/build
export CMAKE_PREFIX_PATH=`pwd`/../cpp/build

Member

Suggested change
export CMAKE_PREFIX_PATH=`pwd`/../cpp/build
export CMAKE_PREFIX_PATH="$(pwd)/../cpp/build"

Can we please use the $() form instead? When shellcheck linting eventually starts to cover this file, it will complain about the use of backticks, for the reasons mentioned in https://www.shellcheck.net/wiki/SC2006
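
One practical difference between the two forms shows up when substitutions are nested. In this small sketch, `demo_dir` is an arbitrary example path and both assignments compute the same value:

```shell
demo_dir="/opt/project/java"

# $(...) nests naturally: quotes and inner substitutions need no escaping.
parent_new="$(basename "$(dirname "${demo_dir}")")"

# Backticks require the inner pair to be escaped, which gets hard to read quickly.
parent_old=`basename \`dirname "${demo_dir}"\``
```

Both variables end up holding `project`, but only the first form stays readable as nesting grows.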

* @param mapping an instance of ID mapping
* @param topK the top k results to return
* @param prefilter the prefilter data to use while searching the BRUTEFORCE
* @param prefilters the prefilters data to use while searching the BRUTEFORCE

Member

Suggested change
* @param prefilters the prefilters data to use while searching the BRUTEFORCE
* @param prefilters the prefilters data to use while searching the BRUTEFORCE

Let's keep this aligned with all the other lines above it, please.

Comment on lines +35 to +41
# Debug printing
echo "CUDA_HOME points to: $CUDA_HOME"
echo "include dir in CUDA_HOME has:"
ls $CUDA_HOME/include
echo "JEXTRACT_COMMAND points to: $JEXTRACT_COMMAND"
echo "CURDIR is: $CURDIR"
echo "REPODIR is: $REPODIR"

Member

Suggested change
# Debug printing
echo "CUDA_HOME points to: $CUDA_HOME"
echo "include dir in CUDA_HOME has:"
ls $CUDA_HOME/include
echo "JEXTRACT_COMMAND points to: $JEXTRACT_COMMAND"
echo "CURDIR is: $CURDIR"
echo "REPODIR is: $REPODIR"

I noticed this says "Debug printing"... is it just left over from debugging? If so, could it be removed to simplify this please?

wget -c $JEXTRACT_DOWNLOAD_URL
tar -xvf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz
JEXTRACT_COMMAND="jextract-22/bin/jextract"
echo "jextract downloaded to `pwd`/jextract-22"

Member

Suggested change
echo "jextract downloaded to `pwd`/jextract-22"
echo "jextract downloaded to $(pwd)/jextract-22"

Comment on lines +27 to +30
JEXTRACT_DOWNLOAD_URL="https://download.java.net/java/early_access/jextract/22/6/openjdk-22-jextract+6-47_linux-x64_bin.tar.gz"
echo "jextract doesn't exist. Downloading it from $JEXTRACT_DOWNLOAD_URL.";
wget -c $JEXTRACT_DOWNLOAD_URL
tar -xvf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz

Member

Suggested change
JEXTRACT_DOWNLOAD_URL="https://download.java.net/java/early_access/jextract/22/6/openjdk-22-jextract+6-47_linux-x64_bin.tar.gz"
echo "jextract doesn't exist. Downloading it from $JEXTRACT_DOWNLOAD_URL.";
wget -c $JEXTRACT_DOWNLOAD_URL
tar -xvf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz
JEXTRACT_FILENAME="openjdk-22-jextract+6-47_linux-x64_bin.tar.gz"
JEXTRACT_DOWNLOAD_URL="https://download.java.net/java/early_access/jextract/22/6/${JEXTRACT_FILENAME}"
echo "jextract doesn't exist. Downloading it from $JEXTRACT_DOWNLOAD_URL.";
wget -c $JEXTRACT_DOWNLOAD_URL
tar -xvf ./"${JEXTRACT_FILENAME}"

Let's please use a variable to reduce duplication here.

echo "jextract doesn't exist. Downloading it from $JEXTRACT_DOWNLOAD_URL.";
wget -c $JEXTRACT_DOWNLOAD_URL
tar -xvf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz
JEXTRACT_COMMAND="jextract-22/bin/jextract"

Member

Suggested change
JEXTRACT_COMMAND="jextract-22/bin/jextract"
export PATH="$(pwd)/jextract-22/bin:${PATH}"

Instead of tracking this JEXTRACT_COMMAND variable, I think it would be simpler to just place the just-downloaded jextract on PATH for the rest of this script. If you accept this suggestion, then also please remove other uses of JEXTRACT_COMMAND in this script, in favor of just calling jextract.
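
Putting the filename variable and the PATH idea together might look roughly like the sketch below. `ensure_jextract` and its `work_dir` argument are hypothetical; the filename and URL are the ones quoted in this thread. It assumes the archive unpacks to `jextract-22/` and that the PATH entry should be the `bin` directory, since PATH entries are directories rather than executables.

```shell
JEXTRACT_FILENAME="openjdk-22-jextract+6-47_linux-x64_bin.tar.gz"
JEXTRACT_DOWNLOAD_URL="https://download.java.net/java/early_access/jextract/22/6/${JEXTRACT_FILENAME}"

# Hypothetical helper: make sure `jextract` can be called by name.
ensure_jextract() {
  local work_dir="$1"
  # Already on PATH: nothing to do.
  if command -v jextract >/dev/null 2>&1; then
    return 0
  fi
  # Download and unpack only if an unpacked copy is not already present.
  if [ ! -f "${work_dir}/jextract-22/bin/jextract" ]; then
    echo "jextract doesn't exist. Downloading it from ${JEXTRACT_DOWNLOAD_URL}."
    wget -c -P "${work_dir}" "${JEXTRACT_DOWNLOAD_URL}"
    tar -xf "${work_dir}/${JEXTRACT_FILENAME}" -C "${work_dir}"
  fi
  # Add the directory that contains the binary, not the binary itself.
  export PATH="${work_dir}/jextract-22/bin:${PATH}"
}
```

After `ensure_jextract "$(pwd)"`, the rest of a script could call `jextract` directly instead of going through a `JEXTRACT_COMMAND` variable.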

@chatman

chatman commented Apr 30, 2025

Contributor

@jameslamb Thank you for your review. @narangvivek10 has incorporated the changes, and I've tested them to make sure they are working correctly.

@jameslamb

Member

/ok to test f19f3d5

@jameslamb jameslamb self-requested a review April 30, 2025 22:14

@jameslamb jameslamb left a comment

Member

It looks like all of my recommended changes have been addressed, thanks! And I know we're planning to test this on @rhdong's other PR (#805), so I'm approving this. Assuming CI passes, I am OK with merging this from a CI/packaging perspective.

Some notes for future PRs here:

  1. Add an email address tied to your GitHub profile into your ~/.gitconfig.

Notice that your commits do not have your GitHub profile picture next to them:

Screenshot 2025-04-30 at 5 16 03 PM

This lack of profile picture next to your commits means that GitHub was not able to tie your commits to your GitHub profile. That doesn't matter too much for things like contribution stats since we squash all commits on merge, but it does make it harder for reviewers like me to understand the commit history at a glance, for PRs like this with multiple authors.

You can fix that for the future by running the following locally in your development environment:

git config --global user.email "<email address tied to your GitHub account>"

  2. When a reviewer leaves comments for you and you push commits that address those comments, please resolve the threads.

Add a comment saying that you addressed the feedback (ideally with a link to a commit and any other relevant context about how you fixed it), then click "resolve conversation".

Screenshot 2025-04-30 at 5 19 46 PM

This makes the PRs easier for reviewers to navigate visually, and makes it clearer which feedback has not yet been addressed.

  3. Whenever you find yourself writing the RAPIDS version number ("25.06", for this PR), ensure that ci/release/update-version.sh would automatically update that for future versions (see my comment in this review).

Comment thread java/examples/README.md
### Bruteforce Example
In the current directory do:
```
mvn package && java --enable-native-access=ALL-UNNAMED -cp target/cuvs-java-examples-25.06.0.jar:$HOME/.m2/repository/com/nvidia/cuvs/cuvs-java/25.06.0/cuvs-java-25.06.0.jar com.nvidia.cuvs.examples.BruteForceExample
```

Member

Every place where you mention the RAPIDS version explicitly needs to be added to ci/release/update-version.sh.

That script is run over the repo whenever we create a new release branch.

So, for example, when we create branch-25.08 to start working on the next release, that script will be responsible for changing all the "25.06"s to "25.08".

Since we're trying to get this merged and test it on a separate PR (#805), don't treat this as blocking... @rhdong, please add this change to #805, and test that we've caught everything by running something like this:

./ci/release/update-version.sh '25.08.00'

# this should return 0 results
git grep -E '25\.4|25\.04|25\.6|25\.06'
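
The same check can be wrapped so that it fails loudly instead of relying on someone reading the output. This is only a sketch: `check_no_stale_versions` is a hypothetical helper, and it uses plain `grep -r` rather than `git grep` so that it also runs outside a git checkout.

```shell
# Hypothetical helper: fail if any file under search_dir matches the
# extended regex describing old version strings.
check_no_stale_versions() {
  local search_dir="$1"
  local pattern="$2"
  local status=0
  # grep exits 0 on a match, 1 on no match, and 2 or more on errors.
  grep -rnE "${pattern}" "${search_dir}" || status=$?
  case "${status}" in
    0) echo "stale version strings found" >&2; return 1 ;;
    1) echo "no stale version strings found"; return 0 ;;
    *) echo "grep failed with status ${status}" >&2; return "${status}" ;;
  esac
}
```

For example, `check_no_stale_versions . '25\.4|25\.04|25\.6|25\.06'` would return non-zero while any old version string remains.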

@cjnolet

cjnolet commented Apr 30, 2025

Contributor

/merge

@rapids-bot rapids-bot Bot merged commit 3131a95 into NVIDIA:branch-25.06 May 1, 2025
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing May 1, 2025
rapids-bot Bot pushed a commit that referenced this pull request May 2, 2025
Contributes to rapidsai/build-planning#135

Follow-up to #662

While reviewing #805 and #831, I found myself suggesting things manually that I know `shellcheck` would have caught automatically. To prevent that for reviewers in the future, this proposes running `shellcheck` on **all** shell scripts in the repo, not just those in the `ci/` directory.

Other changes:

* updates `rapids-dependency-file-generator` to its latest version (1.18.1)
* consolidates duplicate entries for https://github.com/pre-commit/pre-commit-hooks in `.pre-commit-config.yaml`

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ben Frederickson (https://github.com/benfred)

URL: #865
rapids-bot Bot pushed a commit that referenced this pull request May 8, 2025
This PR adds changes for Java CI.

Some scripts modified here also appear in [PR #831](#831). Once 831 is merged, I’ll rebase and make sure everything stays consistent.

Authors:
  - rhdong (https://github.com/rhdong)
  - Vivek Narang (https://github.com/narangvivek10)
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - James Lamb (https://github.com/jameslamb)
  - Ray Douglass (https://github.com/raydouglass)

URL: #805

Labels

bug, Java, non-breaking

5 participants