[CuVS-Java] Automate panama bindings generation, Include IVF_PQ parameters in CAGRA index parameters and other changes #831
Conversation
…ava api, updated example, formatting and cleanup
|
/ok to test 806af86 |
1 similar comment
|
/ok to test 806af86 |
|
@rhdong I think this is ready to be merged. Please review and merge. Thanks! Please note that there are a few things I've observed that's wrong with the javadocs and javadoc maven plugin, they can be tackled as a separate PR. |
| # Use Jextract utility to generate panama bindings | ||
| $JEXTRACT_COMMAND \ | ||
| --include-dir ${REPODIR}/java/internal/_deps/dlpack-src/include/ \ | ||
| --include-dir ${CUDA_HOME}/include \ |
There was a problem hiding this comment.
@narangvivek10 may need to change:
--include-dir ${CUDA_HOME}/targets/x86_64-linux/include \ <<---CUDA include path needs to be adjusted
| # Use Jextract utility to generate panama bindings | ||
| $JEXTRACT_COMMAND \ | ||
| --include-dir ${REPODIR}/java/internal/_deps/dlpack-src/include/ \ | ||
| --include-dir ${CUDA_HOME}/include \ |
There was a problem hiding this comment.
| --include-dir ${CUDA_HOME}/include \ | |
| --include-dir ${CUDA_HOME}/targets/x86_64-linux/include \ |
|
/ok to test 5bf8961 |
|
@rhdong is this running all the way through in your Java CI PR? |
Yes, there's a slight difference in the CUDA_HOME include directory, which I mentioned in a previous comment. |
jameslamb
left a comment
There was a problem hiding this comment.
I reviewed this narrowly from a packaging perspective (I don't know much Java and or much about cuVS APIs).
Please see my suggestions on simplifying some of the shell scripts and making them stricter.
Also @rhdong you approved this, but there are open suggestions from you that haven't yet been addressed:
- https://github.com/rapidsai/cuvs/pull/831/files#r2064576675
- https://github.com/rapidsai/cuvs/pull/831/files#r2064596467
Those should be addressed before this is merged.
| @@ -0,0 +1,63 @@ | |||
| #!/bin/bash | |||
|
|
|||
There was a problem hiding this comment.
| set -e -u -o pipefail | |
Can you please make this stricter, so that we'll be notified via loud failures for things like undefined variables or commands that fail?
There was a problem hiding this comment.
This would also mean you can remove custom handling of return codes, like this stuff:
# Did Jextract complete normally? If not, stop and return
JEXTRACT_RETURN_VALUE=$?
if [ $JEXTRACT_RETURN_VALUE == 0 ]
then
echo "Jextract SUCCESS"
else
echo "Jextract encountered issues (returned value ${JEXTRACT_RETURN_VALUE})"
exit $JEXTRACT_RETURN_VALUE
fiThere was a problem hiding this comment.
Hi @jameslamb, thank you for pointing that out. I completely agree with you. Just to clarify, I originally believed I could make the same change in my PR following this (so from my perspective, it wasn’t a matter of principle).
| && cd .. | ||
|
|
||
| # Generate Panama FFM API bindings and update (if any of them changed) | ||
| /bin/bash panama-bindings/generate-bindings.sh |
There was a problem hiding this comment.
| /bin/bash panama-bindings/generate-bindings.sh | |
| ./panama-bindings/generate-bindings.sh |
panama-binding/generate-bindings.sh already has /bin/bash in its shebang... let's please not duplicate that here.
If you haven't already, this change should be accompanied with making generate-bindings.sh executable.
chmod +x ./panama-bindings/generate-bindings.sh| BINDINGS_GENERATION_RETURN_VALUE=$? | ||
| if [ $BINDINGS_GENERATION_RETURN_VALUE != 0 ] | ||
| then | ||
| echo "Bindings generation did not complete normally (returned value ${BINDINGS_GENERATION_RETURN_VALUE})" | ||
| echo "Forcing this build process to abort" | ||
| exit 1 | ||
| fi | ||
|
|
There was a problem hiding this comment.
| BINDINGS_GENERATION_RETURN_VALUE=$? | |
| if [ $BINDINGS_GENERATION_RETURN_VALUE != 0 ] | |
| then | |
| echo "Bindings generation did not complete normally (returned value ${BINDINGS_GENERATION_RETURN_VALUE})" | |
| echo "Forcing this build process to abort" | |
| exit 1 | |
| fi |
Instead of having this custom return-code handling, would you please consider:
- adding
#!/bin/bashto the first line - adding
set -e -u -o pipefailsomewhere after that, near the top, to ensure any command exiting with a non-0 exit code causes the entire script to exit
That's simplify the code here a bit.
| cd internal && cmake . && cmake --build . \ | ||
| && cd .. \ | ||
| && mvn install:install-file -DgroupId=$GROUP_ID -DartifactId=cuvs-java-internal -Dversion=$VERSION -Dpackaging=so -Dfile=$SO_FILE_PATH/libcuvs_java.so \ | ||
| && cd .. |
There was a problem hiding this comment.
This could be re-written without needing to keep track of what the working directory is with these calls to cd, would you please consider that?
cmake -B ./internal/build -S ./internal
cmake --build ./internal/build|
|
||
| if [ -z "$CMAKE_PREFIX_PATH" ]; then | ||
| export CMAKE_PREFIX_PATH=`pwd`/../cpp/build | ||
| export CMAKE_PREFIX_PATH=`pwd`/../cpp/build |
There was a problem hiding this comment.
| export CMAKE_PREFIX_PATH=`pwd`/../cpp/build | |
| export CMAKE_PREFIX_PATH="$(pwd)/../cpp/build" |
Can we please use the $() form instead? When shellcheck linting eventually starts to cover this file, it will complain about the use of backticks, for the reasons mentioned in https://www.shellcheck.net/wiki/SC2006
| * @param mapping an instance of ID mapping | ||
| * @param topK the top k results to return | ||
| * @param prefilter the prefilter data to use while searching the BRUTEFORCE | ||
| * @param prefilters the prefilters data to use while searching the BRUTEFORCE |
There was a problem hiding this comment.
| * @param prefilters the prefilters data to use while searching the BRUTEFORCE | |
| * @param prefilters the prefilters data to use while searching the BRUTEFORCE |
Let's keep this aligned with all the other lines above it, please.
| # Debug printing | ||
| echo "CUDA_HOME points to: $CUDA_HOME" | ||
| echo "include dir in CUDA_HOME has:" | ||
| ls $CUDA_HOME/include | ||
| echo "JEXTRACT_COMMAND points to: $JEXTRACT_COMMAND" | ||
| echo "CURDIR is: $CURDIR" | ||
| echo "REPODIR is: $REPODIR" |
There was a problem hiding this comment.
| # Debug printing | |
| echo "CUDA_HOME points to: $CUDA_HOME" | |
| echo "include dir in CUDA_HOME has:" | |
| ls $CUDA_HOME/include | |
| echo "JEXTRACT_COMMAND points to: $JEXTRACT_COMMAND" | |
| echo "CURDIR is: $CURDIR" | |
| echo "REPODIR is: $REPODIR" |
I noticed this says "Debug printing"... is it just left over from debugging? If so, could it be removed to simplify this please?
| wget -c $JEXTRACT_DOWNLOAD_URL | ||
| tar -xvf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz | ||
| JEXTRACT_COMMAND="jextract-22/bin/jextract" | ||
| echo "jextract downloaded to `pwd`/jextract-22" |
There was a problem hiding this comment.
| echo "jextract downloaded to `pwd`/jextract-22" | |
| echo "jextract downloaded to $(pwd)/jextract-22" |
| JEXTRACT_DOWNLOAD_URL="https://download.java.net/java/early_access/jextract/22/6/openjdk-22-jextract+6-47_linux-x64_bin.tar.gz" | ||
| echo "jextract doesn't exist. Downloading it from $JEXTRACT_DOWNLOAD_URL."; | ||
| wget -c $JEXTRACT_DOWNLOAD_URL | ||
| tar -xvf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz |
There was a problem hiding this comment.
| JEXTRACT_DOWNLOAD_URL="https://download.java.net/java/early_access/jextract/22/6/openjdk-22-jextract+6-47_linux-x64_bin.tar.gz" | |
| echo "jextract doesn't exist. Downloading it from $JEXTRACT_DOWNLOAD_URL."; | |
| wget -c $JEXTRACT_DOWNLOAD_URL | |
| tar -xvf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz | |
| JEXTRACT_FILENAME="openjdk-22-jextract+6-47_linux-x64_bin.tar.gz" | |
| JEXTRACT_DOWNLOAD_URL="https://download.java.net/java/early_access/jextract/22/6/${JEXTRACT_FILENAME}" | |
| echo "jextract doesn't exist. Downloading it from $JEXTRACT_DOWNLOAD_URL."; | |
| wget -c $JEXTRACT_DOWNLOAD_URL | |
| tar -xvf ./"${JEXTRACT_FILENAME}" |
Let's please use a variable to reduce duplication here.
| echo "jextract doesn't exist. Downloading it from $JEXTRACT_DOWNLOAD_URL."; | ||
| wget -c $JEXTRACT_DOWNLOAD_URL | ||
| tar -xvf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz | ||
| JEXTRACT_COMMAND="jextract-22/bin/jextract" |
There was a problem hiding this comment.
| JEXTRACT_COMMAND="jextract-22/bin/jextract" | |
| export PATH="$(pwd)/jextract-22/bin/jextract:${PATH}" |
Instead of tracking this JEXTRACT_COMMAND variable, I think it would be simpler to just place the just-downloaded jextract on PATH for the rest of this script. If you accept this suggestion, then also please remove other uses of JEXTRACT_COMMAND in this script, in favor of just calling jextract.
|
@jameslamb Thank you for your review. @narangvivek10 has incorporated the changes, and I've tested them to make sure they are working correctly. |
|
/ok to test f19f3d5 |
jameslamb
left a comment
There was a problem hiding this comment.
It looks like all of my recommended changes have been addressed, thanks! And I know we're planning to test this on @rhdong 's other PR (#805), so approving this. Assuming CI passes, I am ok from a CI/packaging perspective with merging this.
Some notes for future PRs here:
- Add an email address tied to your GitHub profile into your
~/.gitconfig.
Notice that your commits do not have your GitHub profile picture next to them:
This lack of profile picture next to your commits means that GitHub was not able to tie your commits to your GitHub profile. That doesn't matter too much for things like contribution stats since we squash all commits on merge, but it does make it harder for reviewers like me to understand the commit history at a glance, for PRs like this with multiple authors.
You can fix for the future that by running the following locally in your development environment:
git config --global user.email "<email address tied to your GitHub account>"- When a reviewer leaves comments for you and and you push commits that address those comments, please resolve the threads
Add a comment saying that you addressed the feedback (ideally with a link to a commit and any other relevant context about how you fixed it), then click "resolve conversation"
This makes the PRs easier for reviewers to navigate visually, and makes it clearer which feedback has not yet been addressed.
- Whenever you find yourself writing the RAPIDS version number ("25.06", for this PR), ensure that
ci/release/update-version.shwould automatically update that for future versions (see my comment in this review).
| ### Bruteforce Example | ||
| In the current directory do: | ||
| ``` | ||
| mvn package && java --enable-native-access=ALL-UNNAMED -cp target/cuvs-java-examples-25.06.0.jar:$HOME/.m2/repository/com/nvidia/cuvs/cuvs-java/25.06.0/cuvs-java-25.06.0.jar com.nvidia.cuvs.examples.BruteForceExample |
There was a problem hiding this comment.
Every place where you mention the RAPIDS version explicitly, needs to be added to ci/release/update-version.sh.
That script is run over the repo whenever we create a new release branch.
So, for example, when we create branch-25.08 to start working on the next release, that script will be responsible for changing all the "25.06"s to "25.08".
Since we're trying to get this merged and test it on a separate PR (#805), don't treat this as blocking... @rhdong , please add this change to #805. And test that we've caught everything by running something like this:
./ci/release/update-version.sh '25.08.00'
# this should return 0 results
git grep -E '25\.4|25\.04|25\.6|25\.06'|
/merge |
Contributes to rapidsai/build-planning#135 Follow-up to #662 While reviewing #805 and #831, I found myself suggesting things manually that I know `shellcheck` would have caught automatically. To prevent that for reviewers in the future, this proposes running `shellcheck` on **all** shell scripts in the repo, not just those in the `ci/` directory. Other changes: * updates `rapids-dependency-file-generator` to its latest version (1.18.1) * consolidates duplicate entries for https://github.com/pre-commit/pre-commit-hooks in `.pre-commit-config.yaml` Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) - Ben Frederickson (https://github.com/benfred) URL: #865
This PR adds changes for Java CI. Some scripts modified here also appear in [PR #831](#831). Once 831 is merged, I’ll rebase and make sure everything stays consistent. Authors: - rhdong (https://github.com/rhdong) - Vivek Narang (https://github.com/narangvivek10) - James Lamb (https://github.com/jameslamb) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - James Lamb (https://github.com/jameslamb) - Ray Douglass (https://github.com/raydouglass) URL: #805
The PR includes code changes for the following:
Please note that the existing Panama classes are being deleted because they were manually created and managed. With the new cleaner approach, this will not be needed anymore. Now these binding classes will be generated at build time and so no need to be in the codebase.