Refactor Faiss-based vector format for easier backport #14934

Merged
mikemccand merged 4 commits into apache:main from kaivalnp:refactor-faiss
Aug 12, 2025

Conversation

@kaivalnp
Contributor

Description

Refactor classes of the Faiss-based vector format to simplify backport to 10.x

  • Extract minimal functionality required for the format into a new FaissLibrary interface
  • Separate all function calls to the native library into a new FaissNativeWrapper (and use invokeExact for faster calls!)
  • Create a new FaissLibraryNativeImpl class implementing FaissLibrary with FaissNativeWrapper under the hood
  • Dynamically load FaissLibraryNativeImpl from FaissLibrary at runtime
  • This setup encapsulates "unsafeness" into FaissNativeWrapper and FaissLibraryNativeImpl (as marked by the TODOs in #14907, "build: enable more of javac 24's lints, fix some issues"). These two classes can independently be moved into the java21/ directory for an easier backport (see the comments in #14843, "Backport Faiss-based vector format to 10.x"), while FaissKnnVectors{Format/Reader/Writer} can stay as-is!
  • Improve some error handling
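
As a rough illustration of the layering described in the list above, here is a minimal Java sketch. Every name with the `Sketch` suffix, the `indexDimension` method, and the fake native function are hypothetical and not taken from the PR. A plain static Java method stands in for the native downcall handle, so the example runs without Faiss installed.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical, heavily simplified stand-in for the format-facing interface.
interface FaissLibrarySketch {
  int indexDimension(long indexPointer);
}

// All "native" calls live here, each with its own try/catch and an exact call site.
final class FaissNativeWrapperSketch {
  private final MethodHandle indexDimension;

  FaissNativeWrapperSketch() {
    try {
      // Assumption: in real code this would be a downcall handle into the C API.
      indexDimension =
          MethodHandles.lookup()
              .findStatic(
                  FaissNativeWrapperSketch.class,
                  "fakeNativeIndexDimension",
                  MethodType.methodType(int.class, long.class));
    } catch (ReflectiveOperationException e) {
      throw new LinkageError("could not bind handle", e);
    }
  }

  // Stand-in for a C function; always reports 128 dimensions.
  private static int fakeNativeIndexDimension(long pointer) {
    return 128;
  }

  int indexDimension(long pointer) {
    try {
      // The call-site signature (long)int matches the handle type exactly.
      return (int) indexDimension.invokeExact(pointer);
    } catch (Throwable t) {
      throw new IllegalStateException("native call failed", t);
    }
  }
}

// Implements the interface on top of the wrapper; the format classes only see the interface.
final class FaissLibraryNativeImplSketch implements FaissLibrarySketch {
  private final FaissNativeWrapperSketch wrapper = new FaissNativeWrapperSketch();

  @Override
  public int indexDimension(long indexPointer) {
    if (indexPointer == 0L) {
      throw new IllegalArgumentException("null index pointer");
    }
    return wrapper.indexDimension(indexPointer);
  }
}
```

With this split, only the wrapper and the implementation class would need to move if the native-access code had to live in a separate source set.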

@github-actions
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@kaivalnp
Contributor Author

Also ran benchmarks to ensure these changes don't adversely affect performance.

main:

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.934        1.649   1.619        0.982  100000   100      50       64        250         no      8.51      11757.79           10.10             1          637.45       292.969      292.969       HNSW

This PR:

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.934        1.637   1.608        0.982  100000   100      50       64        250         no      8.10      12347.20            9.72             1          637.45       292.969      292.969       HNSW
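
To put the two rows side by side, the relative change per column can be computed with a small helper. This is only an arithmetic sketch; the class and method names are made up for illustration, and the inputs are the figures from the tables above.

```java
final class BenchmarkDelta {
  private BenchmarkDelta() {}

  // Relative change of `after` versus `before`, in percent (negative means a decrease).
  static double percentChange(double before, double after) {
    return (after - before) / before * 100.0;
  }
}
```

For example, `BenchmarkDelta.percentChange(1.649, 1.637)` gives about -0.73 for latency, and `BenchmarkDelta.percentChange(11757.79, 12347.20)` gives about +5.0 for index_docs/s. Recall is 0.934 in both runs.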

Member

@mikemccand mikemccand left a comment

Thank you @kaivalnp -- this looks clean to me -- but @uschindler probably has suggestions still ;)

interface FaissLibrary {
FaissLibrary INSTANCE = lookup();

// TODO: Use vectorized version where available
Member

Hmm what does this TODO mean again? Is "vectorized" meaning "SIMD instructions"? This term (vector) is overloaded! https://youtu.be/fVq4_HhBK8Y

Maybe clarify to // TODO: use SIMD Faiss API versions where available or so (if that's what it really means)? I think this is necessary because the Faiss build process will produce a specific dynamic library for each SIMD target (AVX-512 vs AVX-128 etc.)?

Contributor Author

Yes sorry, I meant using SIMD instructions wherever available.

Today, the shared library of Faiss' C API (libfaiss_c.so) is linked to the non-SIMD version of the base library (libfaiss.so) by default, but a user can still "point" to the correct SIMD version by changing its dependencies using:

# patchelf --replace-needed OLD_DEPENDENCY NEW_DEPENDENCY SHARED_LIBRARY
# Pick exactly one of libfaiss_avx2.so, libfaiss_avx512.so, or libfaiss_sve.so as the new dependency:
patchelf --replace-needed libfaiss.so libfaiss_avx2.so libfaiss_c.so

However, we'd ideally want to do this automatically (either propose a change to upstream Faiss, or something else from Lucene) -- but I wasn't sure how to do it right now.

I'll update the comment soon!

@uschindler
Contributor

uschindler commented Jul 22, 2025

Give me some time to check this.

In general I agree with these refactorings: the code is much cleaner and the exception handling is correct. Also, everything that touches native code is encapsulated. This is all strongly needed and should be done regardless of any backporting. It is required to have good code.

I don't think we should backport this to our stable 10.x branch, as this won't work with Java 21 and there's still complex work to be done to make this compile with Java 21 (APIJAR duplication and so on). My proposal is to release Lucene 11 this autumn with Java 25 (see mailing list thread), where we no longer have limitations with preview APIs.

@kaivalnp
Contributor Author

> Give me some time to check this.

Hi @uschindler, wanted to ask if you have any feedback for these changes?

Contributor

@uschindler uschindler left a comment

To me the whole thing looks fine. I have not checked the internals and only looked at the native code. It looks fine to have everything which calls native code in two classes.

The try/catch code looks correct. The redundancy here is needed, thanks for adding it. Using a wrapper method for catching exceptions, as before, was not a good idea because it required slow/inexact method handles and varargs.
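
To make the trade-off concrete, here is a small self-contained Java sketch contrasting the two call styles. It is not the PR's code: `InvokeStyles`, `callGeneric`, and `addExactCall` are hypothetical names, and `Math.addExact` stands in for a native function handle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class InvokeStyles {
  // Stand-in target: a handle of type (int,int)int pointing at Math.addExact.
  static final MethodHandle ADD;

  static {
    try {
      ADD =
          MethodHandles.lookup()
              .findStatic(
                  Math.class,
                  "addExact",
                  MethodType.methodType(int.class, int.class, int.class));
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private InvokeStyles() {}

  // Generic helper: a single try/catch for every call, but arguments are boxed
  // into an Object[] and the handle is invoked inexactly.
  static Object callGeneric(MethodHandle handle, Object... args) {
    try {
      return handle.invokeWithArguments(args);
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }

  // Per-function wrapper: the try/catch is repeated for each function, but the
  // call site matches the handle type exactly, so invokeExact can be used.
  static int addExactCall(int a, int b) {
    try {
      return (int) ADD.invokeExact(a, b);
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }
}
```

Both `InvokeStyles.callGeneric(InvokeStyles.ADD, 2, 3)` and `InvokeStyles.addExactCall(2, 3)` compute the same sum. The second form costs one try/catch per wrapped function, which is the redundancy mentioned above.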

But please remove the lookup of the native implementation in FaissLibrary.java. In Lucene's main branch this is useless. Just return the instance directly (see code of MMapDir in main branch, there are no such indirections anymore).

Like Robert, I don't agree with backporting this to 10.x branch. This should be a new feature for Lucene 11.

String NAME = "faiss_c";
String VERSION = "1.11.0";

private static FaissLibrary lookup() {
Contributor

Why do we have that dynamic code in main branch? Please replace by a simple return new FaissLibraryNativeImpl().
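
For readers unfamiliar with the pattern, the difference between a by-name lookup and direct instantiation might look roughly like the sketch below. The names `LibrarySketch` and `LibraryNativeImplSketch` and both factory methods are hypothetical; the actual lookup code in the PR may differ.

```java
interface LibrarySketch {
  String name();

  // Dynamic variant: resolves the implementation class by name at runtime.
  static LibrarySketch lookupReflectively() {
    // Derive the sibling class name so the sketch works in any package.
    String implName =
        LibrarySketch.class.getName().replace("LibrarySketch", "LibraryNativeImplSketch");
    try {
      return (LibrarySketch) Class.forName(implName).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new LinkageError("implementation not available: " + implName, e);
    }
  }

  // Direct variant: the implementation is always on the classpath, so just construct it.
  static LibrarySketch lookupDirectly() {
    return new LibraryNativeImplSketch();
  }
}

final class LibraryNativeImplSketch implements LibrarySketch {
  LibraryNativeImplSketch() {}

  @Override
  public String name() {
    return "faiss_c";
  }
}
```

The reflective form only pays off when the implementation class might be absent at runtime, for example when it is compiled for a newer Java release than the rest of the module.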

Contributor Author

Makes sense, updated

- Avoid dynamic lookup for FaissLibraryNativeImpl on the main branch
- Explain the SIMD situation in more detail
@github-actions
Contributor

github-actions bot commented Aug 4, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@kaivalnp
Contributor Author

kaivalnp commented Aug 4, 2025

Thanks a lot @uschindler!

> I don't agree with backporting this to 10.x branch

Personally, I feel that the main blocker here was some differences in function names that allocated / read strings and arrays from native memory (which I think could be worked around using method handles to point to the correct function) -- but I agree, it could become messy due to the incompatibility.

I'll hold off on trying to backport this change unless someone feels otherwise.

Member

@mikemccand mikemccand left a comment

Thanks @kaivalnp -- this looks cleaner!

It sounds like consensus is to not backport this to 10.x, and rather release 11.0 soon (end of year ish?).

@mikemccand
Member

I think this is ready to merge? I'll merge in a day or so ...

Contributor

@uschindler uschindler left a comment

Latest changes look OK.

@mikemccand mikemccand merged commit fa6c87c into apache:main Aug 12, 2025
8 checks passed
@mikemccand
Member

Thank you @kaivalnp!

@kaivalnp kaivalnp deleted the refactor-faiss branch August 12, 2025 15:59
akhilesh-k pushed a commit to akhilesh-k/lucene that referenced this pull request Aug 24, 2025
* Refactor Faiss-based vector format for easier backport

* Fix errorprone

* Address comments

- Avoid dynamic lookup for FaissLibraryNativeImpl on the main branch
- Explain the SIMD situation in more detail

---------

Co-authored-by: Kaival Parikh <kaivalp2000@gmail.com>