Wrappers and a customized HNSW search#3926
Wrappers and a customized HNSW search#3926alexanderguzhva wants to merge 2 commits intofacebookresearch:mainfrom
Conversation
1d05380 to
f4b7439
Compare
f4b7439 to
59e2eca
Compare
|
Thanks, @alexanderguzhva for the PR! Could you split out the CMake re-factoring as a separate PR to review that separately? We would be able to move forward with that at a faster speed than the rest of the current PR. For IndexBruteForceWrapper I am curious what benefits you are seeing from that implementation compared to the current IndexFlat implementations, for example? I haven't had a chance to dig through the HNSW implementation, but will try to find time for that over the next week. |
|
wrappers are useful in a situation if you need to grab a index and make it work as it is was a brute-search one. For example, you have HNSWPQ index. |
|
@alexanderguzhva is there a reason not to replace the Faiss HNSW implementation altogether ? There seems to be a lot of duplication in the HNSW wrapper. |
|
|
@mdouze @asadoughi I was expecting you guys to maybe conduct some performance testing for your internal use cases. As of now, it is a candidate code, and it is placed as a separate one, because it allows to test both baseline and candidate HNSW search in a single application. |
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
59e2eca to
b42ef4a
Compare
|
How about benchmarks using the filter dataset from the NeurIPS '23 workshop? |
|
Hi @alexanderguzhva, do we still want to keep this issue open? Seems like it has been a while |
|
@satymish yes, please keep it open. I remember about it. |
|
(Assigning to Alexandr for issue tracking on our dashboard, not any indication of urgency) |
This PR introduces the following:
cmake/link_to_faiss_lib.cmake, which exposes a useful and reusable CMakelink_to_faiss_lib()functionIndexWrapperclass, which allows to override functionality of an existing Faiss index using a wrapper programming patternIndexHNSWWrapperoverridessearch()andrange_search()methods for IndexHNSW with a search way it is done and used in Milvus (https://github.com/zilliztech/knowhere). Compared to a baseline Faiss version, thissearch()implementation tries to maintain a stable recall rate with the change of the % of filtered samples. The Faiss baselinerange_search()implementation for HNSW raises questions...IndexBruteForceWrapperoverridessearch()andrange_search()methods to invoke a brute-force searchBitsetutility classCountSizeIOWriterutility class, which is used for quick-and-dirty way of evaluating the size of a data blob, produced byindex_write()As of the PR as it is, all code changes are located in
cppcontrib/knowheredirectory. I'm ready for discussion whether certain components need to be moved out ofcppcontrib/knowhereand moved to a baseline Faiss code.Also, I'm ready to write unit tests and refactoring, just let me know which ones are needed.
There is a quick-and-dirty benchmark included in the code, but it operates on a random data. And there is no easy way to grab a real bench data for C++ code (you need
contrib/dataset.pyfor BigANN, etc), so I'm ready to discuss python integrations, if needed.