MB-65243: Merge facebookresearch/faiss@v1.11.0 into blevesearch/faiss@bleve#52
MB-65243: Merge facebookresearch/faiss@v1.11.0 into blevesearch/faiss@bleve#52abhinavdangeti merged 73 commits intoblevefrom
facebookresearch/faiss@v1.11.0 into blevesearch/faiss@bleve#52Conversation
…search#4145) Summary: This PR enables building with GCC for Windows using MinGW, using a completely standard cmake procedure, e.g., ``` cmake -B build -DCMAKE_INSTALL_PREFIX=$prefix -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TARGET_TOOLCHAIN} -DCMAKE_BUILD_TYPE=Release -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_C_API=ON ``` as detailed in the build recipe for the Julia ecosystem at https://github.com/JuliaPackaging/Yggdrasil/blob/334df79335424690bdbb9bdc432bc65affa92c8e/F/Faiss/common.jl#L8-L64 Pull Request resolved: facebookresearch#4145 Reviewed By: asadoughi Differential Revision: D68918508 Pulled By: mengdilin fbshipit-source-id: 1f663cf52bf2c59e874111af84b5df4ededbba91
Summary: Pull Request resolved: facebookresearch#4158 Reviewed By: mnorris11 Differential Revision: D68901764 Pulled By: asadoughi fbshipit-source-id: 5b5abf207947a62592ea48c32870e7060956a335
Summary: Update CAGRA train and add docs to reflect current capabilities. Pull Request resolved: facebookresearch#4152 Reviewed By: mnorris11 Differential Revision: D68901974 Pulled By: asadoughi fbshipit-source-id: 2667466286a48f7ed76017388f8bbd880025afdd
…bookresearch#4159) Summary: I believe the order of the parameters was not intended to be that way. This PR fixes it. Pull Request resolved: facebookresearch#4159 Reviewed By: mengdilin Differential Revision: D69073195 Pulled By: junjieqi fbshipit-source-id: bfb62aa7536deeda73bee32f2de3cd7a8142bb13
Summary: # The bug build gpu test with static lib failed undefined reference from gpu source code to cpu source, e.g. `GpuAutoTune.cpp:(.text+0x3bc): undefined reference to 'typeinfo for faiss::Index'` # build config: `cmake -S . -B build -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=ON -DFAISS_ENABLE_GPU=ON` the link command would be like: `c++ -o faiss/gpu/test/TestCodePacking faiss/libfaiss.a -Wl,--push-state,--whole-archive faiss/gpu/libfaiss_gpu.a -Wl,--pop-state ...` key point is linking to faiss_gpu after faiss which is a classic link order issue of "undefined reference" # my fix: make faiss_gpu an `OBJECT` library. # Bouns TestXXX files get small because '--whole-archive' propagated to test link command. Pull Request resolved: facebookresearch#4137 Reviewed By: asadoughi Differential Revision: D68639371 Pulled By: gtwang01 fbshipit-source-id: 1d482b4311f8e52de0d709b03348eae72c64c544
Summary: Pull Request resolved: facebookresearch#4169 Point users to Github discussions page instead of Facebook group for release updates, Q&A, etc. Reviewed By: mnorris11 Differential Revision: D69140649 fbshipit-source-id: 56c7ed54411063789ed56ffc6aa23ff6cc696ef3
…4170) Summary: Changes: - Fix incorrect build target when `FAISS_OPT_LEVEL=avx512_spr`. Resolves facebookresearch#4160 Pull Request resolved: facebookresearch#4170 Reviewed By: asadoughi Differential Revision: D69165687 Pulled By: mnorris11 fbshipit-source-id: 696aee1d8110785abef974453287280cb24b86ca
…ncy (facebookresearch#4176) Summary: Pull Request resolved: facebookresearch#4176 For issue: facebookresearch#4175 Reproduce on CI by: 1. Update build-pull-request.yml and action.yml to only run `conda install -c pytorch -c nvidia -c rapidsai -c conda-forge faiss-gpu-cuvs`, and also start a tmate session. 2. Log into the host 3. `cd` to ~/miniconda3/bin. Try to import faiss. It fails ``` runner@fv-az802-878:~/miniconda3/bin$ ./python Python 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:24:40) [GCC 13.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import faiss Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/runner/miniconda3/lib/python3.12/site-packages/faiss/__init__.py", line 17, in <module> from .loader import * File "/home/runner/miniconda3/lib/python3.12/site-packages/faiss/loader.py", line 149, in <module> from .swigfaiss import * File "/home/runner/miniconda3/lib/python3.12/site-packages/faiss/swigfaiss.py", line 13, in <module> from . import _swigfaiss ImportError: libnvJitLink.so.12: cannot open shared object file: No such file or directory >>> ``` Reviewed By: asadoughi Differential Revision: D69278685 fbshipit-source-id: 44c476c9ab3de5a43edb6da45cbb32a80df90160
Summary: Add ability to search HNSW indexes using a plain [`SearchParameters`](https://github.com/facebookresearch/faiss/blob/6c046992a71e672df504a0101cddf6f2f2e90601/faiss/Index.h#L64-L69) object (i.e. only an [`IDSelector`](https://github.com/facebookresearch/faiss/blob/6c046992a71e672df504a0101cddf6f2f2e90601/faiss/Index.h#L66)) Issue: Currently if a plain `SearchParameters` is used to query an HNSW index, [an error is thrown](https://github.com/facebookresearch/faiss/blob/6c046992a71e672df504a0101cddf6f2f2e90601/faiss/IndexHNSW.cpp#L251) -- when the user's intent was only to filter some documents, and rely on index settings for remaining parameters (like `efSearch`, `check_relative_distance`, `search_bounded_queue`) Motivation: Faiss provides an amazing [index factory](https://github.com/facebookresearch/faiss/wiki/The-index-factory) and [parameter setter](https://github.com/facebookresearch/faiss/wiki/Index-IO,-cloning-and-hyper-parameter-tuning) to abstract away internals of the index type and settings used, like: ```cpp Index* index = index_factory(256, "HNSW32"); ParameterSpace().set_index_parameters(index, "efConstruction=200,efSearch=150"); ``` Now if a user wants to perform a filtered search on this _opaque_ index using: ```cpp SearchParameters parameters; parameters.sel = new IDSelectorRange(10, 20); index->search(nq, xq, k, d, id, ¶meters); ``` they are met with an error: ``` faiss/IndexHNSW.cpp:251: Error: '!(params)' failed: params type invalid ``` An easy way to reproduce this issue is to replace `Flat` -> `HNSW` [here](https://github.com/facebookresearch/faiss/blob/6c046992a71e672df504a0101cddf6f2f2e90601/c_api/example_c.c#L60) and run `example_c` like: ``` make -C build example_c ./build/c_api/example_c ``` This PR allows passing a plain `SearchParameters` to HNSW indexes, and use index settings as a fallback Pull Request resolved: facebookresearch#4167 Reviewed By: asadoughi Differential Revision: D69312175 Pulled By: mnorris11 fbshipit-source-id: 63cc1deb6cb6116850cb3f8f7866eaa3a911ee48
) Summary: Pull Request resolved: facebookresearch#4150 Creates a sharding convenience function for IVF indexes. - The __**centroids on the quantizer**__ are sharded based on the given sharding function. (not the data, as data sharding by ids is already implemented by copy_subuset_to, https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVF.h#L408) - The output is written to files based on the template filename generator param. - The default sharding function is simply the ith vector mod the total shard count. This would called by Laser here: https://www.internalfb.com/code/fbsource/[ce1f2e028e79]/fbcode/fblearner/flow/projects/laser/laser_sim_search/knn_trainer.py?lines=295-296. This convenience function will do the file writing, and return the created file names. There's a few key required changes in FAISS: 1. Allow `std::vector<std::string>` to be used. Updates swigfaiss.swig and array_conversions.py to accommodate. These have to be numpy dtype of `object` instead of the more correct `unicode`, because unicode dtype is fixed length. I couldn't figure out how to create a numpy array with each of the output file names where they have different dtypes. (Say the file names are like file1, file11, file111. The dtype would need to be U5, U6, U7 respectively, as the dtype for unicode contains the length). I tried structured arrays : this does not work either, as numpy makes it into a matrix instead: the `file1 file11 file111` example with explicit setting of U5, U6, U7 turns into `[[file1 file1 file1], [file1 file11 file11], [file1 file11 file111]]`, which we do not want. If someone knows the right syntax, please yell at me 2. Create Python callbacks for sharding and template filename: `PyCallbackFilenameTemplateGenerator` and `PyCallbackShardingFunction`. Users of this function would inherit from the FilenameTemplateGenerator or ShardingFunction in C++ to pass to `shard_ivf_index_centroids`. See the other examples in python_callbacks.cpp. This is required because Python functions cannot be passed through SWIG to C++ (i.e. no std::function or function pointers), so we have to use this approach. This approach allows it to be called from both C++ and Python. test_sharding.py shows the Python calling, test_utils.cpp shows the C++ calling. Reviewed By: asadoughi Differential Revision: D68534991 fbshipit-source-id: b857e20c6cc4249a2ab7792db4c93dd4fb8403fd
Summary: Check for not completed rather than just in_progress, as runs can be queued, waiting, etc. Fix due to failed nightly not retrying because retry build found it was "queued" instead of "in_progress" Failed nightly: https://github.com/facebookresearch/faiss/actions/runs/13301645334/attempts/1 Retry that didn't trigger: https://github.com/facebookresearch/faiss/actions/runs/13301647044/job/37144032841 Reviewed By: mengdilin Differential Revision: D69610422 fbshipit-source-id: a7a9b998bba160e8d1ba13c7ae2426d99125a7e8
Summary: Pull Request resolved: facebookresearch#4185 Based on this users comment it seems like we should do bound checking: facebookresearch#4177 Reviewed By: mnorris11 Differential Revision: D69497295 fbshipit-source-id: 97025cf29c464afb0f85aa98f4b303489b7fc989
) Summary: Pull Request resolved: facebookresearch#4198 1. pins lief due to `AttributeError: type object 'CLASS' has no attribute 'CLASS64'` (just set it to last passing nightly version) 2. pins mkl in gpu builds due to it trying to pull in 2024.2.2 which conflicts with 2023 in the libfaiss. Added nightlies to make sure they pass https://github.com/facebookresearch/faiss/actions/runs/13422430425/job/37498020894. Not all passed: I'm not sure the `build-pull-request / Linux x86_64 GPU w/ cuVS nightlies (CUDA 12.4.0)` nightly is actually broken, but this unblocks the PR builds for now. Reviewed By: junjieqi Differential Revision: D69860604 fbshipit-source-id: 2da623c71b03c22d581b78655253a863fbafd3ed
) Summary: This is required to enable lazy setting of a device copy of the training dataset to a cuVS CAGRA index. Pull Request resolved: facebookresearch#4173 Reviewed By: mnorris11 Differential Revision: D69795662 Pulled By: gtwang01 fbshipit-source-id: 68cda198ed7983800b64d3e5fac1b77ff55ecd12
Summary: Pull Request resolved: facebookresearch#4205 Removing unused variable. This piece of code began to be compiled after armv9a has been set as default compilation profile Reviewed By: andrewjcg Differential Revision: D69946389 fbshipit-source-id: f2b5e57585506eb7cecbf76bf71bc6a2b5cc7133
Summary: ## Description Add the support for adding vectors with ids when IndexIDMap is used with Cagra Index. Resolves issue: facebookresearch#4107 Pull Request resolved: facebookresearch#4188 Reviewed By: mnorris11 Differential Revision: D69812544 Pulled By: gtwang01 fbshipit-source-id: 3c12c930e5d10ce214b12e68dacd63a644011b79
Summary: Pull Request resolved: facebookresearch#4204 Fix for S492386 I found a slight difference between failing nightly: https://github.com/facebookresearch/faiss/actions/runs/13429138293/job/37523589618 And last succeeding nightly: https://github.com/facebookresearch/faiss/actions/runs/13301645334/job/37182266030 The mkl package in the last succeeding nightly is 2023.0.0, and it is 2023.2.0 in the failing nightly. Since mkl was recently causing trouble, I pin mkl to 2023.0.0 in this diff to match the las succeeding nightly. Reviewed By: mnorris11 Differential Revision: D69937976 fbshipit-source-id: 0c4aba4322e26aa6a03bf3ea1dbee6ed7049092c
Summary: Pull Request resolved: facebookresearch#4203 Related to issue: facebookresearch#4202 Reviewed By: mengdilin Differential Revision: D69933126 fbshipit-source-id: cafc5f34d0f91450c5067827756b1297684b0ce3
…h#4209) Summary: If both `avx512` and `avx512_spr` are compiled, Sapphire Rapids capabilities are never loaded when using the Python bindings, as the `avx512` import always overrides the `avx512_spr` one. This very small PR solves the issue. Pull Request resolved: facebookresearch#4209 Reviewed By: mengdilin Differential Revision: D70015045 Pulled By: gtwang01 fbshipit-source-id: d3553a6c9048a534c0901ee29e7e2354de96e79f
Summary: Pull Request resolved: facebookresearch#4211 Reviewed By: gtwang01 Differential Revision: D70102429 fbshipit-source-id: 68e265699448a825b82467064ca95742bd4e49c3
…earch#4197) Summary: Pull Request resolved: facebookresearch#4197 Ivan and I discussed 2 problems: 1. We may want to try to offload/shard PQ or SQ table data if there is a big enough win (pending) 2. IDs seem to be random after sharding. This diff solves 2. Root cause is that we add to quantizer without IDs. Instead, we wrap in IndexIDMap2 (which provides reconstruction, whereas IndexIDMap does not). Laser's quantizers are Flat and HNSW, so we can wrap like this. Reviewed By: ivansopin Differential Revision: D69832788 fbshipit-source-id: 331b6d1cf52666f5dac61e2b52302d46b0a83708
Summary: Pull Request resolved: facebookresearch#4214 Got build failure with flags `[-Werror,-Wunneeded-internal-declaration]` ``` faiss/impl/code_distance/code_distance-sve.h:199:13 error: 'static' function 'distance_four_codes_sve_for_small_m' declared in header file should be declared 'static inline' [-Werror,-Wunneeded-internal-declaration] ``` Reviewed By: vit-ka Differential Revision: D70279069 fbshipit-source-id: 28b5cc8394a9a508e25f72777f74de685d242dc4
Summary: Pull Request resolved: facebookresearch#4217 liblief seemed to be causing issues in nightly: https://github.com/facebookresearch/faiss/actions/runs/13560151536/job/37908376889 Removing the pin while pinning conda-build resolves the issue. Reviewed By: mnorris11 Differential Revision: D70344910 fbshipit-source-id: c19bfcf187714fbe36e549bfb007eb9787a011b6
…ch#4151) Summary: Pull Request resolved: facebookresearch#4151 Reviewed By: junjieqi, asadoughi Differential Revision: D68784260 fbshipit-source-id: a715b02fd18a59c393be3ccc9aa1a7be8b196cc8
Summary: Pull Request resolved: facebookresearch#4219 `code_distance-sve.h` references `PQDecoder8` but doesn't include the header. The issue is revealed by D68784260 which removed some includes from a header that indirectly included `ProductQuantizer.h` ``` headers/faiss/impl/code_distance/code_distance-sve.h:74:45: error: unknown type name 'PQDecoder8'; did you mean 'PQDecoderT'? 74 | std::enable_if_t<std::is_same_v<PQDecoderT, PQDecoder8>, float> inline distance_single_code_sve( | ^~~~~~~~~~ | PQDecoderT ``` Reviewed By: ddrcoder Differential Revision: D70433576 fbshipit-source-id: 12945b16003a3d6a995b18ffe9798179ecf826f4
…#4220) Summary: Pull Request resolved: facebookresearch#4220 LLVM-19 is incoming. This fixes an issue preventing it. Delays to previous platform upgrades cost $3M/week. Reviewed By: dtolnay Differential Revision: D70449926 fbshipit-source-id: 20e0882b9363670d6c010e1c7870cb04155a3a9d
Summary: Pull Request resolved: facebookresearch#4227 same as title Differential Revision: D70724590 fbshipit-source-id: 943648d9002b38ba967c254c8c7014fdc7ab3de8
…search#4229) Summary: Pull Request resolved: facebookresearch#4229 same as title Differential Revision: D70728870 fbshipit-source-id: aeb817d80b20e5671c81ba88cdd05797cb070d23
Summary: Pull Request resolved: facebookresearch#4195 Non-templated `hammings` call produced incorrect values. `hammings` is called from `hamming_distance_table`, which in turn is unused so no impact. https://www.internalfb.com/code/fbsource/[85684614381d9bdfaaa0bb4a42e244296e350848]/fbcode/faiss/IndexPQ.cpp?lines=439-446 Reviewed By: gtwang01 Differential Revision: D69613329 fbshipit-source-id: 5d02a99b04492a61ebf0134f0c1719eac86fbb4f
…#4232) Summary: Pull Request resolved: facebookresearch#4232 `nullptr` is preferable to `0` or `NULL`. Let's use it everywhere so we can enable `-Wzero-as-null-pointer-constant`. - If you approve of this diff, please use the "Accept & Ship" button :-) Reviewed By: dtolnay Differential Revision: D70818157 fbshipit-source-id: a46d64b6d80844f5246f7df236eb6ec54ce2886f
… in python (facebookresearch#4304) Summary: Pull Request resolved: facebookresearch#4304 can't use properly in notebooks without this: `index.code_size` will fail Reviewed By: junjieqi Differential Revision: D73482034 fbshipit-source-id: e523e4d44ea8fa7255c81bc2c23ad9e7ec3eec96
Summary: facebookresearch#3374 has a typo, my local repo has the correct fixing (find it when sync from recent v1.10), the intended fixing is: for example, if we have `n = 10`, and `nshards = 8` - the original buggy code ```c++ size_t shard_size = (n + nshards - 1) / nshards; // i.e. shard_size = (10 + 8 - 1) / 8 = 2 size_t i0 = idx * shard_size; // all i0: [0, 2, 4, 6, 8, 10, 12, 14], wrong!!! 14>n size_t ni = std::min(shard_size, n - i0); // all ni: [2, 2, 2, 2, 2, 0, -2, -4], wrong!!! ``` - should be: ```c++ size_t base_shard_size = n / nshards; // i.e. shard_size = 10 / 8 = 1 size_t i0 = idx * base_shard_size + std::min(size_t(idx), n % nshards); // all i0: [0, 2, 4, 5, 6, 7, 8, 9] size_t ni = base_shard_size; if (idx < n % nshards) { ++ni; } // all ni: [2, 2, 1, 1, 1, 1, 1, 1] ``` Pull Request resolved: facebookresearch#4299 Reviewed By: junjieqi Differential Revision: D73199869 Pulled By: mnorris11 fbshipit-source-id: 81cbb8284818212781978412262e91d7747f9b97
Summary: Pull Request resolved: facebookresearch#4308 Increment to next release of Faiss, v1.11.0 Differential Revision: D73595484 fbshipit-source-id: bc065fa84ccdfbab8e6518e71d1ce8d7d8bd5bde
v1.11.0 into fork's bleve branchfacebookresearch/faiss@v1.11.0 into blevesearch/faiss@bleve
CMake Error: install(EXPORT "faiss-targets" ...) includes target "faiss_c" more than once in the export set.
|
@deepkaran @ceejatec should we consider this to bump up the faiss version we use in |
facebookresearch/faiss@v1.11.0 into blevesearch/faiss@blevefacebookresearch/faiss@v1.11.0 into blevesearch/faiss@bleve
|
Should we merge this first to the |
|
@ceejatec Good idea. I've just overwritten the |
|
@abhinavdangeti Please don't do that without also updating the |
|
Hey @abhinavdangeti , quick question - any particular features/fixes that come with this upgrade or is it a general upgrade to latest version? |
|
@deepkaran it brings in some utilities for binary vector indexes, which we want to experiment with. |
|
@deepkaran For |
|
@abhinavdangeti @metonymic-smokey thanks for the information. Sounds good. |
|
@deepkaran @ceejatec I've run some limited testing with the toybuild and things seem ok - so if this commit looks ok to you, we can merge away. |
|
I've proposed the corresponding manifest version update: https://review.couchbase.org/c/manifest/+/228059 Please merge both changes at the same time; thanks. |
|
Looks good from my side. |
Merge results: