This repository contains one version of the source code for our NSDI'23 paper "Fast, Approximate Vector Queries on Very Large Unstructured Datasets" [Paper].
- Auncel/
  - The source code of the Auncel implementation and design (forked from Faiss 1.15.2)
- LAET/
  - The source code of the SIGMOD'20 paper "Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination" (forked from LAET, with new datasets added)
- faiss/
  - The source code of the vector search engine Faiss (forked from Faiss 1.15.2, with its ELP (in Autotune.cpp) changed from the average case to the bounded case)
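The README does not spell out what the Autotune.cpp change does. As an assumption only, one way to read "average case" versus "bounded case" is as the rule for picking a search parameter (here `nprobe`) from a profile of per-query recall: the average case accepts a setting once the mean recall reaches the target, while a bounded case requires a given fraction of queries (all of them by default) to reach it individually. The function `pick_nprobe` and its input format are hypothetical illustrations, not Faiss's auto-tuning API and not the code in this repository.

```python
import numpy as np


def pick_nprobe(recall_profile, target, mode="average", fraction=1.0):
    """Return the smallest nprobe whose per-query recalls satisfy `target`.

    recall_profile maps each candidate nprobe to a sequence of per-query recall
    values measured on a set of training queries (a hypothetical input format).
      mode="average": accept once the mean recall reaches `target`.
      mode="bounded": accept once at least `fraction` of the queries each reach
                      `target` (fraction=1.0 means every query must).
    Returns None when no candidate qualifies.
    """
    for nprobe in sorted(recall_profile):
        recalls = np.asarray(recall_profile[nprobe], dtype=float)
        if mode == "average":
            ok = recalls.mean() >= target
        elif mode == "bounded":
            # Fraction of queries that individually meet the target.
            ok = np.mean(recalls >= target) >= fraction
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if ok:
            return nprobe
    return None
```

For a toy profile `{1: [1.0, 1.0, 0.4], 4: [1.0, 1.0, 0.8], 16: [1.0, 1.0, 1.0]}` and a target of 0.75, the average rule returns 1 even though one query only reaches 0.4 recall, while the bounded rule returns 4.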
- Hardware
  - AWS c5.4xlarge & c5.metal
- Software
  - Intel MKL & clang & OpenMP
- Datasets
  - Each 10M dataset is a random 10M slice of the corresponding 1B dataset (SIFT, DEEP, TEXT, GIST). You can download the preprocessed datasets (e.g., normalized, in the case of TEXT) from data-link-1 or data-link-2 (7w3r). We recommend using the provided datasets if you want to reproduce our configuration.
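The README does not describe how the 10M slices were drawn or normalized, so the provided downloads remain the way to reproduce our exact configuration. For readers who want to build a comparable slice themselves, here is a minimal sketch, assuming the vectors are stored in the common .fvecs layout (an int32 dimension followed by that many float32 values per record) and that the slice is a uniform sample without replacement. The function names are hypothetical, and data in other layouts (e.g. .bvecs for uint8 vectors) would need a different reader.

```python
import numpy as np


def open_fvecs(path):
    """Memory-map a .fvecs file as an (n, d + 1) int32 array.

    Each .fvecs record is an int32 dimension d followed by d float32 values,
    so column 0 holds the dimension and columns 1..d hold the raw float bits.
    """
    raw = np.memmap(path, dtype=np.int32, mode="r")
    d = int(raw[0])
    return raw.reshape(-1, d + 1)


def sample_fvecs(path, n_out, seed=0):
    """Draw n_out distinct vectors uniformly at random, keeping file order."""
    rows = open_fvecs(path)
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(rows.shape[0], size=n_out, replace=False))
    # Fancy indexing copies only the selected rows off the memory map.
    picked = np.ascontiguousarray(rows[idx, 1:])
    return picked.view(np.float32)


def l2_normalize(x):
    """Scale each row to unit L2 norm (rows of all zeros are left unchanged)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def write_fvecs(path, x):
    """Write an (n, d) float32 array in .fvecs layout."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    n, d = x.shape
    out = np.empty((n, d + 1), dtype=np.int32)
    out[:, 0] = d
    out[:, 1:] = x.view(np.int32)
    out.tofile(path)
```

A typical call would be `write_fvecs("slice.fvecs", l2_normalize(sample_fvecs("base.fvecs", 10_000_000)))`, where both file names are placeholders and the normalization step applies only to datasets that need it. Memory-mapping keeps the full 1B file on disk; only the sampled rows are loaded.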
- Compile
  - Run the following command to compile Auncel:
    cd ./Auncel && ./configure --without-cuda && ./build.sh && cd ../
  - Run the following command to compile LAET:
    cd ./LAET && ./configure --without-cuda && ./build.sh && cd ../
  - Run the following command to compile Faiss:
    cd ./faiss && ./configure --without-cuda && ./build.sh && cd ../
- Run
  - Overall: Before running the Python programs that generate the figures, run the corresponding programs to produce the result log files. Run
    cd ./Auncel/eval/ && ./run.sh && cd -
    to get the log files of Auncel. Run
    cd ./LAET/benchs/learned_termination/ && ./run.sh && cd -
    to get the log files of LAET. Run
    cd ./faiss/eval/ && ./run.sh && cd -
    to get the log files of Faiss. Run
    cd ./figures/overall/ && ./overall.sh && cd -
    to draw the three figures.
  - Effectiveness: Before running the Python programs that generate the figures, run the corresponding program to produce the result log files. Run
    cd ./Auncel/eval/ && ./effect.sh && cd -
    to get the log files of Auncel. Run
    cd ./figures/effect/ && ./effect.sh && cd -
    to draw the two figures.
  - Validation: The log files are generated automatically when you run
    cd ./Auncel/eval/ && ./run.sh && cd -
    Before running it, set bs in struct Trace (<repo>/Auncel/IVF_pro.h) to 1 so that every point in the $\varphi - U$ map is captured. To draw the figures, run
    cd ./figures/validation && ./validation.sh && cd -
  - Overhead: Run
    cd ./Auncel/eval/ && ./overhead.sh && cd -
    The experimental data are printed to the terminal.
  - Dist: Please refer to <repo>/Auncel/dist/README.md for the details of the distributed experiment. The figure script is <repo>/figures/dist/figure16.py.
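The four Overall commands have to run in order, since the figure script reads the log files that the first three produce. They can also be driven from a single script. This is a minimal sketch, assuming it is launched from the repository root and that each script expects to start in its own directory, as the `cd` commands suggest; the Faiss entry assumes the script sits at faiss/eval/run.sh. `run_steps` and `OVERALL_STEPS` are hypothetical names, not part of the repository.

```python
import subprocess
from pathlib import Path

# (directory relative to the repository root, command to run inside it),
# mirroring the "cd <dir> && ./script && cd -" commands of the Overall step.
OVERALL_STEPS = [
    ("Auncel/eval", ["./run.sh"]),
    ("LAET/benchs/learned_termination", ["./run.sh"]),
    ("faiss/eval", ["./run.sh"]),
    ("figures/overall", ["./overall.sh"]),
]


def run_steps(repo_root, steps, dry_run=False):
    """Run each command from its own directory, in order.

    Returns the (directory, command) pairs that were (or, with dry_run=True,
    would have been) executed. check=True makes a failing script raise
    subprocess.CalledProcessError, so later steps such as figure drawing never
    run on incomplete logs.
    """
    planned = []
    for subdir, cmd in steps:
        cwd = Path(repo_root) / subdir
        planned.append((cwd, cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
    return planned
```

Calling `run_steps(".", OVERALL_STEPS, dry_run=True)` lists the planned directories without running anything; dropping `dry_run` executes them. Passing `cwd=` instead of chaining `cd ... && cd -` leaves the caller's working directory untouched even when a step fails.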
For any questions, please contact zzlcs at pku dot edu dot cn.