
Commit 6ebc640

Add HNSWPQ4Bits example (#176)
Co-authored-by: Patrick <[email protected]>
1 parent 4e7ff32 commit 6ebc640

File tree

4 files changed, +222 −0 lines changed


examples/ann-hnsw-pq4bits/Makefile

+11
@@ -0,0 +1,11 @@
CXX=g++
CXXFLAGS=-fopenmp -O3 -std=c++14 -fPIC -DNDEBUG -Wall -g -lblas
EXTRA_INCLUDE_FLAGS=-I../../pecos/core/
ARCHFLAG=-march=native

all: go

go: example.cpp
	${CXX} -o go ${CXXFLAGS} example.cpp -I. ${EXTRA_INCLUDE_FLAGS} ${ARCHFLAG}

clean:
	rm -rf *.so *.o go

examples/ann-hnsw-pq4bits/README.md

+75
@@ -0,0 +1,75 @@
## Notice
- Currently, we only support L2 distance with 4-bit Product Quantization.
- We are working on extending it to angular and inner-product (ip) distance measures.

## Install Prerequisites

To run this project, the prerequisites are the same as for building PECOS.

* For Ubuntu (18.04, 20.04):
``` bash
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
```
* For Amazon Linux 2 Image:
``` bash
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'
```
One needs to install at least one BLAS library to compile PECOS, e.g. `OpenBLAS`:
* For Ubuntu (18.04, 20.04):
``` bash
sudo apt-get install -y libopenblas-dev
```
* For Amazon Linux 2 Image and AMI:
``` bash
sudo amazon-linux-extras install epel -y
sudo yum install openblas-devel -y
```
## Prepare Data

Get the exemplar sift-euclidean-128 dataset:

```bash
wget https://archive.org/download/pecos-dataset/ann-benchmarks/sift-euclidean-128.tar.gz
```

Extract the dataset:

```bash
tar -xf sift-euclidean-128.tar.gz
```

The prepared dataset consists of 3 `.npy` files: `X.trn.npy` (training data), `X.tst.npy` (test data), and `Y.tst.npy` (the indices of the 10 nearest training points for each test point).
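As a quick sanity check (not part of the original example), the extracted arrays can be inspected with NumPy; a minimal sketch, assuming the archive extracts into a `sift-euclidean-128/` folder:

```python
import numpy as np

# Load the extracted arrays and print their shapes/dtypes.
X_trn = np.load("sift-euclidean-128/X.trn.npy")  # training vectors
X_tst = np.load("sift-euclidean-128/X.tst.npy")  # test/query vectors
Y_tst = np.load("sift-euclidean-128/Y.tst.npy")  # indices of the 10 nearest training points per query

print("X.trn:", X_trn.shape, X_trn.dtype)
print("X.tst:", X_tst.shape, X_tst.dtype)
print("Y.tst:", Y_tst.shape, Y_tst.dtype)
```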
## Compile the source code

```bash
make clean go
```

An executable named `go` will be generated.

## Running the compiled executable

The executable takes arguments in the following form:

```bash
./go data_folder model_folder space M efC #threads efs num_rerank sub_dimension
```
The arguments are:

* `data_folder`: the directory where the 3 `.npy` files are stored.
* `model_folder`: the directory used to store the trained model. If a saved model is found there, it will be loaded instead of training a new one.
* `space`: the distance measure to use. Currently, only `l2` is supported.
* `M`: the maximum number of edge connections per node used in HNSW.
* `efC`: the size of the candidate queue used during HNSW construction.
* `#threads`: the number of threads used to build the graph.
* `efs`: the search queue size in the inference step.
* `num_rerank`: the number of points in the search queue that are re-ranked using the original features instead of the quantized distances.
* `sub_dimension`: the dimension of each subspace in Product Quantization. If `sub_dimension` is set to 0, the default scheme is used: if the original data dimension is <= 400, `sub_dimension` is 1; otherwise it is 2.

The construction-related hyper-parameters (`space`, `M`, `efC`, `#threads`, `sub_dimension`) are also used to name the trained model file.

Here is an example command for executing the compiled binary:

```bash
./go sift-euclidean-128 sift-euclidean-128 l2 8 500 24 10 10 0
```
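With the construction hyper-parameters in this example (and `sub_dimension = 0`), the trained model is saved as `sift-euclidean-128/pecos.l2.M-8_efC-500_t-24_d-0.bin`, following the `sprintf` format in `example.cpp` below.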
## Experiment

The compiled `example.cpp` already repeats the inference 10 times and reports the best result. So, to evaluate under the ann-benchmarks protocol, we can simply use Python to iterate over hyper-parameters and record the results:

```bash
python run.py
```
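Each configuration prints a single line of the form `<recall> : <QPS>,` (see the final `std::cout` in `example.cpp`). Below is a minimal sketch (not part of this commit) for collecting these lines into a CSV, assuming the output of `run.py` has been redirected to a hypothetical `results.log`:

```python
import csv

rows = []
with open("results.log") as f:  # hypothetical log from `python run.py > results.log`
    for line in f:
        line = line.strip().rstrip(",")
        if " : " not in line:
            continue  # skip progress messages such as "After train"
        recall, qps = line.split(" : ")
        rows.append((float(recall), float(qps)))

with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["recall", "qps"])
    writer.writerows(rows)
```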

examples/ann-hnsw-pq4bits/example.cpp

+127
@@ -0,0 +1,127 @@
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <string>
#include <unordered_set>
#include "utils/matrix.hpp"
#include "utils/scipy_loader.hpp"
#include "ann/hnsw.hpp"


class StopW {
    std::chrono::steady_clock::time_point time_begin;
 public:
    StopW() {
        time_begin = std::chrono::steady_clock::now();
    }

    float getElapsedTimeMicro() {
        std::chrono::steady_clock::time_point time_end = std::chrono::steady_clock::now();
        return (std::chrono::duration_cast<std::chrono::microseconds>(time_end - time_begin).count());
    }

    void reset() {
        time_begin = std::chrono::steady_clock::now();
    }
};


int num_rerank;
int sub_dimension;
using pecos::ann::index_type;

typedef float32_t value_type;
typedef uint64_t mem_index_type;
typedef pecos::NpyArray<value_type> scipy_npy_t;


auto npy_to_drm = [](scipy_npy_t& X_npy) -> pecos::drm_t {
    pecos::drm_t X;
    X.rows = X_npy.shape[0];
    X.cols = X_npy.shape[1];
    X.val = X_npy.array.data();
    return X;
};


template<typename MAT, typename feat_vec_t>
void run_dense(std::string data_dir, char* model_path, index_type M, index_type efC, index_type max_level, int threads, int efs) {
    // data prepare
    scipy_npy_t X_trn_npy(data_dir + "/X.trn.npy");
    scipy_npy_t X_tst_npy(data_dir + "/X.tst.npy");
    scipy_npy_t Y_tst_npy(data_dir + "/Y.tst.npy");
    auto X_trn = npy_to_drm(X_trn_npy);
    auto X_tst = npy_to_drm(X_tst_npy);
    auto Y_tst = npy_to_drm(Y_tst_npy);
    // model prepare
    index_type topk = 10;
    pecos::ann::HNSWProductQuantizer4Bits<float, feat_vec_t> indexer;
    FILE* fp = fopen(model_path, "rb");
    if (!fp) {
        // if subspace_dimension is set to 0, it will use default scheme. That is,
        // if dimension <= 400, we use subspace_dimension 1, otherwise we use 2.
        indexer.train(X_trn, M, efC, 0, 200, threads, max_level);
        std::cout << "After train" << std::endl;
        indexer.save(model_path);
        std::cout << "After save" << std::endl;
        indexer.load(model_path);
    } else {
        indexer.load(model_path);
        fclose(fp);
    }

    // prepare searcher for inference
    index_type num_data = X_tst.rows;
    auto searcher = indexer.create_searcher();
    searcher.prepare_inference();


    double latency = std::numeric_limits<double>::max();
    // REPEAT 10 times and report the best result
    for (int repeat = 0; repeat < 10; repeat++) {
        double inner_latency = 0.0;
        for (index_type idx = 0; idx < num_data; idx++) {
            StopW stopw = StopW();
            auto ret_pairs = indexer.predict_single(X_tst.get_row(idx), efs, topk, searcher, num_rerank);
            double ss = stopw.getElapsedTimeMicro();
            inner_latency += ss;
        }
        latency = std::min(latency, inner_latency);
    }
    // inference and calculate recalls
    double recall = 0.0;
    for (index_type idx = 0; idx < num_data; idx++) {
        auto ret_pairs = indexer.predict_single(X_tst.get_row(idx), efs, topk, searcher, num_rerank);
        std::unordered_set<pecos::csr_t::index_type> true_indices;

        for (auto k = 0u; k < topk; k++) {
            true_indices.insert(Y_tst.get_row(idx).val[k]);  // assume Y_tst is ascendingly sorted by distance
        }
        for (auto dist_idx_pair : ret_pairs) {
            if (true_indices.find(dist_idx_pair.node_id) != true_indices.end()) {
                recall += 1.0;
            }
        }
    }
    recall = recall / num_data / topk;
    latency = latency / num_data / 1000.;
    std::cout << recall << " : " << 1.0 / latency * 1e3 << "," << std::endl;
}

int main(int argc, char** argv) {
    std::string data_dir = argv[1];
    std::string model_dir = argv[2];
    std::string space_name = argv[3];
    index_type M = (index_type) atoi(argv[4]);
    index_type efC = (index_type) atoi(argv[5]);
    int threads = atoi(argv[6]);
    int efs = atoi(argv[7]);
    num_rerank = atoi(argv[8]);
    sub_dimension = atoi(argv[9]);
    index_type max_level = 8;
    char model_path[2048];
    sprintf(model_path, "%s/pecos.%s.M-%d_efC-%d_t-%d_d-%d.bin", model_dir.c_str(), space_name.c_str(), M, efC, threads, sub_dimension);
    // currently only support l2
    if (space_name.compare("l2") == 0) {
        run_dense<pecos::drm_t, pecos::ann::FeatVecDenseL2Simd<float>>(data_dir, model_path, M, efC, max_level, threads, efs);
    }
}

examples/ann-hnsw-pq4bits/run.py

+9
@@ -0,0 +1,9 @@
import os

# Sweep the construction parameter M and the search parameters efs / num_rerank.
cmd = "./go sift-euclidean-128 sift-euclidean-128 l2 %d 500 24 %d %d 0"
for args in [8, 16, 24, 36, 48, 64, 96]:
    for efs in [10, 20, 40, 80, 120, 200, 400]:
        os.system(cmd % (args, efs, efs))
        if efs - 20 > 0:
            os.system(cmd % (args, efs, 20))
        if efs - 50 > 0:
            os.system(cmd % (args, efs, 50))
