Skip to content

Fix bug of HNSW#2771

Closed
hhy3 wants to merge 1 commit intofacebookresearch:mainfrom
hhy3:fix_hnsw
Closed

Fix bug of HNSW#2771
hhy3 wants to merge 1 commit intofacebookresearch:mainfrom
hhy3:fix_hnsw

Conversation

@hhy3
Copy link
Contributor

@hhy3 hhy3 commented Mar 17, 2023

No description provided.

@mdouze
Copy link
Contributor

mdouze commented Mar 20, 2023

Could you explain what this is?

@hhy3
Copy link
Contributor Author

hhy3 commented Mar 20, 2023

Could you explain what this is?

When popping the first element of the heap, it could be invalid. So in this case nvalid should not decrease by 1. So it should be checked first. Otherwise, the performance would be impaired.

@mdouze
Copy link
Contributor

mdouze commented Mar 24, 2023

I believe this can happen only if a -1 id gets pushed on the heap, which does never happen, correct?

@hhy3
Copy link
Contributor Author

hhy3 commented Mar 25, 2023

I believe this can happen only if a -1 id gets pushed on the heap, which does never happen, correct?

It'll be set to -1 when popping the min element. https://github.com/facebookresearch/faiss/blob/main/faiss/impl/HNSW.cpp#L877
And after I made this modification the recall improved a lot.

@facebook-github-bot
Copy link
Contributor

@alexanderguzhva has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@alexanderguzhva
Copy link
Contributor

alexanderguzhva commented Apr 8, 2023

Confirmed for both the bug and the recall rate improvement.

@facebook-github-bot
Copy link
Contributor

@alexanderguzhva merged this pull request in 159641a.

@mdouze
Copy link
Contributor

mdouze commented Apr 11, 2023

Thanks a lot for this fix! it increases the accuracy of HNSW by a lot!

BZO95 added a commit to BZO95/faiss that referenced this pull request Apr 10, 2025
Summary: Pull Request resolved: facebookresearch/faiss#2771

Test Plan:
A DEFINITE BUG. Fixing it improves the recall rate.

Reproduce with the following code:
```
MinimaxHeap heap(1);
heap.push(1, 1.0);

float v1 = 0;
heap.pop(&v1);

heap.push(1, 1.0);
assert(heap.nvalid == 1);
```

Baseline
```
[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (064c246e0)]$ taskset -c 72-95 ./bench_hnsw.par 16 hnsw
load data
Testing HNSW Flat
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 4
Adding 1 elements at level 4
Adding 26 elements at level 3
Adding 951 elements at level 2
Adding 30276 elements at level 1
Adding 968746 elements at level 0
Done in 14893.718 ms
search
efSearch 16 bounded queue True 	   0.007 ms per query, R@1 0.8823, missing rate 0.0000
efSearch 16 bounded queue False 	   0.008 ms per query, R@1 0.9120, missing rate 0.0000
efSearch 32 bounded queue True 	   0.012 ms per query, R@1 0.9565, missing rate 0.0000
efSearch 32 bounded queue False 	   0.011 ms per query, R@1 0.9641, missing rate 0.0000
efSearch 64 bounded queue True 	   0.018 ms per query, R@1 0.9889, missing rate 0.0000
efSearch 64 bounded queue False 	   0.019 ms per query, R@1 0.9896, missing rate 0.0000
efSearch 128 bounded queue True 	   0.036 ms per query, R@1 0.9970, missing rate 0.0000
efSearch 128 bounded queue False 	   0.037 ms per query, R@1 0.9970, missing rate 0.0000
efSearch 256 bounded queue True 	   0.062 ms per query, R@1 0.9991, missing rate 0.0000
efSearch 256 bounded queue False 	   0.067 ms per query, R@1 0.9991, missing rate 0.0000

[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (fc6e9b938|remote/fbsource/stable...)]$ taskset -c 72-95 ./bench_hnsw.par 4 hnsw_sq
load data
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 5
Adding 1 elements at level 5
Adding 15 elements at level 4
Adding 194 elements at level 3
Adding 3693 elements at level 2
Adding 58500 elements at level 1
Adding 937597 elements at level 0
Done in 8900.962 ms
search
efSearch 16 	   0.003 ms per query, R@1 0.7365, missing rate 0.0000
efSearch 32 	   0.006 ms per query, R@1 0.8712, missing rate 0.0000
efSearch 64 	   0.011 ms per query, R@1 0.9415, missing rate 0.0000
efSearch 128 	   0.018 ms per query, R@1 0.9778, missing rate 0.0000
efSearch 256 	   0.036 ms per query, R@1 0.9917, missing rate 0.0000
```

Candidate:
```
[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (064c246e0)]$ taskset -c 72-95 ./bench_hnsw.par 16 hnsw
load data
I0408 09:45:20.949554 3024612 ContainerResourceMonitor.cpp:68] devserver cgroup2Reader creation is successful
Testing HNSW Flat
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 4
Adding 1 elements at level 4
Adding 26 elements at level 3
Adding 951 elements at level 2
Adding 30276 elements at level 1
Adding 968746 elements at level 0
Done in 14243.637 ms
search
efSearch 16 bounded queue True 	   0.006 ms per query, R@1 0.9122, missing rate 0.0000
efSearch 16 bounded queue False 	   0.006 ms per query, R@1 0.9122, missing rate 0.0000
efSearch 32 bounded queue True 	   0.011 ms per query, R@1 0.9643, missing rate 0.0000
efSearch 32 bounded queue False 	   0.011 ms per query, R@1 0.9644, missing rate 0.0000
efSearch 64 bounded queue True 	   0.018 ms per query, R@1 0.9880, missing rate 0.0000
efSearch 64 bounded queue False 	   0.020 ms per query, R@1 0.9880, missing rate 0.0000
efSearch 128 bounded queue True 	   0.036 ms per query, R@1 0.9969, missing rate 0.0000
efSearch 128 bounded queue False 	   0.035 ms per query, R@1 0.9969, missing rate 0.0000
efSearch 256 bounded queue True 	   0.064 ms per query, R@1 0.9994, missing rate 0.0000
efSearch 256 bounded queue False 	   0.062 ms per query, R@1 0.9994, missing rate 0.0000

[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (6de3a2d76)]$ taskset -c 72-95 ./bench_hnsw.par 4 hnsw_sq
load data
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 5
Adding 1 elements at level 5
Adding 15 elements at level 4
Adding 194 elements at level 3
Adding 3693 elements at level 2
Adding 58500 elements at level 1
Adding 937597 elements at level 0
Done in 8451.601 ms
search
efSearch 16 	   0.004 ms per query, R@1 0.8025, missing rate 0.0000
efSearch 32 	   0.006 ms per query, R@1 0.8925, missing rate 0.0000
efSearch 64 	   0.011 ms per query, R@1 0.9480, missing rate 0.0000
efSearch 128 	   0.019 ms per query, R@1 0.9793, missing rate 0.0000
efSearch 256 	   0.035 ms per query, R@1 0.9919, missing rate 0.0000

```

Reviewed By: mdouze

Differential Revision: D44815702

Pulled By: alexanderguzhva

fbshipit-source-id: ca7c7e83a6560316af543bde125ac703bf2e1dac
samanthawaters8882michaeldonovan added a commit to samanthawaters8882michaeldonovan/faiss that referenced this pull request Oct 12, 2025
Summary: Pull Request resolved: facebookresearch/faiss#2771

Test Plan:
A DEFINITE BUG. Fixing it improves the recall rate.

Reproduce with the following code:
```
MinimaxHeap heap(1);
heap.push(1, 1.0);

float v1 = 0;
heap.pop(&v1);

heap.push(1, 1.0);
assert(heap.nvalid == 1);
```

Baseline
```
[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (064c246e0)]$ taskset -c 72-95 ./bench_hnsw.par 16 hnsw
load data
Testing HNSW Flat
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 4
Adding 1 elements at level 4
Adding 26 elements at level 3
Adding 951 elements at level 2
Adding 30276 elements at level 1
Adding 968746 elements at level 0
Done in 14893.718 ms
search
efSearch 16 bounded queue True 	   0.007 ms per query, R@1 0.8823, missing rate 0.0000
efSearch 16 bounded queue False 	   0.008 ms per query, R@1 0.9120, missing rate 0.0000
efSearch 32 bounded queue True 	   0.012 ms per query, R@1 0.9565, missing rate 0.0000
efSearch 32 bounded queue False 	   0.011 ms per query, R@1 0.9641, missing rate 0.0000
efSearch 64 bounded queue True 	   0.018 ms per query, R@1 0.9889, missing rate 0.0000
efSearch 64 bounded queue False 	   0.019 ms per query, R@1 0.9896, missing rate 0.0000
efSearch 128 bounded queue True 	   0.036 ms per query, R@1 0.9970, missing rate 0.0000
efSearch 128 bounded queue False 	   0.037 ms per query, R@1 0.9970, missing rate 0.0000
efSearch 256 bounded queue True 	   0.062 ms per query, R@1 0.9991, missing rate 0.0000
efSearch 256 bounded queue False 	   0.067 ms per query, R@1 0.9991, missing rate 0.0000

[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (fc6e9b938|remote/fbsource/stable...)]$ taskset -c 72-95 ./bench_hnsw.par 4 hnsw_sq
load data
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 5
Adding 1 elements at level 5
Adding 15 elements at level 4
Adding 194 elements at level 3
Adding 3693 elements at level 2
Adding 58500 elements at level 1
Adding 937597 elements at level 0
Done in 8900.962 ms
search
efSearch 16 	   0.003 ms per query, R@1 0.7365, missing rate 0.0000
efSearch 32 	   0.006 ms per query, R@1 0.8712, missing rate 0.0000
efSearch 64 	   0.011 ms per query, R@1 0.9415, missing rate 0.0000
efSearch 128 	   0.018 ms per query, R@1 0.9778, missing rate 0.0000
efSearch 256 	   0.036 ms per query, R@1 0.9917, missing rate 0.0000
```

Candidate:
```
[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (064c246e0)]$ taskset -c 72-95 ./bench_hnsw.par 16 hnsw
load data
I0408 09:45:20.949554 3024612 ContainerResourceMonitor.cpp:68] devserver cgroup2Reader creation is successful
Testing HNSW Flat
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 4
Adding 1 elements at level 4
Adding 26 elements at level 3
Adding 951 elements at level 2
Adding 30276 elements at level 1
Adding 968746 elements at level 0
Done in 14243.637 ms
search
efSearch 16 bounded queue True 	   0.006 ms per query, R@1 0.9122, missing rate 0.0000
efSearch 16 bounded queue False 	   0.006 ms per query, R@1 0.9122, missing rate 0.0000
efSearch 32 bounded queue True 	   0.011 ms per query, R@1 0.9643, missing rate 0.0000
efSearch 32 bounded queue False 	   0.011 ms per query, R@1 0.9644, missing rate 0.0000
efSearch 64 bounded queue True 	   0.018 ms per query, R@1 0.9880, missing rate 0.0000
efSearch 64 bounded queue False 	   0.020 ms per query, R@1 0.9880, missing rate 0.0000
efSearch 128 bounded queue True 	   0.036 ms per query, R@1 0.9969, missing rate 0.0000
efSearch 128 bounded queue False 	   0.035 ms per query, R@1 0.9969, missing rate 0.0000
efSearch 256 bounded queue True 	   0.064 ms per query, R@1 0.9994, missing rate 0.0000
efSearch 256 bounded queue False 	   0.062 ms per query, R@1 0.9994, missing rate 0.0000

[aguzhva@devgpu005.ftw6 ~/fbsource/buck-out/v2/gen/fbcode/faiss/benchs (6de3a2d76)]$ taskset -c 72-95 ./bench_hnsw.par 4 hnsw_sq
load data
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 5
Adding 1 elements at level 5
Adding 15 elements at level 4
Adding 194 elements at level 3
Adding 3693 elements at level 2
Adding 58500 elements at level 1
Adding 937597 elements at level 0
Done in 8451.601 ms
search
efSearch 16 	   0.004 ms per query, R@1 0.8025, missing rate 0.0000
efSearch 32 	   0.006 ms per query, R@1 0.8925, missing rate 0.0000
efSearch 64 	   0.011 ms per query, R@1 0.9480, missing rate 0.0000
efSearch 128 	   0.019 ms per query, R@1 0.9793, missing rate 0.0000
efSearch 256 	   0.035 ms per query, R@1 0.9919, missing rate 0.0000

```

Reviewed By: mdouze

Differential Revision: D44815702

Pulled By: alexanderguzhva

fbshipit-source-id: ca7c7e83a6560316af543bde125ac703bf2e1dac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants