[Native] Using native scorers in BBQ #141762
...java/org/elasticsearch/simdvec/internal/vectorization/NativeBinaryQuantizedVectorScorer.java
…earch into native/es-i4i1-scorers
Very quick first round of benchmarks on Mac (will do more significant benchmarks on x64 later). That's 2.6x for the most common case (or what should be the most common case): bulk scoring with non-sequential access to the vector file.
libs/simdvec/src/main/java/org/elasticsearch/simdvec/ES93BinaryQuantizedVectorScorer.java
...estFixtures/java/org/elasticsearch/simdvec/internal/vectorization/VectorScorerTestUtils.java
.../src/main/java/org/elasticsearch/index/codec/vectors/es818/ES818BinaryFlatVectorsScorer.java
I think it's worth it. I have a TODO in the code, and I think that if Chris's work is merged soon, it's better to just do it instead of revisiting it again. I will sync with him.
Buildkite benchmark this with so-vector-default please |
Buildkite benchmark this with so-vector please |
Buildkite benchmark this with so-vector please |
💚 Build Succeeded
This build ran two so-vector benchmarks to evaluate the performance impact of this PR.
I also updated the code to use @ChrisHegarty's changes from #141718.
Made-with: Cursor

This PR introduces native scorers for BBQ.

It introduces and exposes a new `ES93BinaryQuantizedVectorsScorer` from the simdvec library, and uses it in `ES818BinaryFlatVectorsScorer` to do the scoring on the quantized data.

The approach taken by this PR is slightly different from the other HNSW scorers exposed via `VectorScorerFactory`; instead, it uses an approach similar to DiskBBQ. This is necessary because some of the classes involved (e.g. `BinarizedByteVectorValues` and implementations, `BQVectorUtils`, etc.) are not yet in Lucene, but are ES-specific and implemented in `server`.

To avoid a big refactoring, we keep everything in `server` as it is today, and change existing scorers (`ES818BinaryFlatVectorsScorer`) to call into the simdvec-provided implementation, passing the raw values.

When `BinarizedByteVectorValues` and supporting classes are moved to Lucene, we can revisit the classes introduced by this PR and move to `VectorScorerFactory`.

Microbenchmarks show a small speedup in single scoring (between 0 and 1.6x on ARM, and between 1.3x and 1.6x on x64) and a very nice speedup in bulk (~2.5x on ARM and between 2x and 4x on x64).
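
For readers unfamiliar with the shape of this kind of scorer, here is a minimal plain-Java sketch of the idea of "passing the raw values" to a scorer that offers both single and bulk scoring. It is not the code in this PR: the class `RawBinaryScorerSketch` and all of its methods are hypothetical names, the in-memory `byte[][]` stands in for the vector file, and the layout (a 4-bit query stored as four consecutive bit planes, scored against 1-bit document vectors, as the branch name `es-i4i1-scorers` suggests) is an assumption. It computes only the raw dot product and leaves out any score corrections a real quantized scorer would apply.

```java
/** Hypothetical sketch; none of these names are the simdvec API. */
final class RawBinaryScorerSketch {
    // Stand-in for the vector file: one packed 1-bit vector per ordinal.
    private final byte[][] docVectors;

    RawBinaryScorerSketch(byte[][] docVectors) {
        this.docVectors = docVectors;
    }

    /**
     * Dot product of a 4-bit query with a 1-bit document vector.
     * Assumed layout: the query is four bit planes stored back to back,
     * where plane p holds bit p of every query component, so
     * query.length == 4 * doc.length.
     */
    static long int4BitDotProduct(byte[] query, byte[] doc) {
        if (query.length != 4 * doc.length) {
            throw new IllegalArgumentException("query must hold 4 bit planes");
        }
        long total = 0;
        for (int plane = 0; plane < 4; plane++) {
            long planeSum = 0;
            int offset = plane * doc.length;
            for (int i = 0; i < doc.length; i++) {
                // Count components where both the query bit and the doc bit are set.
                planeSum += Integer.bitCount((query[offset + i] & doc[i]) & 0xFF);
            }
            // Bit p of the query contributes with weight 2^p.
            total += planeSum << plane;
        }
        return total;
    }

    /** Single scoring: one query against one document ordinal. */
    long score(byte[] query, int ordinal) {
        return int4BitDotProduct(query, docVectors[ordinal]);
    }

    /** Bulk scoring: ordinals may be in any order (non-sequential access). */
    void bulkScore(byte[] query, int[] ordinals, long[] scores, int count) {
        for (int i = 0; i < count; i++) {
            scores[i] = score(query, ordinals[i]);
        }
    }
}
```

In this sketch the bulk method is just a loop over the single-score path. A native implementation would presumably gain its speedup by handling several vectors per call, but how the PR does that is not shown here.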