
[Native] Using native scorers in BBQ#141762

Merged
ldematte merged 43 commits into elastic:main from ldematte:native/es-i4i1-scorers
Mar 9, 2026

Conversation

@ldematte (Contributor) commented Feb 3, 2026

This PR introduces native scorers for BBQ.
It exposes a new ES93BinaryQuantizedVectorsScorer from the simdvec library, and uses it in ES818BinaryFlatVectorsScorer to perform the scoring on the quantized data.
The approach taken by this PR differs slightly from the other HNSW scorers exposed via VectorScorerFactory; instead, it uses an approach similar to DiskBBQ. This is necessary because some of the classes involved (e.g. BinarizedByteVectorValues and its implementations, BQVectorUtils, etc.) are not yet in Lucene, but are ES-specific and implemented in server.

To avoid a big refactoring, we keep everything in server as it is today, and change the existing scorer (ES818BinaryFlatVectorsScorer) to call into the simdvec-provided implementation, passing the raw values.
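A minimal sketch of the delegation described above. All names except ES818BinaryFlatVectorsScorer and ES93BinaryQuantizedVectorsScorer are hypothetical, and the plain 1-bit dot product stands in for the real quantized scoring, which also involves correction terms:

```java
// Sketch only: the ES-side scorer keeps ownership of the ES-specific vector
// values (BinarizedByteVectorValues lives in server, not Lucene) and hands
// the raw quantized bytes to a simdvec-provided implementation for scoring.
final class BbqDelegationSketch {

    // Stand-in for the simdvec scorer (ES93BinaryQuantizedVectorsScorer);
    // the real implementation dispatches to native SIMD code when available.
    interface QuantizedScorer {
        long dotProduct(byte[] query, byte[] target);
    }

    // Scalar fallback: popcount of the AND of two packed 1-bit vectors.
    static final QuantizedScorer SCALAR = (query, target) -> {
        long sum = 0;
        for (int i = 0; i < query.length; i++) {
            sum += Integer.bitCount((query[i] & target[i]) & 0xFF);
        }
        return sum;
    };

    private final QuantizedScorer delegate;

    BbqDelegationSketch(QuantizedScorer delegate) {
        this.delegate = delegate;
    }

    // What ES818BinaryFlatVectorsScorer does conceptually: extract the raw
    // quantized values and delegate the hot inner loop.
    long score(byte[] rawQuery, byte[] rawTarget) {
        return delegate.dotProduct(rawQuery, rawTarget);
    }
}
```

The point of the pattern is that only the inner loop crosses into simdvec, so the ES-specific types never need to leave server.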

When BinarizedByteVectorValues and supporting classes are moved to Lucene, we can revisit the classes introduced by this PR and move to VectorScorerFactory.

Microbenchmarks show a small speedup in single scoring (between 0 and 1.6x on ARM, and between 1.3x and 1.6x on x64) and a very nice speedup in bulk scoring (~2.5x on ARM and between 2x and 4x on x64).

@tteofili (Contributor) left a comment

it looks good to me ;)

@ldematte (Contributor, Author) commented Feb 6, 2026

A very quick first round of benchmarks on a Mac (I'll run more extensive benchmarks on x64 later):

Benchmark                                    (dims)  (directoryType)  (implementation)  (similarityFunction)   Mode  Cnt  Score   Error   Units
VectorScorerBQBenchmark.bulkScoreRandom        1024             MMAP            SCALAR             EUCLIDEAN  thrpt    5  2.964 ± 0.066  ops/ms
VectorScorerBQBenchmark.bulkScoreRandom        1024             MMAP        VECTORIZED             EUCLIDEAN  thrpt    5  7.746 ± 0.188  ops/ms
VectorScorerBQBenchmark.bulkScoreSequential    1024             MMAP            SCALAR             EUCLIDEAN  thrpt    5  3.188 ± 0.103  ops/ms
VectorScorerBQBenchmark.bulkScoreSequential    1024             MMAP        VECTORIZED             EUCLIDEAN  thrpt    5  7.445 ± 0.132  ops/ms
VectorScorerBQBenchmark.scoreRandom            1024             MMAP            SCALAR             EUCLIDEAN  thrpt    5  2.695 ± 0.014  ops/ms
VectorScorerBQBenchmark.scoreRandom            1024             MMAP        VECTORIZED             EUCLIDEAN  thrpt    5  4.516 ± 0.190  ops/ms
VectorScorerBQBenchmark.scoreSequential        1024             MMAP            SCALAR             EUCLIDEAN  thrpt    5  3.135 ± 0.128  ops/ms
VectorScorerBQBenchmark.scoreSequential        1024             MMAP        VECTORIZED             EUCLIDEAN  thrpt    5  4.464 ± 0.681  ops/ms

That's a 2.6x speedup for the most common case (or what should be the most common case): bulk scoring with non-sequential access to the vector file.
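For context on what "bulk" means in the benchmark names above, here is a sketch (with hypothetical names; in the real code the inner loop is a single native SIMD kernel, not a Java loop): bulk scoring evaluates one query against a whole batch of candidates in one call, amortizing per-call overhead, which is why bulk gains more than single scoring.

```java
// Sketch of bulk vs single scoring over packed 1-bit quantized vectors.
final class BulkScoreSketch {

    // Single scoring: one query against one target (popcount of AND).
    static long score(byte[] query, byte[] target) {
        long sum = 0;
        for (int i = 0; i < query.length; i++) {
            sum += Integer.bitCount((query[i] & target[i]) & 0xFF);
        }
        return sum;
    }

    // Bulk scoring: one call, many targets. In the native path this whole
    // loop becomes one SIMD kernel, so the call overhead is paid once per
    // batch instead of once per vector.
    static long[] bulkScore(byte[] query, byte[][] targets) {
        long[] scores = new long[targets.length];
        for (int i = 0; i < targets.length; i++) {
            scores[i] = score(query, targets[i]);
        }
        return scores;
    }
}
```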

@ldematte ldematte marked this pull request as ready for review February 6, 2026 11:27
@benwtrent benwtrent added and then removed the `build-benchmark` (Trigger build benchmark job) label Feb 6, 2026
@ldematte (Contributor, Author)
> It might be good to wait until Chris's work is finished so that the "onheap fallback" isn't horrible due to lack of memory segments ;)

I think it's worth it, I have a TODO in the code, and I think that if Chris's work is merged soon it's better to just do it instead of revisiting it again. I will sync with him.

@ldematte (Contributor, Author)

Buildkite benchmark this with so-vector-default please

@benwtrent (Member)

Buildkite benchmark this with so-vector please

@ldematte (Contributor, Author) commented Mar 2, 2026

Buildkite benchmark this with so-vector please

@elasticmachine (Collaborator) commented Mar 2, 2026

💚 Build Succeeded

This build ran two so-vector benchmarks to evaluate the performance impact of this PR.


@ldematte (Contributor, Author) commented Mar 6, 2026

I was not able to sort out the Buildkite benchmark results, so I ran so_vector myself. Looking at the flamegraphs, I can see the native scorer being used, and I was able to verify that the Java "slow" path is not used anywhere:
(flamegraph screenshot)

However, despite the speedup, the overall impact on this benchmark is limited. What I found via profiling is that:

  • the benchmark is heavily influenced by script-score tasks, which use individual float32 scoring (and copy data from MemorySegment to heap arrays every time; I made a note to investigate in depth what's happening there and whether we can improve it somehow).
  • even excluding script-related tasks, scoring is just ~30% of KNN time: bulkScore accounts for only 7–22% of KNN search time, and another 7–22% is spent in the single-scorer path (which uses native code too). The remainder is HNSW graph traversal (HnswGraphSearcher.search), which is the bottleneck at ~62% of KNN time on average, followed by neighbor queue operations and other overhead. Even a 2x speedup in scoring would therefore only yield a ~10% improvement in KNN search latency (which is what we see here).
    • bulkScore vs score: it seems the larger the graph, the more time we spend in single scoring. FilteredHnswGraphSearcher.searchLevel and AbstractKnnVectorQuery.searchExact seem to be the main users of single-scoring.
  • within bulkScore, dot product is ~90% — the native scorer is doing its job efficiently. There is no significant overhead from corrections, memory copies, or other wrapper code.
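The latency arithmetic above can be sanity-checked with Amdahl's law (a sketch; the ~30% scoring share and the 2x speedup are the figures quoted in this comment, everything else is assumption):

```java
// Amdahl's law: overall speedup when a fraction p of total time is sped up
// by a factor s: 1 / ((1 - p) + p / s).
final class AmdahlSketch {

    static double overallSpeedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        // 2x on the full ~30% scoring share caps the gain at ~1.18x overall;
        // 2x on just a ~15% bulk-scoring share gives ~1.08x, consistent with
        // the ~5-10% end-to-end improvements reported in this comment.
        System.out.printf("2x on 30%% of time: %.3fx overall%n", overallSpeedup(0.30, 2.0));
        System.out.printf("2x on 15%% of time: %.3fx overall%n", overallSpeedup(0.15, 2.0));
    }
}
```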

Here are the results; I think we need to switch gears a bit, and understand how we can optimize (and if it is worth it) things outside scoring. It seems obvious to me that scoring in BBQ is not the bottleneck.

| Task | Branch | Main | Delta |
|---|---|---|---|
| default-match-all-fm | 245 ops/s | 232 ops/s | +5.4% |
| 10-50-match-all-fm | 249 ops/s | 226 ops/s | +10.3% |
| 100-300-match-all-fm | 117 ops/s | 111 ops/s | +5.0% |

@ldematte (Contributor, Author) commented Mar 6, 2026

I also updated the code to use @ChrisHegarty's changes from #141718.
I think this PR is good to go, and any further work should be handled separately. Wdyt? Can I have a final round of reviews?

@ChrisHegarty (Contributor) left a comment

LGTM

@ldematte ldematte merged commit 631eab3 into elastic:main Mar 9, 2026
36 checks passed
@ldematte ldematte deleted the native/es-i4i1-scorers branch March 9, 2026 08:41

Labels

  • >enhancement
  • :Search Relevance/Vectors (Vector search)
  • Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
  • v9.4.0

7 participants