Add bulk off-heap scoring for float32 vectors by ChrisHegarty · Pull Request #14980 · apache/lucene

ChrisHegarty · 2025-07-22T16:36:01Z

This commit adds bulk off-heap scoring for float32 vectors.

The bulk scorer scores 4 vectors against the query vector at a time. The general idea is to structure the work so that the CPU can overlap memory loads, which goes some way toward hiding memory latency.
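As a rough illustration of the "4 vectors at a time" idea, here is a minimal sketch. It is an assumption-laden stand-in, not the code in this PR: it uses plain on-heap `float[]` arrays and scalar arithmetic rather than off-heap memory and the vector API, and the class and method names (`BulkDotProductSketch`, `dotProductBulk4`) are made up for the example. The point is only that the four accumulators are independent, so loads from the four vectors need not wait on one another.

```java
/** Illustrative sketch: on-heap float[] arrays stand in for the off-heap vector data. */
final class BulkDotProductSketch {

  /** Scores a single vector: one accumulator, one stream of loads. */
  static float dotProduct(float[] query, float[] vector) {
    float sum = 0f;
    for (int i = 0; i < query.length; i++) {
      sum += query[i] * vector[i];
    }
    return sum;
  }

  /**
   * Scores four vectors in one pass over the query. The four accumulators do not
   * depend on each other, so the loads from v0..v3 can be in flight at the same time.
   */
  static void dotProductBulk4(
      float[] query, float[] v0, float[] v1, float[] v2, float[] v3, float[] scores) {
    float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
    for (int i = 0; i < query.length; i++) {
      float q = query[i]; // each query element is read once and reused four times
      s0 += q * v0[i];
      s1 += q * v1[i];
      s2 += q * v2[i];
      s3 += q * v3[i];
    }
    scores[0] = s0;
    scores[1] = s1;
    scores[2] = s2;
    scores[3] = s3;
  }

  public static void main(String[] args) {
    float[] query = {1f, 2f, 3f, 4f};
    float[][] vectors = {
      {1f, 0f, 0f, 0f}, {0f, 1f, 0f, 0f}, {1f, 1f, 1f, 1f}, {0.5f, 0.5f, 0.5f, 0.5f}
    };
    float[] scores = new float[4];
    dotProductBulk4(query, vectors[0], vectors[1], vectors[2], vectors[3], scores);
    for (int i = 0; i < 4; i++) {
      System.out.println(scores[i] + " (bulk) vs " + dotProduct(query, vectors[i]) + " (single)");
    }
  }
}
```

Each accumulator sees the same sequence of multiply-adds as the single-vector loop, so the bulk and single scores in this sketch agree.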

Initial results from the micro benchmarks show good potential improvement. The benchmark creates a flat vector index of 128,000 float32 vectors with 1024 dimensions (~500MB), then times how long it takes to score 20,000 random vectors against a query vector (lower times are better):

Benchmark                                            (size)  Mode  Cnt  Score   Error  Units
VectorScorerFloat32Benchmark.dotProductDefault         1024  avgt   15  8.505 ± 0.256  ms/op
VectorScorerFloat32Benchmark.dotProductNewBulkScore    1024  avgt   15  3.717 ± 0.158  ms/op
VectorScorerFloat32Benchmark.dotProductNewScorer       1024  avgt   15  7.287 ± 0.181  ms/op
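For readers who want to check the numbers, the arithmetic behind the index size and the relative timings can be sketched as follows. The class and method names are hypothetical; the inputs are the figures quoted above. With these inputs the index works out to 500 MiB, the bulk scorer to roughly 2.3x faster than the default, and the new single-vector scorer to about 1.17x faster.

```java
/** Hypothetical helper that reproduces the arithmetic behind the quoted figures. */
final class BenchmarkArithmetic {

  /** Raw size of a flat float32 index: one 4-byte float per dimension per vector. */
  static long indexBytes(long numVectors, int dims) {
    return numVectors * dims * Float.BYTES;
  }

  /** Speedup of a candidate over a baseline when both are average times per op. */
  static double speedup(double baselineMs, double candidateMs) {
    return baselineMs / candidateMs;
  }

  public static void main(String[] args) {
    long bytes = indexBytes(128_000, 1024);
    System.out.println("index size (MiB): " + bytes / (1024.0 * 1024.0));
    System.out.printf("bulk vs default:       %.2fx%n", speedup(8.505, 3.717));
    System.out.printf("new scorer vs default: %.2fx%n", speedup(8.505, 7.287));
  }
}
```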

Notes:

Just dot product for now, but other distance functions can be added as a follow up.
The bulk scorer just does 4 vectors at a time, since the implementation in Lucene is more straightforward, but this could be adjusted.
~~we seem to suffer pollution of the query vector type, so for now I just added two separate independent almost identical versions of the vector dot op.~~
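To make the notes above concrete, here is a hedged sketch of how a bulk entry point could sit next to a single-node one, with a groups-of-4 fast path and a one-at-a-time tail. The `Scorer` interface and `FlatDotProductScorer` class are hypothetical and written only for this example; they are not Lucene's `RandomVectorScorer` API, and the vectors live in an on-heap array rather than off-heap.

```java
final class BulkScoreSketch {

  /** Hypothetical scorer interface for illustration; not Lucene's actual API. */
  interface Scorer {
    float score(int node);

    /** Default bulk path: falls back to scoring one node at a time. */
    default void bulkScore(int[] nodes, float[] scores, int numNodes) {
      for (int i = 0; i < numNodes; i++) {
        scores[i] = score(nodes[i]);
      }
    }
  }

  /** Dot-product scorer over a flat on-heap array that overrides the bulk path. */
  static final class FlatDotProductScorer implements Scorer {
    private final float[][] vectors;
    private final float[] query;

    FlatDotProductScorer(float[][] vectors, float[] query) {
      this.vectors = vectors;
      this.query = query;
    }

    @Override
    public float score(int node) {
      float[] v = vectors[node];
      float sum = 0f;
      for (int d = 0; d < query.length; d++) {
        sum += query[d] * v[d];
      }
      return sum;
    }

    @Override
    public void bulkScore(int[] nodes, float[] scores, int numNodes) {
      int i = 0;
      int limit = numNodes & ~3; // largest multiple of 4 that is <= numNodes
      for (; i < limit; i += 4) {
        float[] v0 = vectors[nodes[i]], v1 = vectors[nodes[i + 1]];
        float[] v2 = vectors[nodes[i + 2]], v3 = vectors[nodes[i + 3]];
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        for (int d = 0; d < query.length; d++) {
          float q = query[d];
          s0 += q * v0[d];
          s1 += q * v1[d];
          s2 += q * v2[d];
          s3 += q * v3[d];
        }
        scores[i] = s0;
        scores[i + 1] = s1;
        scores[i + 2] = s2;
        scores[i + 3] = s3;
      }
      for (; i < numNodes; i++) { // tail: fewer than 4 nodes left
        scores[i] = score(nodes[i]);
      }
    }
  }

  public static void main(String[] args) {
    float[][] vectors = new float[6][8];
    float[] query = new float[8];
    for (int d = 0; d < 8; d++) {
      query[d] = d * 0.5f - 1f;
      for (int n = 0; n < 6; n++) {
        vectors[n][d] = (n + 1) * 0.25f - d * 0.125f;
      }
    }
    Scorer scorer = new FlatDotProductScorer(vectors, query);
    int[] nodes = {5, 0, 3, 1, 4, 2};
    float[] bulk = new float[nodes.length];
    scorer.bulkScore(nodes, bulk, nodes.length);
    for (int i = 0; i < nodes.length; i++) {
      boolean same = Float.compare(bulk[i], scorer.score(nodes[i])) == 0;
      System.out.println("node " + nodes[i] + ": " + bulk[i] + " same as single? " + same);
    }
  }
}
```

Changing the group size would mean changing the unrolled inner loop and the `& ~3` mask together, which is one reason a fixed width of 4 keeps a sketch like this simple.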

ChrisHegarty · 2025-07-22T16:47:58Z

/cc @mccullocht

benwtrent · 2025-08-06T14:44:15Z

I switched the benchmark to report throughput (ops/s). I get similar results: JMH indicates a 2x+ improvement on an ARM MacBook for the vector ops side of things:

Benchmark                                            (size)   Mode  Cnt    Score    Error  Units
VectorScorerFloat32Benchmark.dotProductDefault         1024  thrpt   15  106.205 ± 10.434  ops/s
VectorScorerFloat32Benchmark.dotProductDefaultBulk     1024  thrpt   15  119.524 ±  1.094  ops/s
VectorScorerFloat32Benchmark.dotProductOptBulkScore    1024  thrpt   15  283.673 ±  2.391  ops/s
VectorScorerFloat32Benchmark.dotProductOptScorer       1024  thrpt   15  140.599 ±  0.449  ops/s
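Since these results are reported as throughput while the earlier table uses average time, a small conversion helps when reading the two side by side. This is only arithmetic on the figures quoted above, with hypothetical names; the two runs were taken on different machines, so only the ratios within each table are meaningful. Here the optimized bulk scorer comes out at about 2.7x the default throughput and about 2.4x the default bulk path.

```java
/** Hypothetical helper for reading throughput-mode results. */
final class ThroughputToLatency {

  /** Converts operations per second into milliseconds per operation. */
  static double msPerOp(double opsPerSecond) {
    return 1000.0 / opsPerSecond;
  }

  /** Ratio of two throughputs: values above 1 mean the candidate is faster. */
  static double throughputRatio(double candidateOps, double baselineOps) {
    return candidateOps / baselineOps;
  }

  public static void main(String[] args) {
    System.out.printf("default:  %.3f ms/op%n", msPerOp(106.205));
    System.out.printf("opt bulk: %.3f ms/op%n", msPerOp(283.673));
    System.out.printf("opt bulk vs default:      %.2fx%n", throughputRatio(283.673, 106.205));
    System.out.printf("opt bulk vs default bulk: %.2fx%n", throughputRatio(283.673, 119.524));
  }
}
```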

versions.lock

...org/apache/lucene/internal/vectorization/Lucene99MemorySegmentFloatVectorScorerSupplier.java

benwtrent

I left some minor comments. Seems like jars were updated by accident?

However, the change looks great!

benwtrent

Love it! Now to bulk score ALL THE THINGS!!!!

rmuir · 2025-08-07T13:45:20Z

I think when I tried my hand at a similar approach with panama I ran into
similar (neutral) results on graviton2 and really the only thing that
helped there was prefetching into cpu cache. Totally believe the results
are better elsewhere, it's just been a bit of a struggle to stand up the
tests on other machines given the resources I have at hand.

When benchmarking ARM, I really recommend to only worry about Graviton3 (ARM SVE).

IMO we should stop investing any time into 128-bit Neon (Macs, Graviton2, etc). I realize Macs are convenient, but the sooner we drop vectorization support for this legacy instruction set, the better. It is not well supported by OpenJDK, so I see no path to success there.

benwtrent · 2025-08-12T12:27:29Z

Yep, Lucene nightlies show a nice improvement (variance is high, but it's consistently better than before): https://benchmarks.mikemccandless.com/VectorSearch.html

Maybe 10+% better it seems.

This commit adds the remaining bulk float32 off-heap scoring similarities, cosine, euclidean, and max inner product. The changes in #14980 deliberately added only dot product, to avoid additional bloat on the PR and benchmarking. This PR now refactors things a little to allow for the remaining similarities to be added. Benchmarking will be carried out on them independently, as well as consideration for not negatively affecting dot product. relates #14980
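The same four-at-a-time structure carries over to the other similarities. The sketch below is an assumption, not the follow-up PR's code: it uses scalar arithmetic on on-heap arrays, hypothetical names, and returns raw squared distance and raw cosine without any score normalization. Max inner product is omitted because, in a sketch like this, it would reuse the raw dot product.

```java
/** Illustrative sketch of bulk squared-Euclidean and cosine over four vectors. */
final class BulkSimilaritySketch {

  /** Squared Euclidean distance from the query to four vectors in one pass. */
  static void squareDistanceBulk4(
      float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] out) {
    float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
    for (int i = 0; i < q.length; i++) {
      float x = q[i];
      float d0 = x - v0[i], d1 = x - v1[i], d2 = x - v2[i], d3 = x - v3[i];
      s0 += d0 * d0;
      s1 += d1 * d1;
      s2 += d2 * d2;
      s3 += d3 * d3;
    }
    out[0] = s0;
    out[1] = s1;
    out[2] = s2;
    out[3] = s3;
  }

  /**
   * Raw cosine for four vectors: four dot products and four vector norms, sharing
   * one query norm. Zero-length vectors are not handled and would yield NaN.
   */
  static void cosineBulk4(
      float[] q, float[] v0, float[] v1, float[] v2, float[] v3, float[] out) {
    float qq = 0f;
    float dot0 = 0f, dot1 = 0f, dot2 = 0f, dot3 = 0f;
    float n0 = 0f, n1 = 0f, n2 = 0f, n3 = 0f;
    for (int i = 0; i < q.length; i++) {
      float x = q[i];
      float a = v0[i], b = v1[i], c = v2[i], d = v3[i];
      qq += x * x;
      dot0 += x * a;
      dot1 += x * b;
      dot2 += x * c;
      dot3 += x * d;
      n0 += a * a;
      n1 += b * b;
      n2 += c * c;
      n3 += d * d;
    }
    out[0] = (float) (dot0 / Math.sqrt((double) qq * n0));
    out[1] = (float) (dot1 / Math.sqrt((double) qq * n1));
    out[2] = (float) (dot2 / Math.sqrt((double) qq * n2));
    out[3] = (float) (dot3 / Math.sqrt((double) qq * n3));
  }

  public static void main(String[] args) {
    float[] q = {1f, 0f};
    float[][] v = {{1f, 0f}, {0f, 1f}, {-1f, 0f}, {3f, 4f}};
    float[] dist = new float[4];
    float[] cos = new float[4];
    squareDistanceBulk4(q, v[0], v[1], v[2], v[3], dist);
    cosineBulk4(q, v[0], v[1], v[2], v[3], cos);
    for (int i = 0; i < 4; i++) {
      System.out.println("vector " + i + ": squareDistance=" + dist[i] + " cosine=" + cos[i]);
    }
  }
}
```

Cosine needs twice as many accumulators per vector as dot product, which is the kind of difference that makes benchmarking each similarity separately worthwhile.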

mikemccand · 2025-08-15T13:45:28Z

> Yep, Lucene nightlies show a nice improvement (variance is high, but it's consistently better than before): https://benchmarks.mikemccandless.com/VectorSearch.html

Yay! I will add annotation for this, even though it is noisy. (Separately I would love to reduce this noise...)

mccullocht and others added 3 commits July 11, 2025 14:19

introduce RandomVectorScorer.bulkScore

7beb1d2

utilize bulk scorer in HnswGraphSearcher.searchLevel. skipping in ent…

4c8c70b

…ry point search as benefit would be marginal

Add vector bulk scoring

04f9ab1

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Jul 22, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Jul 22, 2025

github-actions bot added module:core/index module:core/codecs module:core/hnsw labels Jul 22, 2025

ChrisHegarty mentioned this pull request Jul 22, 2025

Add a bulk scoring interface to RandomVectorScorer #14978

Merged

add missing file

d2df2fa

ChrisHegarty force-pushed the bulk_vector_scoring branch 2 times, most recently from 9275012 to 9c8d9d1 Compare July 25, 2025 13:25

github-actions bot added the module:build-infra label Jul 25, 2025

itr

d67772d

ChrisHegarty force-pushed the bulk_vector_scoring branch from 9c8d9d1 to d67772d Compare July 25, 2025 13:28

mccullocht and others added 13 commits July 25, 2025 10:48

use bulk scorer for exhaustive search

66b9b6f

asserts

52ff93a

ensure bulk and non-bulk scores are the same.

108fc3d

changes

5b55a04

Merge branch 'main' into bulk-vector-scorer

b7fa31f

test score through the supplier/updatableScorer interface

64a7e06

tests

79db73e

Merge branch 'bulk-vector-scorer' into bulk_vector_scoring

1abf493

asserts

8a5961e

itr

7773812

itr

fc15699

Merge branch 'main' into bulk_vector_scoring

00b40f7

itr and cleanup

48159ee

github-actions bot removed the module:core/index label Jul 31, 2025

benwtrent reviewed Aug 6, 2025

versions.lock

benwtrent reviewed Aug 6, 2025

...org/apache/lucene/internal/vectorization/Lucene99MemorySegmentFloatVectorScorerSupplier.java (outdated)

benwtrent reviewed Aug 6, 2025

ChrisHegarty added 4 commits August 6, 2025 16:34

remove unused code

2cd6cfb

Merge branch 'main' into bulk_vector_scoring

4b6739b

Merge branch 'main' into bulk_vector_scoring

12604fc

revert test secrets

0e421bc

github-actions bot removed module:test-framework module:build-infra labels Aug 7, 2025

benwtrent approved these changes Aug 7, 2025

ChrisHegarty added 3 commits August 7, 2025 13:59

minor bench cleanup

018e28a

minor test itr

02ec945

minor itr

8dd779b

ChrisHegarty merged commit 84e5df3 into apache:main Aug 7, 2025
8 checks passed

ChrisHegarty deleted the bulk_vector_scoring branch August 7, 2025 13:54

This was referenced Aug 7, 2025

Add remaining bulk float32 off-heap scoring similarities #15037

Merged

Evaluate bulk scoring for BBQ elastic/elasticsearch#132540

Open

GroupVarInt Encoding Implementation for HNSW Graphs #14932

Merged

benwtrent mentioned this pull request Sep 4, 2025

Add bulk off-heap scoring for vector formats #15155

Open

thecoop mentioned this pull request Jan 9, 2026

Add native operations for scoring floats elastic/elasticsearch#140169

Merged

navneet1v mentioned this pull request Feb 7, 2026

[FEATURE] Use BulkScorer from VectorScorer Interface of Lucene to do Exact Search opensearch-project/k-NN#3105

Open

ChrisHegarty merged 42 commits into apache:main from ChrisHegarty:bulk_vector_scoring

5 participants
