
Adding bulkSize for benchmarking to better reflect realworld usage #142480

Merged
benwtrent merged 5 commits into elastic:main from benwtrent:add-bulksize-to-int7-benchy
Feb 25, 2026

Conversation

@benwtrent
Member

I am not 100% sure if this is needed, but this is how things are actually used. I wonder if we are over-indexing here, assuming we will scan 1000+ vectors in a row off-heap, when in practice we jump between Java and native land in standard-sized chunks.

Here is a local run

Benchmark                                                    (bulkSize)  (dims)   (function)  (implementation)  (numVectors)   Mode  Cnt      Score      Error  Units
VectorScorerInt7uBulkBenchmark.scoreMultipleRandom                   32    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  15568.442 ±  840.138  ops/s
VectorScorerInt7uBulkBenchmark.scoreMultipleRandom                 1500    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  16323.343 ±  796.789  ops/s
VectorScorerInt7uBulkBenchmark.scoreMultipleRandomBulk               32    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  25851.679 ± 2296.438  ops/s
VectorScorerInt7uBulkBenchmark.scoreMultipleRandomBulk             1500    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  25972.461 ± 1751.552  ops/s
VectorScorerInt7uBulkBenchmark.scoreMultipleSequential               32    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  17561.196 ±  957.424  ops/s
VectorScorerInt7uBulkBenchmark.scoreMultipleSequential             1500    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  17510.794 ± 1528.157  ops/s
VectorScorerInt7uBulkBenchmark.scoreMultipleSequentialBulk           32    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  26485.337 ±  380.044  ops/s
VectorScorerInt7uBulkBenchmark.scoreMultipleSequentialBulk         1500    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  26095.281 ±  701.095  ops/s
VectorScorerInt7uBulkBenchmark.scoreQueryMultipleRandom              32    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  15366.295 ± 1316.324  ops/s
VectorScorerInt7uBulkBenchmark.scoreQueryMultipleRandom            1500    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  15804.847 ±  149.808  ops/s
VectorScorerInt7uBulkBenchmark.scoreQueryMultipleRandomBulk          32    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  27118.706 ±  892.149  ops/s
VectorScorerInt7uBulkBenchmark.scoreQueryMultipleRandomBulk        1500    1024  DOT_PRODUCT            NATIVE          1500  thrpt    5  27325.290 ±  226.967  ops/s

The difference seems very minimal, and might just be due to the new overhead of the array copy. But I don't know how to avoid that without significant refactors or a dramatic increase in heap utilization (maybe that's ok? I can adjust so that all slices are created up front....).
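The "create all slices up front" alternative mentioned here could look roughly like this (a hypothetical sketch, not the benchmark's actual code — `PreSlicedBatches` and `slice` are illustrative names): the id batches are materialized once during setup, trading heap for removing the per-batch copy from the measured loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partition the ids into bulkSize-sized arrays once,
// up front, so the hot loop only hands pre-built batches to the scorer.
public class PreSlicedBatches {

    // Splits [0, numVectors) into batches of at most bulkSize sequential ids.
    public static List<int[]> slice(int numVectors, int bulkSize) {
        List<int[]> batches = new ArrayList<>();
        for (int i = 0; i < numVectors; i += bulkSize) {
            int n = Math.min(bulkSize, numVectors - i);
            int[] batch = new int[n];
            for (int j = 0; j < n; j++) {
                batch[j] = i + j;
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        // 1500 vectors in batches of 32 -> 47 batches, the last holding 28 ids.
        List<int[]> batches = slice(1500, 32);
        System.out.println(batches.size() + " batches, last has "
                + batches.get(batches.size() - 1).length + " ids");
    }
}
```

The heap cost is roughly one int per scored id plus array headers, which is exactly the trade-off raised above.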

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Feb 13, 2026
@thecoop
Member

thecoop commented Feb 20, 2026

Looks sensible, but can we link the default size 32 to a specific codepath in ES? Can we apply this for other BulkBenchmark classes too?

@benwtrent
Member Author

can we link the default size 32 to a specific codepath in ES?

For sure

Can we apply this for other BulkBenchmark classes too?

I suppose? My main concern is that this helps the current work with native pre-fetching.

@thecoop
Member

thecoop commented Feb 23, 2026

We can reference ESNextOSQVectorsScorer.BULK_SIZE in a comment here.

I've also been looking at HNSW; that uses maxConn * 2 for its batch sizes, so 16 * 2 = 32 with the default maxConn of 16. The maximum maxConn is 512, so the absolute maximum is 1024. Exhaustive searches use a batch size of 64.

I suggest we use 16, 64, 256, and 1024 for batch sizes here.

@benwtrent benwtrent requested a review from thecoop February 23, 2026 14:05
@benwtrent
Member Author

@thecoop if this is good, I will merge and then add similar logic to our other benchmarks.

Contributor

@ldematte ldematte left a comment


LGTM, it makes sense to separate the size of the dataset to search (so as to test the behaviour in case of cache misses) from the bulk batch size.

I was wondering if we need both numVectorsToScore and bulkSize; we could see how this behaves just by scoring a single bulk, but having numVectorsToScore too is more realistic (it produces more realistic cache-access patterns).

Code under review (before):

    for (int v = 0; v < numVectorsToScore; v++) {
        scores[v] = scorer.score(v);

(after):

    int v = 0;
    while (v < numVectorsToScore) {
Contributor


This is the same as before, but I'm OK with the change as it highlights that we always have smaller batches.

Code under review:

    for (int i = 0; i < numVectorsToScore; i += bulkSize) {
        int toScoreInThisBatch = Math.min(bulkSize, numVectorsToScore - i);
        // Copy the slice of sequential IDs to the scratch array
        System.arraycopy(ids, i, toScore, 0, toScoreInThisBatch);
Contributor


This is probably not negligible in terms of impact on the benchmark, or is it?

Member Author


It has an impact. I don't know of a better way other than pre-allocating ALL the batches ahead of time. If we are ok with the heap usage, we can do that instead.
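For reference, the copy-per-batch pattern being discussed can be sketched as plain Java (hypothetical names throughout — `bulkScore` here is a toy stand-in for the native scorer, not the real API):

```java
import java.util.Arrays;
import java.util.function.IntToDoubleFunction;

// Sketch of the benchmark's batching pattern: a scratch array of at most
// bulkSize ids is refilled per iteration, mimicking the System.arraycopy
// into `toScore` before each bulk-score call.
public class BatchedScoringSketch {

    // Toy stand-in for the native bulk scorer: scores ids[0..count) into scores[0..count).
    static void bulkScore(IntToDoubleFunction scorer, int[] ids, int count, float[] scores) {
        for (int i = 0; i < count; i++) {
            scores[i] = (float) scorer.applyAsDouble(ids[i]);
        }
    }

    // Scores numVectorsToScore sequential ids in batches of bulkSize.
    public static float[] scoreBatched(IntToDoubleFunction scorer, int numVectorsToScore, int bulkSize) {
        int[] ids = new int[numVectorsToScore];
        for (int i = 0; i < numVectorsToScore; i++) {
            ids[i] = i;
        }
        int[] toScore = new int[bulkSize];          // scratch, reused per batch
        float[] batchScores = new float[bulkSize];  // scratch, reused per batch
        float[] scores = new float[numVectorsToScore];
        for (int i = 0; i < numVectorsToScore; i += bulkSize) {
            int toScoreInThisBatch = Math.min(bulkSize, numVectorsToScore - i);
            // The copy under discussion: slice ids into the scratch array per batch.
            System.arraycopy(ids, i, toScore, 0, toScoreInThisBatch);
            bulkScore(scorer, toScore, toScoreInThisBatch, batchScores);
            System.arraycopy(batchScores, 0, scores, i, toScoreInThisBatch);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy scorer with score(id) = 2 * id; 10 ids with bulkSize 4 exercises a partial final batch.
        float[] scores = scoreBatched(id -> 2.0 * id, 10, 4);
        System.out.println(Arrays.toString(scores));
    }
}
```

The two `System.arraycopy` calls per batch are the overhead being weighed against pre-allocating every slice up front.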

Contributor


How does the real code do this? Does it create a new array? Use a scratch array like the benchmark does? I think we should do the same.
Also, there is probably room for improvement here; we can avoid copies if we change the API to

void bulkScore(int[] nodes, float[] scores, int offset, int bulkSize)

(or add it, with the existing implementation calling bulkScore(nodes, scores, 0, bulkSize) or something like that)
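One possible shape for that overload (a hypothetical sketch — the interface name, the delegation from the old signature, and the choice that `offset` applies to both `nodes` and `scores` are all assumptions, not the real API):

```java
public class OffsetBulkScoreSketch {

    // Hypothetical scorer interface with the suggested offset-based overload.
    public interface BulkScorer {
        // Existing-style entry point delegates to the offset variant.
        default void bulkScore(int[] nodes, float[] scores, int bulkSize) {
            bulkScore(nodes, scores, 0, bulkSize);
        }

        // Proposed overload: score nodes[offset..offset+bulkSize) in place,
        // so callers never copy slices out of their id array.
        void bulkScore(int[] nodes, float[] scores, int offset, int bulkSize);
    }

    // Toy implementation for illustration: score(node) = node * 0.5f.
    public static final BulkScorer TOY = (nodes, scores, offset, bulkSize) -> {
        for (int i = offset; i < offset + bulkSize; i++) {
            scores[i] = nodes[i] * 0.5f;
        }
    };

    public static void main(String[] args) {
        int[] ids = {0, 1, 2, 3, 4, 5, 6, 7};
        float[] scores = new float[ids.length];
        // Score ids in two batches of 4 without any System.arraycopy.
        TOY.bulkScore(ids, scores, 0, 4);
        TOY.bulkScore(ids, scores, 4, 4);
        System.out.println(java.util.Arrays.toString(scores));
    }
}
```

The caller walks `offset` forward in `bulkSize` steps instead of copying each slice into a scratch array, which is the copy-avoidance suggested above.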

Contributor


But that's a problem for another day :)

Member Author


How does the real code do this? Does it creates a new array? Use a scratch like the benchmark? I think we should do the same.

For a query, it creates a new score & batch array, and those single arrays are used for the entire duration of the query, which means over many score runs.

However, that also means that the IDs that are USED for a batch are indeed copied in (prod is actually much slower, popping from a queue individually for HNSW).

Code under review:

    bench.dims = dims;
    bench.numVectors = 1000;
    bench.numVectorsToScore = 200;
    bench.bulkSize = 200;
Contributor


I would use a bulkSize smaller than numVectorsToScore to better exercise the two nested loops.

Member Author


I do this because the tests make so many different assumptions about the return value; changing it would be a significant rewrite.

Code under review:

    @Param({ "16", "32", "64", "256", "1024" })
    public int bulkSize;

    @Param({ "SCALAR", "LUCENE", "NATIVE" })
Contributor


Nit: unless we want to explicitly exclude an entry, a bare @Param will do (that way we don't need to worry about keeping this list updated in case we add a new implementation).

Code under review:

    // HNSW will distribute ordinal bulk sizes depending on the number of connections in the graph.
    // The default is 16, the maximum is 512, and the bottom layer uses 2x the configured setting, so 1024 is the maximum.
    // The MOST common case here is 32.
    @Param({ "16", "32", "64", "256", "1024" })
Contributor


@ChrisHegarty FYI, I think Lucene benchmarks should be updated in the same way?

@thecoop
Member

thecoop commented Feb 23, 2026

I've been seeing some memory problems with the new benchmarks, where there weren't any before. I'll check if there's anything obvious going on.

@thecoop
Member

thecoop commented Feb 24, 2026

Well, I've been unable to replicate the hangs I've seen. I suggest merging this; then we can explore further if it happens again (given this is test code).

@benwtrent benwtrent merged commit e7f09ce into elastic:main Feb 25, 2026
35 checks passed
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Feb 25, 2026
…lastic#142480)

* Adding bulkSize for benchmarking to better reflect realworld usage

* adding bulk size

Labels

>non-issue, :Search Relevance/Vectors, Team:Search Relevance, v9.4.0


4 participants