Add int7 support for DiskBBQ and tests #141183

Merged
ah89 merged 21 commits into elastic:main from ah89:feature/bbq-multibit-quantization-139591
Feb 27, 2026

Contributor

@ah89 ah89 commented Jan 23, 2026

Enable 7-bit quantization in DiskBBQ paths.
Extend SIMD OSQ tests and OSQ benchmark for 7-bit.

Closes #139591

@ah89 ah89 added the labels >feature, :Search Relevance/Vectors, Team:Search Relevance, and v9.4.0 on Jan 23, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@ah89 ah89 requested review from benwtrent and tteofili January 23, 2026 14:54
Member

@benwtrent benwtrent left a comment

What does recall & latency look like?

Good first pass here.

Comment on lines +606 to +607
private final ESNextOSQVectorsScorer osqVectorsScorer;
private final ES92Int7VectorsScorer int7VectorsScorer;
Member

We shouldn't do this. Please augment the OSQ scorer instead; flipping between two scorers here will lead to confusion.

@@ -203,6 +212,9 @@ public void writeVectors(QuantizedVectorValues qvv, CheckedIntConsumer<IOExcepti
writeCorrections(corrections);
}
// write tail
Member

And this doesn't break the scoring? None of the other bit sizes encode the tail as blocks yet.

//cc @tteofili

Contributor

agreed, this is likely to affect recall, @ah89 did you check with KnnIndexTester?

Contributor Author

LargeBitDiskBBQBulkWriter was seemingly never instantiated before case 7 was introduced (cases 1, 2, and 4 are handled by SmallBitDiskBBQBulkWriter). Removing the docsWriter calls from it corrupts the on-disk layout, because the reader can no longer locate the doc IDs at the start of the block.

Contributor Author

> agreed, this is likely to affect recall, @ah89 did you check with KnnIndexTester?

Verified with KnnIndexTester on GIST-1M (960 dims, 100K docs, 100 queries, euclidean, IVF):

| index_type | quantize_bits | recall | latency (ms) | QPS | visited |
|---|---|---|---|---|---|
| ivf | 4 | 0.56 | 0.16–0.20 | 5000–6250 | 2374 |
| ivf | 7 | 0.61 | 0.66–0.68 | 1470–1515 | 2362 |

7-bit shows higher recall (0.61 vs 0.56), as expected from the reduced quantization error. The docsWriter fix is validated: if doc IDs were missing or misaligned, recall would drop to near zero.

As per my previous comment, LargeBitDiskBBQBulkWriter was never instantiated before this PR (only case 7 routes to it; previously only 1/2/4 existed). The docsWriter.accept(i) calls mirror what SmallBitDiskBBQBulkWriter already does: writing doc IDs at the start of each bulk block so the reader can associate scores back to documents.

Contributor

ok, the thing is that now the ESNextDiskBBQVectorsWriter always calls DiskBBQBulkWriter#fromBitSize with the blockEncodeTailVectors parameter set to true (I was in fact thinking of dropping the parameter entirely), therefore you should see LargeBitEncodedDiskBBQBulkWriter being used with 7 bits.

Contributor Author

@ah89 ah89 Feb 11, 2026

I see, so I will remove the docsWriter calls from the dead LargeBitDiskBBQBulkWriter (never instantiated), but will keep them on LargeBitEncodedDiskBBQBulkWriter -- the one that's actually used.

@benwtrent
Member

@ah89 I think all the blockers are finished :) You should be able to progress on this now.

Contributor

@tteofili tteofili left a comment

my only concern here is with the DiskBBQBulkWriter comment.

Comment on lines +29 to +30
int dimensions = 4;
int bulkSize = 2;
Contributor

How about a couple of similar tests with a random pick from (i) an array of allowed values and (ii) an array of incompatible values?

Contributor Author

Agreed, good idea.

- Added support for 7-bit symmetric quantization in VectorScorerOSQBenchmark and ESNextOSQVectorsScorer.
- Updated the handling of quantization in various methods to accommodate the new bit size.
- Modified tests to validate the new 7-bit encoding functionality in DiskBBQBulkWriter and related classes.
- Ensured backward compatibility by maintaining existing 1, 2, and 4-bit quantization methods.

Relates to elastic#139591
@ah89 ah89 force-pushed the feature/bbq-multibit-quantization-139591 branch from f3feeec to 6acfa0f February 11, 2026 01:21
Member

@benwtrent benwtrent left a comment

You are going to need to update MemorySegmentESNextOSQVectorsScorer

Contributor Author

ah89 commented Feb 12, 2026

> You are going to need to update MemorySegmentESNextOSQVectorsScorer

The 7-bit QPS went from ~1,500 to ~9,500, a ~6x speedup, with a negligible recall drop (0.61 to 0.60) on the same test configuration.

| index_type | quantize_bits | recall | latency (ms) | QPS | visited |
|---|---|---|---|---|---|
| ivf | 4 | 0.56 | 0.13–0.15 | 6666–7692 | 2305 |
| ivf | 7 | 0.60 | 0.10–0.11 | 9090–10000 | 2310 |
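
As a sanity check on the two result tables in this thread, the QPS column appears to be the reciprocal of the per-query latency (1000 / latency in ms), which would suggest a single search thread. That reading is an assumption inferred from the numbers, not something stated in the thread. A minimal Java sketch with hypothetical names (`QpsCheck`, `qpsFromLatencyMs`):

```java
/** Hypothetical helper; not part of Elasticsearch or KnnIndexTester. */
final class QpsCheck {
    /** Queries per second for one search thread, truncated the way the table values appear to be. */
    static int qpsFromLatencyMs(double latencyMs) {
        return (int) (1000.0 / latencyMs);
    }

    public static void main(String[] args) {
        // 7-bit before the MemorySegment scorer update: 0.66 to 0.68 ms per query
        System.out.println(qpsFromLatencyMs(0.68) + " to " + qpsFromLatencyMs(0.66));
        // 7-bit after the update: 0.10 to 0.11 ms per query
        System.out.println(qpsFromLatencyMs(0.11) + " to " + qpsFromLatencyMs(0.10));
    }
}
```

Under that assumption, the speedup works out to somewhere between about 9090 / 1515 ≈ 6.0 and 10000 / 1470 ≈ 6.8, which is consistent with the "~6x" figure.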

ah89 added 2 commits February 11, 2026 16:39
…extOSQVectorsScorer

- Introduced MSInt7SymmetricESNextOSQVectorsScorer to handle 7-bit symmetric quantization.
- Updated MemorySegmentESNextOSQVectorsScorer to support new query/index bits combination.
- Modified PanamaESVectorizationProvider to accommodate the new scoring logic for 7-bit vectors.
Contributor

tteofili commented Feb 12, 2026

once we're satisfied with the low level impl, we might open it up to be configurable in DenseVectorFieldMapper (although this also changes query bits, not just indexed vectors bits).

@ah89 ah89 requested review from benwtrent and tteofili February 12, 2026 16:18
Member

@benwtrent benwtrent left a comment

Glad to see the perf numbers :). One concern on the 'unoptimized path'.

Comment on lines +96 to +100
int total = 0;
for (int i = 0; i < dimensions; i++) {
    total += in.readByte() * q[i];
}
return total;
Member

Let's put this in its own method; there is no reason for it to be so poorly optimized.

Please, read in the entire byte array and use VectorUtil.dotProduct.

Contributor

++

Contributor Author

The logic is replaced with a quantized7BitScore method that bulk-reads all index vector bytes at once into a pre-allocated, reusable scratch buffer and computes the result using VectorUtil.dotProduct.
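
For readers following along, the shape of that change can be sketched roughly as below. This is not the code from the PR: the class name, the `scoreBytes` wrapper, and the use of `DataInputStream` in place of Lucene's `IndexInput` are all assumptions made to keep the example self-contained, and the local `dotProduct` is a scalar stand-in for `VectorUtil.dotProduct(byte[], byte[])`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative only; names other than quantized7BitScore are hypothetical. */
final class Int7ScoreSketch {
    private final byte[] scratch; // allocated once, reused for every stored vector

    Int7ScoreSketch(int dimensions) {
        this.scratch = new byte[dimensions];
    }

    /** Bulk-reads one stored vector (one byte per dimension), then scores it against q. */
    int quantized7BitScore(DataInputStream in, byte[] q) throws IOException {
        in.readFully(scratch, 0, scratch.length); // one bulk read instead of a readByte() per dimension
        return dotProduct(scratch, q);
    }

    /** Scalar stand-in for Lucene's VectorUtil.dotProduct(byte[], byte[]). */
    static int dotProduct(byte[] a, byte[] b) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i] * b[i];
        }
        return total;
    }

    /** Convenience wrapper for trying the sketch on in-memory bytes. */
    static int scoreBytes(byte[] stored, byte[] q) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored))) {
            return new Int7ScoreSketch(q.length).quantized7BitScore(in, q);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the bulk read is that the dot product then runs over two plain byte arrays, which is the form a vectorized implementation can work with.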

Contributor

@tteofili tteofili left a comment

once the VectorUtil change is done, LGTM

@ah89 ah89 requested a review from benwtrent February 13, 2026 23:47
ah89 and others added 4 commits February 17, 2026 17:38
Updated the conditional logic in the PanamaESVectorizationProvider to improve readability by grouping related conditions. This change ensures that the logic for checking query and index bits is clearer and more maintainable.
benwtrent and others added 3 commits February 18, 2026 10:42
Supports 7-bit quantization by introducing proper packing routines and
adjusting test logic to clamp values and pass correct bit types.
Fixes issues with handling 7-bit symmetric quantization and ensures
consistent query vector creation and scoring.
Enhances robustness of tests and vector scorer logic for 7-bit cases.
Comment on lines +198 to +208
- int addition = Short.toUnsignedInt(input.readShort());
+ int addition = input.readInt();
Member

This doesn't break everything else? I am surprised. Seems like a huge bug as I thought we were on int in the Next scorer for a while now...

@ldematte @thecoop

Member

It doesn't break, as it's reading garbage, but it's the same garbage each time, so the scores still match up in the benchmark test.
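
One way to picture "the same garbage each time" is sketched below. It assumes the field was stored as a 4-byte little-endian int and then read back as an unsigned short; both the direction of the mismatch and the byte order are assumptions, and `WidthMismatchDemo` is a hypothetical class, not the benchmark's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical demo; not the benchmark's or the scorer's actual code. */
final class WidthMismatchDemo {
    /** Stores value as a 4-byte little-endian int (the assumed on-disk width). */
    static ByteBuffer write(int value) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value);
        buf.flip();
        return buf;
    }

    /** Narrow read: only the low 16 bits of the stored value are seen. */
    static int readAsUnsignedShort(int value) {
        return Short.toUnsignedInt(write(value).getShort());
    }

    /** Matching read: the full stored value comes back. */
    static int readAsInt(int value) {
        return write(value).getInt();
    }
}
```

A narrow read is wrong for any value above 65535, but it is deterministic: two scorers that make the same mistake on the same bytes still agree with each other, so a test that only compares them passes. In a real stream the narrow read would also leave the position two bytes short, shifting every later read.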

Member

See #143137 for a fix

Member

@benwtrent benwtrent left a comment

My only concern now is the weird OSQ benchmark change.

ah89 added 2 commits February 25, 2026 10:09
Unifies parameter types for quantization logic, improving clarity and
reducing casting.
Removes obsolete clamp helper, streamlining vector preprocessing.
@ah89 ah89 enabled auto-merge (squash) February 26, 2026 04:37
@tteofili
Contributor

Would it be possible to also add the corresponding option in KnnIndexTester?

Contributor

@ldematte ldematte left a comment

I came in and checked it is "compatible" with our recent changes. Overall looks good, I left some comments to improve readability, but nothing functional

import java.lang.foreign.MemorySegment;

/** Panamized scorer for 7-bit symmetric quantized vectors stored as a {@link MemorySegment}. */
final class MSInt7SymmetricESNextOSQVectorsScorer extends MemorySegmentESNextOSQVectorsScorer.MemorySegmentScorer {
Contributor

Nit: we settled on a convention for type names that makes the data and query sizes explicit. This would be MSD7Q7ESNextOSQVectorsScorer (D7Q7 meaning 7 bits for index data, 7 bits for queries). It is true we should rename the other implementation classes, but it would be nice to start using it for new code. CC @thecoop

Contributor Author

Done

import java.io.IOException;
import java.lang.foreign.MemorySegment;

/** Panamized scorer for 7-bit symmetric quantized vectors stored as a {@link MemorySegment}. */
Contributor

NIT: I would call this "Vectorized", as the underlying implementation can be either Panama or Native.

Contributor Author

Done

final byte[] vector = new byte[length];

- final int queryBytes = length * (queryBits / indexBits);
+ final int queryBytes = indexBits == 7 ? dimensions : length * (queryBits / indexBits);
Contributor

They are probably identical, but shouldn't this be length instead of dimensions? For consistency and readability

Contributor Author

Done. For 7-bit quantization, length and dimensions are identical (each dimension is stored as one byte), but you are right: using length is more consistent and makes the intent clearer.
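
A possible reason the 7-bit special case existed at this stage of the review: if `queryBits` was still fixed at 4 in the test (an assumption, suggested by the `effectiveQueryBits` override visible in another snippet in this thread), the original expression divides 4 by 7 with integer arithmetic and yields a zero-length query. A small sketch with hypothetical names and made-up sizes:

```java
/** Hypothetical illustration of the original size expression from the diff. */
final class QueryBytesSketch {
    /** Same arithmetic as `length * (queryBits / indexBits)`; the division is integer division. */
    static int queryBytes(int length, int indexBits, int queryBits) {
        return length * (queryBits / indexBits);
    }

    public static void main(String[] args) {
        System.out.println(queryBytes(120, 1, 4)); // 1-bit index data, 4-bit query
        System.out.println(queryBytes(960, 7, 4)); // 7-bit index data with a 4-bit query: 4 / 7 truncates to 0
        System.out.println(queryBytes(960, 7, 7)); // symmetric 7-bit: no special case needed
    }
}
```

Once the query bits are passed in as 7 for the 7-bit case, the general expression already gives `length`, which fits with the special-case logic being removed in a later commit of this PR.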

// padding bytes.
final IndexInput slice = in.slice("test", 0, (long) length * numVectors);
final var defaultScorer = defaultProvider().newESNextOSQVectorsScorer(
byte effectiveQueryBits = indexBits == 7 ? (byte) 7 : queryBits;
Contributor

Instead of doing that multiple times, let's make queryBits non static and initialize it in the ctor.

Contributor

Even better it would be to pass queryBits in the ctor too and generate the correct combinations in parametersFactory(), so when/if we extend it (e.g. to support D4Q7 or other combinations) we need to make minimal changes.
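
The suggestion amounts to listing the supported (index bits, query bits) pairs in one place and letting the test constructor receive both values. A rough sketch of that idea follows; the `Bits` record, the `combinations()` method, and the exact set of pairs are assumptions for illustration (the pairs are inferred from this thread, where queries are described as 4-bit everywhere except the symmetric 7-bit case), and the real `parametersFactory()` in the test may have a different shape.

```java
import java.util.List;

/** Hypothetical sketch of a central list of index/query bit combinations. */
final class BitCombinationsSketch {
    /** One (index bits, query bits) pair, labelled with the DxQy naming convention from this review. */
    record Bits(byte indexBits, byte queryBits) {
        String label() {
            return "D" + indexBits + "Q" + queryBits;
        }
    }

    /** Adding a new combination such as D4Q7 would be a one-line change here. */
    static List<Bits> combinations() {
        return List.of(
            new Bits((byte) 1, (byte) 4),
            new Bits((byte) 2, (byte) 4),
            new Bits((byte) 4, (byte) 4),
            new Bits((byte) 7, (byte) 7)
        );
    }
}
```

With this layout, nothing downstream needs an `indexBits == 7` check, because each test instance already carries the query bits that belong to its index bits.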

Contributor Author

Done

);
final byte[] quantizeQuery = new byte[queryVectorPackedLengthInBytes];
ESVectorUtil.transposeHalfByte(scratch, quantizeQuery);
if (queryBits == 7) {
Contributor

We don't need this IF, packQuery for 4 bits is ESVectorUtil.transposeHalfByte()

Contributor Author

Done

Member

@benwtrent benwtrent left a comment

Let's have lorenzo or simon approve to make sure benchmarking is good. But the rest looks good to me :).

@ah89 ah89 disabled auto-merge February 27, 2026 00:29
ah89 and others added 4 commits February 27, 2026 02:14
Removes special-case logic for 7-bit quantization by treating queryBits
 uniformly across tests and scorer selection.
Standardizes scorer naming and updates test parameter generation
to ensure comprehensive coverage of query/index bit combinations.
Streamlines query data creation for improved maintainability.
Extends quantization options to include 7 bits for IVF vectors,
enabling improved flexibility and potential accuracy in benchmarking
and testing. Updates validation to accept 7-bit quantization and
refactors relevant call sites to use the new option.

Relates to improved quantization capabilities.
Contributor

@ldematte ldematte left a comment

Changes to benchmarking now are minimal as the PR includes Simon's fix. LGTM

Contributor

@tteofili tteofili left a comment

LGTM

since this is the first work that affects the quantization of the query (it is always 4-bits in all the other scenarios), we might have to adjust the way these config options are exposed in dense_vector fields. Currently, we expose the bits config in index_options (1, 2, 4 are valid values), which relates to the bits used for doc embeddings; while we can just add 7 there too, I feel like, for the long run, we need to make sure those options can be consistent (e.g., once we implement 2-2, how do we expose that?).

@ah89 ah89 merged commit 2058a7e into elastic:main Feb 27, 2026
35 checks passed
@benwtrent
Member

@ah89 we still need to expose this as bits: 7 in the index settings. Please add that.

tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
* Enhance vector scoring with 7-bit quantization support

- Added support for 7-bit symmetric quantization in VectorScorerOSQBenchmark and ESNextOSQVectorsScorer.
- Updated the handling of quantization in various methods to accommodate the new bit size.
- Modified tests to validate the new 7-bit encoding functionality in DiskBBQBulkWriter and related classes.
- Ensured backward compatibility by maintaining existing 1, 2, and 4-bit quantization methods.

Relates to elastic#139591

* Remove unused docsWriter calls in DiskBBQBulkWriter to streamline bulk writing process.

* Add support for 7-bit symmetric quantized vectors in MemorySegmentESNextOSQVectorsScorer

- Introduced MSInt7SymmetricESNextOSQVectorsScorer to handle 7-bit symmetric quantization.
- Updated MemorySegmentESNextOSQVectorsScorer to support new query/index bits combination.
- Modified PanamaESVectorizationProvider to accommodate the new scoring logic for 7-bit vectors.

* Add scratch byte array and refactor quantized 7-bit scoring method

* [CI] Auto commit changes from spotless

* Refactor condition in PanamaESVectorizationProvider for clarity

Updated the conditional logic in the PanamaESVectorizationProvider to improve readability by grouping related conditions. This change ensures that the logic for checking query and index bits is clearer and more maintainable.

* Add clamping to 7-bit for binary vectors and queries in VectorScorerOSQBenchmark

This update introduces a new method, clampTo7Bit, which ensures that binary vectors and queries are clamped to 7 bits when the bits parameter is set to 7. This change enhances the accuracy of the benchmarking process by preventing overflow in the generated byte arrays.

* Handles 7-bit quantization and packing for vectors

Supports 7-bit quantization by introducing proper packing routines and
adjusting test logic to clamp values and pass correct bit types.
Fixes issues with handling 7-bit symmetric quantization and ensures
consistent query vector creation and scoring.
Enhances robustness of tests and vector scorer logic for 7-bit cases.

* Switches bits to byte and removes unused method

Unifies parameter types for quantization logic, improving clarity and
reducing casting.
Removes obsolete clamp helper, streamlining vector preprocessing.

* Unifies query bits handling and simplifies scorer

Removes special-case logic for 7-bit quantization by treating queryBits
 uniformly across tests and scorer selection.
Standardizes scorer naming and updates test parameter generation
to ensure comprehensive coverage of query/index bit combinations.
Streamlines query data creation for improved maintainability.

* [CI] Auto commit changes from spotless

* Adds 7-bit quantization support for IVF index

Extends quantization options to include 7 bits for IVF vectors,
enabling improved flexibility and potential accuracy in benchmarking
and testing. Updates validation to accept 7-bit quantization and
refactors relevant call sites to use the new option.

Relates to improved quantization capabilities.

---------

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
ah89 added a commit to ah89/elasticsearch that referenced this pull request Mar 4, 2026
Enables the use of 7-bit quantization for indexed vectors in BBQ,
improving flexibility for disk-based vector fields. Updates validation,
documentation, and tests to accommodate the new option and ensures
correct parameter handling throughout the codebase.

Relates to elastic#141183
ah89 added a commit that referenced this pull request Mar 6, 2026
* Adds support for 7-bit quantization in BBQ index

Enables the use of 7-bit quantization for indexed vectors in BBQ,
improving flexibility for disk-based vector fields. Updates validation,
documentation, and tests to accommodate the new option and ensures
correct parameter handling throughout the codebase.

Relates to #141183

* Updates quantization encoding selection logic

Switches from using a shifted ID to directly interpreting bits
for quantization encoding selection, improving accuracy and
consistency in vector format initialization.
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Mar 6, 2026
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Mar 6, 2026
Labels

>feature, :Search Relevance/Vectors, Team:Search Relevance, v9.4.0

Development

Successfully merging this pull request may close these issues.

DiskBBQ: Add int7/8 support for postings list encoding

6 participants