Improve performance of the bitmap filtering #16936

Draft. Wants to merge 5 commits into base: main


bowenlan-amzn (Member) commented Jan 3, 2025

Description

This change adds a new bitmap index query that addresses the performance issue described in the related issue below. In short, too much time is spent in the constructor and in cost estimation.

The bitmap index query takes advantage of the index structure (the points) of the numeric field and traverses the points to return an iterator over matching doc IDs. A document matches when its value is contained in the bitmap.

The main reason a bitmap index query is needed is to support IndexOrDocValuesQuery.
IndexOrDocValuesQuery decides at runtime which query supplies the scorer, depending on the cost of the chosen lead iterator. For example, suppose a term filter matches 1% of the total documents and a bitmap IndexOrDocValuesQuery matches 10% of the total documents. The term filter becomes the lead iterator because it matches fewer docs. More importantly, IndexOrDocValuesQuery then chooses the doc value query at runtime, because the cost of the index query is 10 times the cost of the term filter.

The cost of the bitmap index query is based on the cardinality of the bitmap.
Note that IndexOrDocValuesQuery uses a heuristic: it chooses the doc value query only when the cost of the index query exceeds about 8 times the lead iterator cost.
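
To make the 8x heuristic concrete, here is a minimal sketch of the decision, using the 1% and 10% example from above. The class and method names are hypothetical, and the exact comparison (a right shift by 3 followed by a strict greater-than) is an assumption of this sketch; the Lucene source is the authority on the real logic.

```java
public final class CostHeuristicSketch {

    /**
     * Hypothetical helper: returns true when the doc value query would be picked.
     * Assumption: the index query cost is divided by 8 (right shift by 3) and the
     * result is compared with the lead iterator cost.
     */
    public static boolean prefersDocValues(long indexQueryCost, long leadCost) {
        long threshold = indexQueryCost >>> 3;
        return threshold > leadCost;
    }

    public static void main(String[] args) {
        long totalDocs = 1_000_000L;
        long termFilterCost = totalDocs / 100; // term filter matches 1% of docs
        long bitmapCost = totalDocs / 10;      // bitmap matches 10% of docs

        // 100,000 >>> 3 = 12,500, which is above the lead cost of 10,000
        System.out.println(prefersDocValues(bitmapCost, termFilterCost));

        // An index query only 4x the lead cost stays on the index side
        System.out.println(prefersDocValues(termFilterCost * 4, termFilterCost));
    }
}
```

Because the index cost feeds directly into this comparison, the cardinality-based cost needs to be cheap to compute and reasonably accurate.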

Another reason the bitmap index query is useful is that it is much faster than the doc value query when the queried terms match only a small set, for example 0.01% of the total docs. The index structure is much faster at finding a small matching set, whereas the doc value query needs to iterate over all documents. This helps both when the bitmap query is used alone and when it is chosen as the lead iterator.
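
The difference between the two access patterns can be sketched with plain collections. This is an illustration only: the `TreeMap` stands in for the points index, the `int[]` stands in for doc values, and all names are hypothetical rather than taken from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public final class IndexVsDocValuesSketch {

    /** Doc-value style: visit every document and test whether its value is queried. */
    public static List<Integer> matchByDocValues(int[] valueByDoc, Set<Integer> queried) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            if (queried.contains(valueByDoc[doc])) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Builds a sorted value -> doc IDs map, a simplified stand-in for the points index. */
    public static TreeMap<Integer, List<Integer>> buildIndex(int[] valueByDoc) {
        TreeMap<Integer, List<Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            index.computeIfAbsent(valueByDoc[doc], v -> new ArrayList<>()).add(doc);
        }
        return index;
    }

    /** Index style: work is proportional to the queried values, not to the doc count. */
    public static List<Integer> matchByIndex(TreeMap<Integer, List<Integer>> index,
                                             SortedSet<Integer> queried) {
        List<Integer> hits = new ArrayList<>();
        for (int value : queried) {
            List<Integer> docs = index.get(value);
            if (docs != null) {
                hits.addAll(docs);
            }
        }
        Collections.sort(hits); // return doc IDs in increasing order
        return hits;
    }

    public static void main(String[] args) {
        int[] valueByDoc = {5, 9, 5, 2, 7};
        SortedSet<Integer> queried = new TreeSet<>(Arrays.asList(5, 7));
        System.out.println(matchByDocValues(valueByDoc, queried));
        System.out.println(matchByIndex(buildIndex(valueByDoc), queried));
    }
}
```

Both calls return the same doc IDs; the index-style lookup touches only the queried values, which is why it wins when the queried set is small relative to the number of documents.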

Benchmark

Choose the Cost

Related Issues

Resolves #16317

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the Roadmap:Search, Search:Performance, Search:Query Capabilities, and v2.19.0 labels Jan 3, 2025
if (cmpMin < 0) {
    // query point is before the start of this cell
    try {
        nextQueryPoint = iterator.next();
Collaborator

Can we skip straight to the first entry >= minPackedValue?

Contributor

github-actions bot commented Jan 3, 2025

❌ Gradle check result for e80b830: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@Vikasht34 Vikasht34 left a comment

Can we add test cases for the following?
1. Empty bitmaps.
2. Queries with high and low cardinality.
3. Query behavior under multi-threaded execution.

private final String field;

public BitmapIndexQuery(String field, RoaringBitmap bitmap) {
    this.bitmap = bitmap;

Could you please check whether field or bitmap is null or empty, and terminate early if so?

private static BytesRefIterator bitmapEncodedIterator(RoaringBitmap bitmap) {
    return new BytesRefIterator() {
        private final Iterator<Integer> iterator = bitmap.iterator();
        private final BytesRef encoded = new BytesRef(new byte[Integer.BYTES]);

Is there any possibility that this BytesRef will be modified between iterations? If so, it might introduce subtle bugs. Can we initialize it eagerly?
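
The hazard raised here can be shown without Lucene. The sketch below uses a plain `byte[]` as a hypothetical stand-in for the shared `BytesRef`, and plain big-endian encoding rather than Lucene's sortable point encoding; all names are illustrative. As far as I recall, the `BytesRefIterator` Javadoc allows the returned `BytesRef` to be reused across calls to `next()`, so sharing one buffer is only a problem for callers that retain the reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public final class ReusedBufferSketch {

    /** Encodes each int into one shared buffer and returns that same buffer every time. */
    public static Iterator<byte[]> encodedIterator(List<Integer> values) {
        final Iterator<Integer> source = values.iterator();
        final byte[] shared = new byte[Integer.BYTES];
        return new Iterator<byte[]>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public byte[] next() {
                int v = source.next();
                shared[0] = (byte) (v >>> 24);
                shared[1] = (byte) (v >>> 16);
                shared[2] = (byte) (v >>> 8);
                shared[3] = (byte) v;
                return shared; // overwritten by the following call to next()
            }
        };
    }

    /** Decodes a 4-byte big-endian buffer back into an int. */
    public static int decode(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    /** Retains every returned buffer, optionally copying it first, then decodes them all. */
    public static List<Integer> collect(List<Integer> values, boolean copyEach) {
        List<byte[]> kept = new ArrayList<>();
        Iterator<byte[]> it = encodedIterator(values);
        while (it.hasNext()) {
            byte[] current = it.next();
            kept.add(copyEach ? Arrays.copyOf(current, current.length) : current);
        }
        List<Integer> decoded = new ArrayList<>();
        for (byte[] b : kept) {
            decoded.add(decode(b));
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(collect(values, false)); // retained references all see the last value
        System.out.println(collect(values, true));  // copies preserve each value
    }
}
```

If the consumer reads each value before calling `next()` again, the shared buffer is safe and avoids one allocation per value; only a consumer that stores the returned reference needs a copy.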

return new ConstantScoreWeight(this, boost) {
    @Override
    public Scorer scorer(LeafReaderContext context) throws IOException {
        ScorerSupplier scorerSupplier = scorerSupplier(context);

Here scorer() calls another public method, scorerSupplier(). What is the use case for overriding scorerSupplier here? Could its logic be part of the scorer method instead?

}

@Override
public long cost() {

Frequent calls to bitmap.getLongCardinality() in cost() and toString() cause redundant computation. Can we cache the result, like this?

private long cachedCardinality = -1;

private long getCardinality() {
    if (cachedCardinality == -1) {
        cachedCardinality = bitmap.getLongCardinality();
    }
    return cachedCardinality;
}

@Override
public long cost() {
    return getCardinality();
}

@bowenlan-amzn bowenlan-amzn force-pushed the bitmap-filtering-improve branch from e80b830 to 0697100 Compare January 7, 2025 18:05
Contributor

github-actions bot commented Jan 7, 2025

❌ Gradle check result for 0697100: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn bowenlan-amzn force-pushed the bitmap-filtering-improve branch from 0697100 to 6a8a0d0 Compare January 7, 2025 18:29
@bowenlan-amzn bowenlan-amzn force-pushed the bitmap-filtering-improve branch from 6a8a0d0 to 941a5c2 Compare January 7, 2025 18:29
Contributor

github-actions bot commented Jan 7, 2025

❌ Gradle check result for 941a5c2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Jan 7, 2025

❌ Gradle check result for e152a01: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: bowenlan-amzn <[email protected]>
@bowenlan-amzn bowenlan-amzn force-pushed the bitmap-filtering-improve branch from e152a01 to 8de4f5b Compare January 8, 2025 03:34
Contributor

github-actions bot commented Jan 8, 2025

❕ Gradle check result for 8de4f5b: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteStoreStatsIT.testDownloadStatsCorrectnessSinglePrimaryMultipleReplicaShards

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Jan 8, 2025

Codecov Report

Attention: Patch coverage is 71.00000% with 29 lines in your changes missing coverage. Please review.

Project coverage is 72.22%. Comparing base (e7e19f7) to head (23684a7).

Files with missing lines                                 Patch %   Lines
.../org/opensearch/search/query/BitmapIndexQuery.java    70.83%    18 Missing and 10 partials ⚠️
...org/opensearch/index/mapper/NumberFieldMapper.java    50.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16936      +/-   ##
============================================
+ Coverage     72.20%   72.22%   +0.01%     
+ Complexity    65289    65231      -58     
============================================
  Files          5299     5300       +1     
  Lines        303536   303619      +83     
  Branches      43941    43954      +13     
============================================
+ Hits         219180   219278      +98     
+ Misses        66441    66308     -133     
- Partials      17915    18033     +118     


Signed-off-by: bowenlan-amzn <[email protected]>
Contributor

github-actions bot commented Jan 8, 2025

✅ Gradle check result for 23684a7: SUCCESS

Labels
Roadmap:Search, Search:Performance, Search:Query Capabilities, v2.19.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bitmap Filtering Performance Improvement
3 participants