Improve performance of the bitmap filtering #16936
if (cmpMin < 0) {
    // query point is before the start of this cell
    try {
        nextQueryPoint = iterator.next();
Can we skip straight to the first entry >= minPackedValue?
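One way to read this suggestion, as a minimal sketch rather than the PR's code: if the query values are sorted and unique (as bitmap values are), the position of the first value >= the cell minimum can be found with a binary search instead of repeated next() calls. The class `SkipAhead`, the method `firstAtLeast`, and the plain `int[]` input are hypothetical and only illustrate the idea. RoaringBitmap's `PeekableIntIterator` offers `advanceIfNeeded(int)` for a similar purpose.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the PR: finds where iteration should resume.
final class SkipAhead {
    // Returns the index of the first element >= min in a sorted, duplicate-free
    // array, or sortedValues.length if every element is smaller than min.
    static int firstAtLeast(int[] sortedValues, int min) {
        int pos = Arrays.binarySearch(sortedValues, min);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos : -pos - 1;
    }
}
```

This sketch compares values as signed ints; an encoded (packed) representation may order values differently.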
❌ Gradle check result for e80b830: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Can we add test cases for the following?
1. Empty bitmaps.
2. Queries with high and low cardinality.
3. Query behaviour under multi-threaded execution.
private final String field;

public BitmapIndexQuery(String field, RoaringBitmap bitmap) {
    this.bitmap = bitmap;
Could you please check whether field or bitmap is null or empty, and terminate early if so?
private static BytesRefIterator bitmapEncodedIterator(RoaringBitmap bitmap) {
    return new BytesRefIterator() {
        private final Iterator<Integer> iterator = bitmap.iterator();
        private final BytesRef encoded = new BytesRef(new byte[Integer.BYTES]);
Is there any possibility that this BytesRef will be modified between iterations? If so, it might introduce subtle bugs. Could we eagerly initialize it instead?
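To make the concern concrete, here is a minimal sketch of the aliasing hazard, using a plain `byte[]` in place of a BytesRef. The class and method names are hypothetical and not from the PR. Lucene's BytesRefIterator documents that the returned BytesRef may be reused across calls to next(), so reuse is only a problem for a consumer that holds on to the reference without copying it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, not part of the PR: a reused scratch buffer vs. defensive copies.
final class ReusedBufferDemo {
    // Writes v into dest as 4 big-endian bytes.
    static void encode(int v, byte[] dest) {
        dest[0] = (byte) (v >>> 24);
        dest[1] = (byte) (v >>> 16);
        dest[2] = (byte) (v >>> 8);
        dest[3] = (byte) v;
    }

    // Stores the SAME scratch array for every value, so all entries alias
    // one buffer and end up showing the last encoded value.
    static List<byte[]> collectWithoutCopy(int[] values) {
        byte[] scratch = new byte[Integer.BYTES];
        List<byte[]> out = new ArrayList<>();
        for (int v : values) {
            encode(v, scratch);
            out.add(scratch);
        }
        return out;
    }

    // Stores a copy per value, so each entry keeps its own bytes.
    static List<byte[]> collectWithCopy(int[] values) {
        byte[] scratch = new byte[Integer.BYTES];
        List<byte[]> out = new ArrayList<>();
        for (int v : values) {
            encode(v, scratch);
            out.add(scratch.clone());
        }
        return out;
    }
}
```

If the only consumer reads each value before asking for the next one, the shared buffer avoids a per-value allocation and is safe.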
return new ConstantScoreWeight(this, boost) {
    @Override
    public Scorer scorer(LeafReaderContext context) throws IOException {
        ScorerSupplier scorerSupplier = scorerSupplier(context);
We are calling one public method, scorerSupplier, from another public method, scorer. What is the use case for overriding scorerSupplier here? Could this logic be part of the scorer method instead?
}

@Override
public long cost() {
Frequent calls to bitmap.getLongCardinality() in cost() and toString() cause redundant computation. Can we cache the value, like this?
private long cachedCardinality = -1;

private long getCardinality() {
    if (cachedCardinality == -1) {
        cachedCardinality = bitmap.getLongCardinality();
    }
    return cachedCardinality;
}

@Override
public long cost() {
    return getCardinality();
}
❌ Gradle check result for 0697100: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: bowenlan-amzn <[email protected]>
❌ Gradle check result for 941a5c2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for e152a01: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❕ Gradle check result for 8de4f5b: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16936      +/-   ##
============================================
+ Coverage     72.20%   72.22%   +0.01%
+ Complexity    65289    65231      -58
============================================
  Files          5299     5300       +1
  Lines        303536   303619      +83
  Branches      43941    43954      +13
============================================
+ Hits         219180   219278      +98
+ Misses        66441    66308     -133
- Partials      17915    18033     +118
Description
This change adds a new bitmap index query that solves the existing performance issue described in the related issue below: in short, the time spent in the constructor and in cost estimation.
The bitmap index query takes advantage of the index structure (the points) of the numeric field and traverses the points to return an iterator over matching doc ids. A matching doc is one whose value is in the bitmap.
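As a rough illustration of the matching idea only (not the PR's implementation, which works on Lucene's point tree and its cells), the sketch below walks point values in ascending order alongside sorted query values and collects the doc ids whose value appears in the query set. The parallel-array layout and all names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of the PR: merge-style matching of sorted
// point values against sorted query values.
final class SortedMergeMatch {
    // pointValues: indexed values in ascending order (duplicates allowed).
    // pointDocIds: doc id for each entry of pointValues (parallel array).
    // queryValues: values from the bitmap, ascending and duplicate-free.
    static List<Integer> matchingDocs(int[] pointValues, int[] pointDocIds, int[] queryValues) {
        List<Integer> docs = new ArrayList<>();
        int p = 0;
        int q = 0;
        while (p < pointValues.length && q < queryValues.length) {
            if (pointValues[p] < queryValues[q]) {
                p++; // indexed value is not in the query set
            } else if (pointValues[p] > queryValues[q]) {
                q++; // query value is not present at this position
            } else {
                docs.add(pointDocIds[p]);
                p++; // keep q: several docs may share the same value
            }
        }
        return docs;
    }
}
```

Doc ids come out in value order here, not doc id order; a real query would still need to produce an iterator ordered by doc id.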
The main reason bitmap index query is needed is to support IndexOrDocValuesQuery.
IndexOrDocValuesQuery decides at runtime which query supplies the scorer, based on the cost of the chosen lead iterator. For example, suppose a term filter matches 1% of all documents and a bitmap IndexOrDocValuesQuery matches 10%. The term filter becomes the lead iterator because it matches fewer documents. More importantly, IndexOrDocValuesQuery then chooses the doc values query at runtime, because the cost of the index query is 10 times the cost of the term filter.
The cost of bitmap index query will be based on the cardinality of the bitmap.
Note that IndexOrDocValuesQuery has a heuristic that chooses the doc values query only when the cost of the index query is more than 8 times the lead iterator cost.
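The heuristic can be sketched as a single comparison. This is a simplified model based on the 8x figure stated above, not a copy of Lucene's code. The class name, method name, and constant are hypothetical.

```java
// Hypothetical model of the cost-based choice, not Lucene's actual implementation.
final class CostHeuristic {
    // Assumed penalty factor, taken from the "8 times" figure in the description.
    static final long DOC_VALUES_PENALTY = 8;

    // Returns true when the doc values query would be chosen, i.e. when the
    // index query cost is more than 8 times the lead iterator cost.
    static boolean useDocValues(long indexQueryCost, long leadCost) {
        return indexQueryCost / DOC_VALUES_PENALTY > leadCost;
    }
}
```

With 1,000,000 documents, a term filter matching 1% gives a lead cost of 10,000 and a bitmap matching 10% gives an index cost of 100,000. Since 100,000 / 8 = 12,500 exceeds 10,000, the doc values query is chosen. A bitmap matching 5% (cost 50,000) would stay on the index query.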
Another reason the bitmap index query is useful is that it is much faster than the doc values query when the set of queried terms is small, for example only 0.01% of all documents. The index structure is much faster at finding a small matching set than doc values, which need to iterate over all documents. This helps both when the bitmap query is used alone and when it is chosen as the lead iterator.
Benchmark
Choose the Cost
Related Issues
Resolves #16317
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.