Combining filter rewrite and skip list to optimize sub aggregation #19573
Merged
Changes from all commits
Commits (38)
469cf51  Combining filter rewrite and skip list approaches for further optimiz… (jainankitk)
e20f702  Removing parent aggregation check for perf benchmark (jainankitk)
82bc95d  Adding changelog entry (jainankitk)
aff3dc6  Applying the skip list optimization for AutoDateHistogram (jainankitk)
1c29540  Addressing checkstyle failures (jainankitk)
b9e9f2b  Apply spotless (jainankitk)
a28b9c1  Merge branch 'main' into agg-perf (jainankitk)
0a9ef40  Minor bug fix (jainankitk)
8d4ccf7  Merge branch 'main' into agg-perf (jainankitk)
4e7d9e6  Merge branch 'main' into agg-perf (jainankitk)
23fbad3  Add unit test for filter rewrite with date histogram with skiplist. (asimmahmood1)
7eb64f7  Spotless check (asimmahmood1)
35834e4  Fix unit test (asimmahmood1)
2b593c9  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
3cdc37d  Not ready for check-in, just throwing this out to come up with differ… (asimmahmood1)
d0eeb37  Revert auto date changes for this PR (asimmahmood1)
66ffef1  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
0ec357a  Switch to Lucene's version of BitSetDocIdStream (asimmahmood1)
7a7209f  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
37f4641  Merge branch 'main' into agg-perf (jainankitk)
eaf7e52  Resolving merge conflict issue (jainankitk)
7c05efe  Fixing build failure (jainankitk)
1d97bee  Merge branch 'main' into agg-perf (jainankitk)
d8448f5  Merge branch 'main' into agg-perf (jainankitk)
6888b6c  This is more concise method I can of. It doesn't guarentee only SugAg… (asimmahmood1)
7bb162e  Fix unit test (asimmahmood1)
d961a87  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
d208ed7  Fix unit test (asimmahmood1)
97ea5cb  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
ddad6e4  Updated with more restricted use for LeafCollectorModeEnum. (asimmahmood1)
d8ff524  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
3fa6b0f  Add javadoc (asimmahmood1)
18d42fa  Spotless (asimmahmood1)
555628a  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
ab67e80  Fixed bug while refactoring, and code coverage (asimmahmood1)
d6236a2  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
8904653  Remove unused code (asimmahmood1)
5348994  Merge remote-tracking branch 'upstream/main' into agg-perf (asimmahmood1)
...c/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (170 additions, 0 deletions)
@@ -0,0 +1,170 @@
/*
 * SPDX-License-Identifier: Apache-2.0
 *
 * The OpenSearch Contributors require contributions made to
 * this file be licensed under the Apache-2.0 license or a
 * compatible open source license.
 */

package org.opensearch.search.aggregations.bucket;

import org.apache.lucene.index.DocValuesSkipper;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdStream;
import org.apache.lucene.search.Scorable;
import org.opensearch.common.Rounding;
import org.opensearch.search.aggregations.LeafBucketCollector;
import org.opensearch.search.aggregations.bucket.terms.LongKeyedBucketOrds;

import java.io.IOException;

/**
 * Histogram collection logic using a skip list.
 *
 * @opensearch.internal
 */
public class HistogramSkiplistLeafCollector extends LeafBucketCollector {

    private final NumericDocValues values;
    private final DocValuesSkipper skipper;
    private final Rounding.Prepared preparedRounding;
    private final LongKeyedBucketOrds bucketOrds;
    private final LeafBucketCollector sub;
    private final BucketsAggregator aggregator;

    /**
     * Max doc ID (inclusive) up to which all docs' values may map to the same
     * bucket.
     */
    private int upToInclusive = -1;

    /**
     * Whether the values of all docs up to {@link #upToInclusive} map to the same bucket.
     */
    private boolean upToSameBucket;

    /**
     * Index in bucketOrds for docs up to {@link #upToInclusive}.
     */
    private long upToBucketIndex;

    public HistogramSkiplistLeafCollector(
        NumericDocValues values,
        DocValuesSkipper skipper,
        Rounding.Prepared preparedRounding,
        LongKeyedBucketOrds bucketOrds,
        LeafBucketCollector sub,
        BucketsAggregator aggregator
    ) {
        this.values = values;
        this.skipper = skipper;
        this.preparedRounding = preparedRounding;
        this.bucketOrds = bucketOrds;
        this.sub = sub;
        this.aggregator = aggregator;
    }

    @Override
    public void setScorer(Scorable scorer) throws IOException {
        if (sub != null) {
            sub.setScorer(scorer);
        }
    }

    private void advanceSkipper(int doc, long owningBucketOrd) throws IOException {
        if (doc > skipper.maxDocID(0)) {
            skipper.advance(doc);
        }
        upToSameBucket = false;

        if (skipper.minDocID(0) > doc) {
            // Corner case which happens if `doc` doesn't have a value and is between
            // two intervals of the doc-value skip index.
            upToInclusive = skipper.minDocID(0) - 1;
            return;
        }

        upToInclusive = skipper.maxDocID(0);

        // Now find the highest level where all docs map to the same bucket.
        for (int level = 0; level < skipper.numLevels(); ++level) {
            int totalDocsAtLevel = skipper.maxDocID(level) - skipper.minDocID(level) + 1;
            long minBucket = preparedRounding.round(skipper.minValue(level));
            long maxBucket = preparedRounding.round(skipper.maxValue(level));

            if (skipper.docCount(level) == totalDocsAtLevel && minBucket == maxBucket) {
                // All docs at this level have a value, and all values map to the same bucket.
                upToInclusive = skipper.maxDocID(level);
                upToSameBucket = true;
                upToBucketIndex = bucketOrds.add(owningBucketOrd, maxBucket);
                if (upToBucketIndex < 0) {
                    // A negative return value means the bucket already existed; decode its ordinal.
                    upToBucketIndex = -1 - upToBucketIndex;
                }
            } else {
                break;
            }
        }
    }

    @Override
    public void collect(int doc, long owningBucketOrd) throws IOException {
        if (doc > upToInclusive) {
            advanceSkipper(doc, owningBucketOrd);
        }

        if (upToSameBucket) {
            aggregator.incrementBucketDocCount(upToBucketIndex, 1L);
            sub.collect(doc, upToBucketIndex);
        } else if (values.advanceExact(doc)) {
            final long value = values.longValue();
            long bucketIndex = bucketOrds.add(owningBucketOrd, preparedRounding.round(value));
            if (bucketIndex < 0) {
                bucketIndex = -1 - bucketIndex;
                aggregator.collectExistingBucket(sub, doc, bucketIndex);
            } else {
                aggregator.collectBucket(sub, doc, bucketIndex);
            }
        }
    }

    @Override
    public void collect(DocIdStream stream) throws IOException {
        // This will only be called when this collector is the top-level aggregation.
        collect(stream, 0);
    }

    @Override
    public void collect(DocIdStream stream, long owningBucketOrd) throws IOException {
        // This will only be called when this collector is a sub-aggregation.
        for (;;) {
            int upToExclusive = upToInclusive + 1;
            if (upToExclusive < 0) { // overflow
                upToExclusive = Integer.MAX_VALUE;
            }

            if (upToSameBucket) {
                if (sub == NO_OP_COLLECTOR) {
                    // stream.count may be faster when we don't need to handle sub-aggs.
                    long count = stream.count(upToExclusive);
                    aggregator.incrementBucketDocCount(upToBucketIndex, count);
                } else {
                    final int[] count = { 0 };
                    stream.forEach(upToExclusive, doc -> {
                        sub.collect(doc, upToBucketIndex);
                        count[0]++;
                    });
                    aggregator.incrementBucketDocCount(upToBucketIndex, count[0]);
                }
            } else {
                stream.forEach(upToExclusive, doc -> collect(doc, owningBucketOrd));
            }

            if (stream.mayHaveRemaining()) {
                advanceSkipper(upToExclusive, owningBucketOrd);
            } else {
                break;
            }
        }
    }
}
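For orientation, below is a minimal sketch of how an aggregator's per-leaf setup might choose this collector. The helper name pickLeafCollector, the field parameter, and the fallback collector are illustrative assumptions rather than the wiring this PR actually uses; the constructor signature is taken from the diff above, and LeafReader#getDocValuesSkipper / DocValues#getNumeric are standard Lucene APIs.

import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.DocValuesSkipper;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.NumericDocValues;
import org.opensearch.common.Rounding;
import org.opensearch.search.aggregations.LeafBucketCollector;
import org.opensearch.search.aggregations.bucket.BucketsAggregator;
import org.opensearch.search.aggregations.bucket.HistogramSkiplistLeafCollector;
import org.opensearch.search.aggregations.bucket.terms.LongKeyedBucketOrds;

// Hypothetical helper, not part of this PR: decide whether the skip-list
// collector can be used for a leaf, falling back to the regular path otherwise.
final class SkiplistCollectorWiring {
    static LeafBucketCollector pickLeafCollector(
        LeafReader reader,
        String field, // e.g. the date histogram's field name (assumption)
        Rounding.Prepared preparedRounding,
        LongKeyedBucketOrds bucketOrds,
        LeafBucketCollector sub,
        BucketsAggregator aggregator,
        LeafBucketCollector fallback // the aggregator's existing per-doc collector (assumption)
    ) throws IOException {
        // Skip-list metadata exists only if the field was indexed with a doc-values skip index.
        DocValuesSkipper skipper = reader.getDocValuesSkipper(field);
        NumericDocValues values = DocValues.getNumeric(reader, field);
        if (skipper != null) {
            // Ranges of docs whose min/max values round into the same bucket
            // can then be counted in bulk instead of doc by doc.
            return new HistogramSkiplistLeafCollector(values, skipper, preparedRounding, bucketOrds, sub, aggregator);
        }
        return fallback;
    }
}

Because DocValuesSkipper metadata is optional per field, segments without it would keep whatever collection path the aggregator already had, so a guard of this shape leaves existing behavior untouched.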