-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Fix cardinality agg pruning optimization by self collecting #19473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cardinality agg pruning optimization by self collecting #19473
Conversation
* changes in DenseConjunctionBulkCollector is matching too many documents due to scoreWindow logic * will raise an issue in Lucene to track this * for now collect ourself * this optimization only runs on card agg without sub, and only within the pruning optimization * worst case we can disable this optimization via index setting Signed-off-by: Asim Mahmood <[email protected]>
Signed-off-by: Asim Mahmood <[email protected]>
|
Tested on nyc_taxis |
|
@asimmahmood1 looks like a good fix. Tbh, i don't mind keeping this logic long term here as dynamic pruning collection logic shouldn't care about any lucene optimizations in future. Should we still create a lucene issue for possible regression if it is applicable in other queries? |
jainankitk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19473 +/- ##
============================================
- Coverage 73.03% 73.02% -0.01%
+ Complexity 70152 70087 -65
============================================
Files 5683 5683
Lines 321533 321547 +14
Branches 46503 46508 +5
============================================
- Hits 234821 234800 -21
Misses 67790 67790
- Partials 18922 18957 +35 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
{"run-benchmark-test": "id_1"} |
|
{"run-benchmark-test": "id_3"} |
|
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/4563/ . Final results will be published once the job is completed. |
|
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/4564/ . Final results will be published once the job is completed. |
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/4564/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/165/
|
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/4563/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/166/
|
|
Benchmark confirms the fix:
|
…ch-project#19473) Signed-off-by: Asim Mahmood <[email protected]>
+1 Not the first time this optimization gets affected by Lucene change. It's better to collect in this way, directly reuse the related logic from DefaultBulkScorer. |
…ch-project#19473) Signed-off-by: Asim Mahmood <[email protected]>
Description
Related Issues
Fixes [#19367]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.