-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Optimization in Numeric Terms Aggregation query for Large Bucket Counts #18702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization in Numeric Terms Aggregation query for Large Bucket Counts #18702
Conversation
|
❌ Gradle check result for 74295ec: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Rishabh Maurya <[email protected]> (cherry picked from commit 130d890)
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
3abbaaa to
52f4c52
Compare
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
|
❌ Gradle check result for 9f7c12d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
|
❕ Gradle check result for e124eb1: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #18702 +/- ##
============================================
+ Coverage 72.89% 72.90% +0.01%
- Complexity 69318 69339 +21
============================================
Files 5642 5643 +1
Lines 318636 318712 +76
Branches 46107 46112 +5
============================================
+ Hits 232254 232348 +94
- Misses 67540 67569 +29
+ Partials 18842 18795 -47 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
❌ Gradle check result for 66ccdaa: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
ff2e323 to
ced0d1b
Compare
|
❌ Gradle check result for ced0d1b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
ced0d1b to
13c663b
Compare
|
❌ Gradle check result for 13c663b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
1f361cc to
f8a69e7
Compare
51c0446 to
506c92e
Compare
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
server/src/main/java/org/opensearch/search/DefaultSearchContext.java
Outdated
Show resolved
Hide resolved
|
❌ Gradle check result for 0d3ecb4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 0d3ecb4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
0d3ecb4 to
68e77e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach is similar to terms aggregation. Can you answer the questions in #18732 (review) ?
Sure,
Compared with different threshold starting from 10% and analyzed for which value QuickSelects performs better. I have added the results here:
There is no much difference in the memory usage is observed for the size above 20%. When we use quickSelect we create an array with size equal to the bucketOrdinal and copy all buckets to it to perform the topN selection. Bellow link has JVM metrics comparison. |
…ts (#18702) * optimize num agg using quick select for topN when applicable Signed-off-by: Rishabh Maurya <[email protected]> (cherry picked from commit 130d890) * Updated the numeric term aggregation logic to select topN Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Updated the algorithm selection logic Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added a feature flag for the implementation Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added profile debug information Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * use priority queue method for significant terms Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Refactored the selection strategy Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added tests case with proper assertions Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added cluster settings for selection strategy Signed-off-by: Vinay Krishna Pudyodu <[email protected]> --------- Signed-off-by: Rishabh Maurya <[email protected]> Signed-off-by: Vinay Krishna Pudyodu <[email protected]> Co-authored-by: Rishabh Maurya <[email protected]> (cherry picked from commit 7db7a5a) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…ts (#18702) (#18974) * optimize num agg using quick select for topN when applicable (cherry picked from commit 130d890) * Updated the numeric term aggregation logic to select topN * Updated the algorithm selection logic * Added a feature flag for the implementation * Added profile debug information * use priority queue method for significant terms * Refactored the selection strategy * Added tests case with proper assertions * Added cluster settings for selection strategy --------- (cherry picked from commit 7db7a5a) Signed-off-by: Rishabh Maurya <[email protected]> Signed-off-by: Vinay Krishna Pudyodu <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Rishabh Maurya <[email protected]>
…ts (opensearch-project#18702) * optimize num agg using quick select for topN when applicable Signed-off-by: Rishabh Maurya <[email protected]> (cherry picked from commit 130d890) * Updated the numeric term aggregation logic to select topN Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Updated the algorithm selection logic Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added a feature flag for the implementation Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added profile debug information Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * use priority queue method for significant terms Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Refactored the selection strategy Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added tests case with proper assertions Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added cluster settings for selection strategy Signed-off-by: Vinay Krishna Pudyodu <[email protected]> --------- Signed-off-by: Rishabh Maurya <[email protected]> Signed-off-by: Vinay Krishna Pudyodu <[email protected]> Co-authored-by: Rishabh Maurya <[email protected]>
…ts (opensearch-project#18702) * optimize num agg using quick select for topN when applicable Signed-off-by: Rishabh Maurya <[email protected]> (cherry picked from commit 130d890) * Updated the numeric term aggregation logic to select topN Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Updated the algorithm selection logic Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added a feature flag for the implementation Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added profile debug information Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * use priority queue method for significant terms Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Refactored the selection strategy Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added tests case with proper assertions Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added cluster settings for selection strategy Signed-off-by: Vinay Krishna Pudyodu <[email protected]> --------- Signed-off-by: Rishabh Maurya <[email protected]> Signed-off-by: Vinay Krishna Pudyodu <[email protected]> Co-authored-by: Rishabh Maurya <[email protected]>
…ts (opensearch-project#18702) * optimize num agg using quick select for topN when applicable Signed-off-by: Rishabh Maurya <[email protected]> (cherry picked from commit 130d890) * Updated the numeric term aggregation logic to select topN Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Updated the algorithm selection logic Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added a feature flag for the implementation Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added profile debug information Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * use priority queue method for significant terms Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Refactored the selection strategy Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added tests case with proper assertions Signed-off-by: Vinay Krishna Pudyodu <[email protected]> * Added cluster settings for selection strategy Signed-off-by: Vinay Krishna Pudyodu <[email protected]> --------- Signed-off-by: Rishabh Maurya <[email protected]> Signed-off-by: Vinay Krishna Pudyodu <[email protected]> Co-authored-by: Rishabh Maurya <[email protected]>
Description
In bucket aggregations, data node sends topN bucket requested to the coordinator. The contract here is to return the buckets sorted by key but topN on the basis of value.
If the number of requested top-N buckets exceeds or close to the maximum bucket ordinal, making the use of a PriorityQueue for top-N selection inefficient or redundant. So we made following modifications:
quickselectfor topN if the requested size is greater than the 20% of the total buckets.Benchmarking test results :
#18703 (comment)
Related Issues
Resolves #18703
Related #18650
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.