Conversation

@vinaykpud
Contributor

@vinaykpud vinaykpud commented Jul 8, 2025

Description

In bucket aggregations, each data node sends the requested top-N buckets to the coordinator. The contract is to return the buckets sorted by key, with the top N chosen by value.
When the number of requested top-N buckets exceeds or is close to the maximum bucket ordinal, using a PriorityQueue for top-N selection is inefficient or redundant, so we made the following modifications:

  1. Use quickselect for top-N selection if the requested size is greater than 20% of the total buckets.
  2. If the requested size is greater than the number of buckets, return all the buckets.
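
The two rules above can be sketched as a small decision function. This is only an illustration, not the code merged in this PR: the class, enum, constant, and method names are hypothetical. It assumes the 20% threshold is compared against the total bucket count, as the description states.

```java
// Hypothetical sketch of the selection rules from the description.
// None of these names are taken from the OpenSearch code base.
final class TopNStrategySketch {

    /** The three selection paths described in the PR description. */
    enum Strategy { SELECT_ALL, QUICK_SELECT, PRIORITY_QUEUE }

    // 20% threshold from the description; the constant name is an assumption.
    static final int QUICK_SELECT_PERCENT = 20;

    static Strategy choose(int requestedSize, long totalBuckets) {
        // Rule 2: more buckets requested than exist, so return everything.
        if (requestedSize > totalBuckets) {
            return Strategy.SELECT_ALL;
        }
        // Rule 1: requested size is above 20% of the total buckets.
        // Integer arithmetic avoids floating-point rounding at the boundary.
        if (requestedSize * 100L > totalBuckets * QUICK_SELECT_PERCENT) {
            return Strategy.QUICK_SELECT;
        }
        // Otherwise keep the existing priority-queue path.
        return Strategy.PRIORITY_QUEUE;
    }
}
```

For example, under these assumptions a request for 30 of 100 buckets would take the quickselect path, while a request for 10 of 100 would stay on the priority queue.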

Benchmark results:

#18703 (comment)

Related Issues

Resolves #18703
Related #18650

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added bug Something isn't working Search:Performance labels Jul 8, 2025
@github-actions
Contributor

github-actions bot commented Jul 8, 2025

❌ Gradle check result for 74295ec: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

rishabhmaurya and others added 4 commits July 11, 2025 11:04
Signed-off-by: Rishabh Maurya <[email protected]>
(cherry picked from commit 130d890)
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 9f7c12d: FAILURE

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
@github-actions
Contributor

❕ Gradle check result for e124eb1: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov

codecov bot commented Jul 11, 2025

Codecov Report

❌ Patch coverage is 89.47368% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.90%. Comparing base (c01ff89) to head (68e77e1).
⚠️ Report is 9 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...egations/bucket/terms/BucketSelectionStrategy.java    92.95%    2 Missing and 3 partials ⚠️
...regations/bucket/terms/NumericTermsAggregator.java    76.47%    4 Missing ⚠️
...va/org/opensearch/search/DefaultSearchContext.java    80.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18702      +/-   ##
============================================
+ Coverage     72.89%   72.90%   +0.01%     
- Complexity    69318    69339      +21     
============================================
  Files          5642     5643       +1     
  Lines        318636   318712      +76     
  Branches      46107    46112       +5     
============================================
+ Hits         232254   232348      +94     
- Misses        67540    67569      +29     
+ Partials      18842    18795      -47     

@github-actions
Contributor

❌ Gradle check result for 66ccdaa: FAILURE

@vinaykpud vinaykpud force-pushed the num-term-agg-opt branch 3 times, most recently from ff2e323 to ced0d1b Compare July 14, 2025 22:11
@github-actions
Contributor

❌ Gradle check result for ced0d1b: FAILURE

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 13c663b: FAILURE

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
@vinaykpud vinaykpud marked this pull request as ready for review July 15, 2025 19:12
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
@github-actions
Contributor

github-actions bot commented Aug 7, 2025

✅ Gradle check result for adda83c: SUCCESS

@github-actions
Contributor

github-actions bot commented Aug 7, 2025

❌ Gradle check result for 0d3ecb4: FAILURE

@vinaykpud vinaykpud closed this Aug 7, 2025
@vinaykpud vinaykpud reopened this Aug 7, 2025
@github-actions
Contributor

github-actions bot commented Aug 7, 2025

❌ Gradle check result for 0d3ecb4: FAILURE

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Contributor

@jainankitk jainankitk left a comment

The approach is similar to the terms aggregation. Can you answer the questions in #18732 (review)?

@github-actions
Contributor

github-actions bot commented Aug 7, 2025

✅ Gradle check result for 68e77e1: SUCCESS

@rishabhmaurya rishabhmaurya merged commit 7db7a5a into opensearch-project:main Aug 7, 2025
31 checks passed
@vinaykpud
Contributor Author

vinaykpud commented Aug 7, 2025

@jainankitk

Approach is similar to terms aggregation. Can you answer the questions in #18732 (review) ?

Sure,

  1. I am curious how we arrived at 20% as the right threshold for choosing between pq approach vs quickselect?

We compared different thresholds, starting from 10%, and analyzed at which values quickselect performs better. I have added the results here:
#18702 (comment)

  2. Does this have any memory usage implications when the size is above 20% of value count?

No significant difference in memory usage was observed for sizes above 20%. When we use quickselect, we create an array with size equal to the number of bucket ordinals and copy all buckets into it to perform the top-N selection. The link below has a JVM metrics comparison.

#18703 (comment)
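
To make the memory point concrete, here is a minimal sketch of the copy-then-quickselect idea described above. It is not the merged implementation: the `Bucket` class and every method name are hypothetical, and a plain Lomuto-partition quickselect stands in for whatever selector the real code uses. The extra memory is the single array copy, sized to the number of buckets.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: top-N by document count via quickselect,
// then the selected buckets are returned sorted by key.
final class QuickSelectTopNSketch {

    /** Stand-in for a bucket: a numeric key and its document count. */
    static final class Bucket {
        final long key;
        final long docCount;
        Bucket(long key, long docCount) { this.key = key; this.docCount = docCount; }
    }

    static Bucket[] topN(Bucket[] buckets, int n) {
        if (n <= 0) return new Bucket[0];
        // The one extra allocation: an array holding every bucket.
        Bucket[] copy = buckets.clone();
        if (n < copy.length) {
            select(copy, n - 1);           // copy[0..n-1] now hold the n largest counts
            copy = Arrays.copyOf(copy, n);
        }
        // Contract from the description: result is ordered by key.
        Arrays.sort(copy, Comparator.comparingLong((Bucket b) -> b.key));
        return copy;
    }

    // Iterative quickselect, ordering by docCount descending.
    private static void select(Bucket[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k) return;
            if (p < k) lo = p + 1; else hi = p - 1;
        }
    }

    // Lomuto partition: larger counts move left of the pivot.
    private static int partition(Bucket[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);   // use the middle element as pivot
        long pivot = a[hi].docCount;
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i].docCount > pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(Bucket[] a, int i, int j) {
        Bucket t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

In this sketch ties on document count are broken arbitrarily, which may differ from the ordering rules of the actual aggregator.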

@vinaykpud vinaykpud added the backport 3.2 Backport to 3.2 branch label Aug 7, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 7, 2025
…ts (#18702)

* optimize num agg using quick select for topN when applicable

Signed-off-by: Rishabh Maurya <[email protected]>
(cherry picked from commit 130d890)

* Updated the numeric term aggregation logic to select topN

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Updated the algorithm selection logic

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added a feature flag for the implementation

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added profile debug information

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* use priority queue method for significant terms

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Refactored the selection strategy

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added tests case with proper assertions

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added cluster settings for selection strategy

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

---------

Signed-off-by: Rishabh Maurya <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Co-authored-by: Rishabh Maurya <[email protected]>
(cherry picked from commit 7db7a5a)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
rishabhmaurya added a commit that referenced this pull request Aug 7, 2025
…ts (#18702) (#18974)

(cherry picked from commit 7db7a5a)

Signed-off-by: Rishabh Maurya <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rishabh Maurya <[email protected]>
RajatGupta02 pushed a commit to RajatGupta02/OpenSearch that referenced this pull request Aug 18, 2025
…ts (opensearch-project#18702)
kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
…ts (opensearch-project#18702)
vinaykpud added a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…ts (opensearch-project#18702)
Labels

backport 3.2 (Backport to 3.2 branch), bug (Something isn't working), Search:Performance

Development

Successfully merging this pull request may close these issues.

[Performance] Optimize num agg using quick select for topN

3 participants