-
Notifications
You must be signed in to change notification settings - Fork 2.3k
[Composite Terms Aggregation] Optimize by removing unnecessary object allocations #18531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
{"run-benchmark-test": "id_4"} |
|
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/3470/ . Final results will be published once the job is completed. |
|
❌ Gradle check result for cfdfa76: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/3470/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/122/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/138/
|
|
The PR looks good. Some points: Slot object reuse - is this per-thread or shared? If shared, need synchronization. If per-thread, document the threading model assumptions clearly. I dont see tests for the core optimizations. I would suggest explicit test that validates:
Also, this changes object lifecycle in hot path. Validate:
Where's the actual slot reuse implementation? The PR description mentions the optimization but need to see the specific code changes to evaluate correctness. Regarding benchmarks, standard microbenchmarks may not capture GC improvements. Consider longer-running tests with memory pressure to validate allocation reduction benefits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to: #18531 (comment)
|
{"run-benchmark-test": "id_4"} |
|
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/3985/ . Final results will be published once the job is completed. |
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/3985/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/141/
|
|
Latest results 90th percentile service time composite-terms 223.699 217.607 -6.09272 ms Btw, I think you had mem allocation profile results using async, you can paste it? Otherwise looks good to me. |
|
@asimmahmood1 The major changes are observed in improved throughput and less memory usage. The PR benchmark runs are capped at 2 ops/sec, because of which the difference wasn't noted there much. In all my multiple local runs, I could see a better throughput (0.88 to 0.98 ops/s) which is nice. Since the code changes don't involve major logical changes, the diff is service time improvement is minor. Also, memory profiling on initial analyzed chunks is also improved. Before:
After:
|
Signed-off-by: Sandesh Kumar <[email protected]>
|
@atris Thanks for taking a look. The objects inside an Aggregator class are not shared between threads. Also, I don't suspect memory leaks because of this change. |
… allocations (opensearch-project#18531) * reuse slot objects for lookups Signed-off-by: Sandesh Kumar <[email protected]> * refactor streams to for loop Signed-off-by: Sandesh Kumar <[email protected]> --------- Signed-off-by: Sandesh Kumar <[email protected]>


Description
Trying to achieve micro-improvements with some objects overheads in a composite terms aggregation request:
sourceConfigswith the help ofstreams5 times. Changing this to a singlefor loopto avoid multiple traversals. Also, the size of thesourceConfigsis the size of fields in composite aggregation request, so most use cases will see 2-3 fields, for which the overhead of 5 streams doesn't make sense.Regarding concurrent search, when intra segment concurrent search will be introduced, all aggregations will break naturally, so that will be another problem on its own which will be a follow-up on. #18879
Related Issues
Resolves #18440
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.