Skip to content

Conversation

@bharath-techie
Copy link
Contributor

@bharath-techie bharath-techie commented Jun 7, 2025

Description

Star tree's SumValueAggregator today stores kahan summation state which gets reused and we found an edge case where the children node's value gets set in the kahan summation that gets used in the parent nodes which results in incorrect results.

As part of this PR removing the state from the SumValueAggregator and using CompensatedSum as the SumAggregator type.

Related Issues

#18463

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2025

❌ Gradle check result for 60144ea: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bharath-techie bharath-techie changed the title [Star tree] Star tree indexing - kahan summation fix [Bug] Star tree indexing - kahan summation fix Jun 7, 2025
@bharath-techie bharath-techie force-pushed the kahan-fix branch 2 times, most recently from 4595246 to a151898 Compare June 7, 2025 06:50
Signed-off-by: bharath-techie <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2025

❌ Gradle check result for 5165fe5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2025

❌ Gradle check result for 5165fe5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jun 8, 2025

✅ Gradle check result for 5165fe5: SUCCESS

@codecov
Copy link

codecov bot commented Jun 8, 2025

Codecov Report

Attention: Patch coverage is 74.07407% with 14 lines in your changes missing coverage. Please review.

Project coverage is 72.67%. Comparing base (6ad6f4e) to head (a00f1b8).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
...startree/builder/AbstractDocumentsFileManager.java 46.15% 2 Missing and 5 partials ⚠️
...ch/search/aggregations/metrics/CompensatedSum.java 16.66% 2 Missing and 3 partials ⚠️
...datacube/startree/builder/BaseStarTreeBuilder.java 50.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main   #18462   +/-   ##
=========================================
  Coverage     72.66%   72.67%           
- Complexity    67858    67893   +35     
=========================================
  Files          5521     5522    +1     
  Lines        312541   312574   +33     
  Branches      45364    45377   +13     
=========================================
+ Hits         227113   227148   +35     
- Misses        66903    66921   +18     
+ Partials      18525    18505   -20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bharath-techie
Copy link
Contributor Author

I did one round of benchmarks on nyc_taxis. Indexing has very slight to no regression.

Old code :

Min Throughput,bulk_1,55142.74,docs/s
Mean Throughput,bulk_1,64414.36,docs/s
Median Throughput,bulk_1,63686.54,docs/s
Max Throughput,bulk_1,67958.79,docs/s

50th percentile latency,bulk_1,3742.8189259953797,ms
90th percentile latency,bulk_1,4953.275570596452,ms
99th percentile latency,bulk_1,6862.332153840108,ms
99.9th percentile latency,bulk_1,9619.084308268557,ms
99.99th percentile latency,bulk_1,11603.907576405536,ms
100th percentile latency,bulk_1,13607.57394900429,ms

With this PR:

Min Throughput,bulk_1,53547.90,docs/s
Mean Throughput,bulk_1,63576.82,docs/s
Median Throughput,bulk_1,62784.96,docs/s
Max Throughput,bulk_1,67565.22,docs/s

50th percentile latency,bulk_1,3780.920432007406,ms
90th percentile latency,bulk_1,5115.1613621623255,ms
99th percentile latency,bulk_1,7003.352772257059,ms
99.9th percentile latency,bulk_1,9568.602622219014,ms
99.99th percentile latency,bulk_1,11882.583495717288,ms
100th percentile latency,bulk_1,20073.35137197515,ms

Search - Overall sum :
One thing I noticed during search tests with nyc taxis data is that there is a slight diff in fp precision in couple of fields at the last decimal digit.
I verified the precision of the kahan sum with the fix in unit tests, but still adding this info as part of this PR.

image

Copy link
Contributor

@asimmahmood1 asimmahmood1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts about adding this minor improvement to account for values that larger than current sum? https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Further_enhancements

Signed-off-by: bharath-techie <[email protected]>
@github-actions
Copy link
Contributor

✅ Gradle check result for d60e029: SUCCESS

Signed-off-by: bharath-techie <[email protected]>
@bharath-techie
Copy link
Contributor Author

Any thoughts about adding this minor improvement to account for values that larger than current sum? https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Further_enhancements

Since we're targeting this for 3.1 and code freeze is today , will create an issue to track this enhancement.

@github-actions
Copy link
Contributor

✅ Gradle check result for a00f1b8: SUCCESS

@shwetathareja shwetathareja merged commit cf13a80 into opensearch-project:main Jun 10, 2025
30 checks passed
abhita pushed a commit to abhita/OpenSearch that referenced this pull request Jun 17, 2025
)

* Star tree kahan summation fix

Signed-off-by: bharath-techie <[email protected]>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
)

* Star tree kahan summation fix

Signed-off-by: bharath-techie <[email protected]>Signed-off-by: TJ Neuenfeldt <[email protected]>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
)

* Star tree kahan summation fix

Signed-off-by: bharath-techie <[email protected]>
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
)

* Star tree kahan summation fix

Signed-off-by: bharath-techie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants