[SPARK-35558] Optimizes for multi-quantile retrieval #32700
    result(pos) = sampled.last.value
  } else {
    val (newIndex, newMinRank, approxQuantile) =
      findApproxQuantile(index, minRank, targetError, percentile)
I don't need a benchmark or anything, but is this much faster than calling this method repeatedly? From what I can see, it saves some common computation.
If by this method you mean QuantileSummaries.query, then there is evidence from profiles that it becomes a bottleneck as the percentile list grows, and the redundant computation in particular seems to be the root cause.
Jenkins test this please
Kubernetes integration test starting
Kubernetes integration test status success
Test build #139081 has finished for PR 32700 at commit
Jenkins retest this please
Kubernetes integration test starting
Kubernetes integration test status success
Test build #139118 has finished for PR 32700 at commit
Not sure if it's definitely related, but it looks like this results in tests that hang forever: Not 100% sure how it's connected, but, doesn't seem to be happening on other PRs?
Could be related to #32725
I can wait until that PR is closed and retest.
Jenkins retest this please
Kubernetes integration test starting
Kubernetes integration test status success
Test build #139293 has finished for PR 32700 at commit
Merged to master.
What changes were proposed in this pull request?
Optimizes the retrieval of approximate quantiles for an array of percentiles.
All formatting changes are the result of running ./dev/scalafmt.
Why are the changes needed?
The existing implementation calls QuantileSummaries.query once per input percentile, so work that is common to all percentiles is recomputed on every call.
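To illustrate the idea, here is a minimal sketch of answering several percentiles in one pass over a Greenwald-Khanna style summary. It is not the code from this PR: `MultiQuantileSketch`, `Sample`, and `queryAll` are hypothetical names, and the summary layout (`value`, `g`, `delta`) and the `relativeError` edge handling are simplifying assumptions. The percentiles are sorted once, and the scan position and minimum rank carry over from one percentile to the next instead of restarting at the head of the summary.

```scala
object MultiQuantileSketch {
  // Hypothetical stand-in for one summary entry: `g` is the increase in
  // minimum rank over the previous entry, `delta` is the rank uncertainty.
  final case class Sample(value: Double, g: Long, delta: Long)

  def queryAll(
      sampled: IndexedSeq[Sample],
      count: Long,
      relativeError: Double,
      percentiles: Seq[Double]): Seq[Double] = {
    require(sampled.nonEmpty, "summary must not be empty")
    percentiles.foreach(p => require(p >= 0.0 && p <= 1.0, s"percentile $p out of range"))

    // Computed once for the whole batch rather than once per percentile.
    val targetError = sampled.map(s => s.g + s.delta).max.toDouble / 2
    val result = new Array[Double](percentiles.length)

    // Scan state shared by all percentiles (assumption: ascending order
    // lets each lookup resume where the previous one stopped).
    var index = 0
    var minRank = sampled.head.g

    percentiles.zipWithIndex.sortBy(_._1).foreach { case (p, pos) =>
      if (p <= relativeError) {
        result(pos) = sampled.head.value
      } else if (p >= 1 - relativeError) {
        result(pos) = sampled.last.value
      } else {
        val rank = math.ceil(p * count).toLong
        var found = false
        while (!found && index < sampled.length - 1) {
          val maxRank = minRank + sampled(index).delta
          if (maxRank - targetError <= rank && rank <= minRank + targetError) {
            found = true
          } else {
            index += 1
            minRank += sampled(index).g
          }
        }
        // If nothing matched, the scan has reached the last entry.
        result(pos) = sampled(index).value
      }
    }
    // Results are written back at each percentile's original position.
    result.toSeq
  }
}
```

For example, with an exact summary of the values 1.0 through 8.0 (`g = 1`, `delta = 0`), `MultiQuantileSketch.queryAll(samples, 8L, 0.0, Seq(0.75, 0.25, 0.5))` walks the summary once and returns the results in the caller's order.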
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit tests for the new method.