[SPARK-32868][SQL] Add more order irrelevant aggregates to EliminateSorts #29740
cc @cloud-fan, @gatorsmile and @wzhfy, you reviewed
ok to test
Test build #128613 has finished for PR 29740 at commit
   private def isOrderIrrelevantAggs(aggs: Seq[NamedExpression]): Boolean = {
     def isOrderIrrelevantAggFunction(func: AggregateFunction): Boolean = func match {
-      case _: Min | _: Max | _: Count => true
+      case _: Min | _: Max | _: Count | _: BitAggregate | _: HyperLogLogPlusPlus => true
HyperLogLogPlusPlus is an estimation. Will the input order affect the estimate?
Internally it uses the hash values. From the hash it calculates the bucket id and, for each bucket, the max number of leading zeros over all the inputs. All of these are deterministic operations that do not depend on the order of insertion.
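To make the argument above concrete, here is a minimal sketch of a register update of the kind described. It is a toy model and not Spark's HyperLogLogPlusPlus implementation: the object name, the bucket count, and the use of `MurmurHash3.stringHash` are all assumptions chosen for illustration. It only shows that "pick a bucket from the hash, keep the max rank per bucket" is insensitive to input order, because `max` is commutative and associative.

```scala
import scala.util.hashing.MurmurHash3

// Toy register update (hypothetical; NOT Spark's HyperLogLogPlusPlus code).
object ToyHllRegisters {
  val IndexBits = 4                 // assumption: 2^4 = 16 buckets
  val NumBuckets = 1 << IndexBits

  // Fold the inputs into per-bucket registers.
  def registers(inputs: Seq[String]): Vector[Int] = {
    val regs = Array.fill(NumBuckets)(0)
    inputs.foreach { s =>
      val h = MurmurHash3.stringHash(s)
      // The top IndexBits bits of the hash select the bucket.
      val bucket = h >>> (32 - IndexBits)
      // The remaining bits give the rank: leading zeros plus one.
      val rest = h << IndexBits
      val rank = math.min(Integer.numberOfLeadingZeros(rest), 32 - IndexBits) + 1
      // Each register keeps only the maximum rank it has seen.
      regs(bucket) = math.max(regs(bucket), rank)
    }
    regs.toVector
  }
}
```

Feeding the same values in any order, for example a sequence and its reverse, yields identical registers in this toy model. That says nothing about guarantees made by the real implementation, which is the concern raised in the reply that follows.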
(IMHO) Even if so, we don't have any restriction on the determinism of the HyperLogLogPlusPlus implementation. So adding it to this rule looks a bit dangerous.
I see your point. I'll remove that then, but keep the BitAggregate.
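As a rough illustration of why bitwise aggregates are a safe case, the sketch below folds a sequence with AND, OR and XOR. It assumes that BitAggregate covers aggregates of this kind; the object and method names are hypothetical and this is not Spark code. Because each operator is commutative and associative, the result is the same for any ordering of the inputs.

```scala
// Hypothetical stand-ins for bitwise aggregates; not Spark's classes.
object BitAggDemo {
  def bitAnd(xs: Seq[Long]): Long = xs.reduce(_ & _)
  def bitOr(xs: Seq[Long]): Long = xs.reduce(_ | _)
  def bitXor(xs: Seq[Long]): Long = xs.reduce(_ ^ _)

  // True when all three folds agree across the given orderings of xs.
  def sameForAllOrders(xs: Seq[Long]): Boolean =
    xs.permutations.forall { p =>
      bitAnd(p) == bitAnd(xs) && bitOr(p) == bitOr(xs) && bitXor(p) == bitXor(xs)
    }
}
```

For a small input such as `Seq(3L, 5L, 6L)`, `sameForAllOrders` checks every permutation, so sorting the input first cannot change the aggregate.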
Test build #128632 has finished for PR 29740 at commit
Thanks! Merged to master.
NOTE: I added @tanelk's JIRA ID to the contributor list. Thanks for your contribution!
What changes were proposed in this pull request?
Mark BitAggregate as order irrelevant in EliminateSorts.
Why are the changes needed?
Performance improvements in some queries: a sort feeding an aggregate that uses only order-irrelevant functions can be eliminated.
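The kind of rewrite involved can be sketched with a toy plan model. Everything below is hypothetical (the `Plan` and `AggFn` types, `ToyEliminateSorts`, and the choice of `FirstFn` as an order-sensitive function) and is far simpler than Catalyst's actual rule. It only shows the shape of the idea: drop a sort directly below an aggregate when every aggregate function ignores input order.

```scala
// Toy model with hypothetical types; not Catalyst's classes.
sealed trait AggFn
case object MinFn extends AggFn
case object MaxFn extends AggFn
case object CountFn extends AggFn
case object BitAndFn extends AggFn
case object FirstFn extends AggFn // order-sensitive in this toy model

sealed trait Plan
case class Scan(table: String) extends Plan
case class Sort(child: Plan) extends Plan
case class Aggregate(fns: Seq[AggFn], child: Plan) extends Plan

object ToyEliminateSorts {
  def isOrderIrrelevant(fn: AggFn): Boolean = fn match {
    case MinFn | MaxFn | CountFn | BitAndFn => true
    case _ => false
  }

  def apply(plan: Plan): Plan = plan match {
    // The sort below the aggregate does no useful work: remove it.
    case Aggregate(fns, Sort(child)) if fns.forall(isOrderIrrelevant) =>
      Aggregate(fns, apply(child))
    case Aggregate(fns, child) => Aggregate(fns, apply(child))
    case Sort(child) => Sort(apply(child))
    case s: Scan => s
  }
}
```

In this model, `Aggregate(Seq(BitAndFn), Sort(Scan("t")))` is rewritten to `Aggregate(Seq(BitAndFn), Scan("t"))`, while a plan using `FirstFn` keeps its sort.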
Does this PR introduce any user-facing change?
No
How was this patch tested?
Generalized an existing UT