Skip to content

Conversation

@tanelk
Copy link
Contributor

@tanelk tanelk commented Sep 13, 2020

What changes were proposed in this pull request?

Mark BitAggregate as order irrelevant in EliminateSorts.

Why are the changes needed?

Performance improvements in some queries

Does this PR introduce any user-facing change?

No

How was this patch tested?

Generalized an existing UT

@tanelk tanelk changed the title Add more order irrelevant aggregates to EliminateSorts [SPARK-32868][SQL]Add more order irrelevant aggregates to EliminateSorts Sep 13, 2020
@tanelk
Copy link
Contributor Author

tanelk commented Sep 13, 2020

cc @cloud-fan , @gatorsmile and @wzhfy, you reviewed EliminateSorts before

@tanelk tanelk changed the title [SPARK-32868][SQL]Add more order irrelevant aggregates to EliminateSorts [SPARK-32868][SQL] Add more order irrelevant aggregates to EliminateSorts Sep 13, 2020
@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Sep 14, 2020

Test build #128613 has finished for PR 29740 at commit c2f191e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private def isOrderIrrelevantAggs(aggs: Seq[NamedExpression]): Boolean = {
def isOrderIrrelevantAggFunction(func: AggregateFunction): Boolean = func match {
case _: Min | _: Max | _: Count => true
case _: Min | _: Max | _: Count | _: BitAggregate | _: HyperLogLogPlusPlus => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HyperLogLogPlusPlus is an estimation, will the input order affect the estimation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally it uses the hash values. From the hash it calculates the bucket id and the max number of leading zeros for all the inputs for each buckets. All of these are deterministic operations, that do not depend on the order of insertion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(IMHO) even if so, we don't have any restriction about the deteminisity of the HyperLogLogPlusPlus impmentation. So, adding it in this rule look a bit dangerous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. I'll remove that then, but keep the BitAggregate

@SparkQA
Copy link

SparkQA commented Sep 14, 2020

Test build #128632 has finished for PR 29740 at commit 20323c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 7a17158 Sep 14, 2020
@maropu
Copy link
Member

maropu commented Sep 14, 2020

Thanks! Merged to master.

@maropu
Copy link
Member

maropu commented Sep 14, 2020

NOTE: I added the @tanelk JIRA ID in the contributor list and thanks for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants