@tanelk
Contributor

tanelk commented Jan 3, 2021

What changes were proposed in this pull request?

Add more aggregate expressions to EliminateDistinct rule.

Why are the changes needed?

Distinct aggregation can add significant overhead, so it is better to remove DISTINCT whenever doing so does not change the result.
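
To illustrate why dropping DISTINCT is safe for the aggregates this rule targets, here is a minimal sketch using plain Scala collections as a stand-in for Spark's aggregate functions. The object and helper names are hypothetical and are not part of the patch; the sketch only shows that max, min, bitwise AND and bitwise OR ignore duplicates, while an aggregate such as sum does not.

```scala
// Hypothetical illustration only: plain collections, not Catalyst expressions.
object DuplicateAgnosticDemo {
  // Input with duplicate values, as a column might contain.
  val xs: Seq[Int] = Seq(6, 3, 6, 3, 12)

  // Stand-ins for the bit_and / bit_or aggregates.
  def bitAnd(v: Seq[Int]): Int = v.reduce(_ & _)
  def bitOr(v: Seq[Int]): Int = v.reduce(_ | _)

  def main(args: Array[String]): Unit = {
    val d = xs.distinct
    // Duplicate-agnostic aggregates: same result with or without distinct.
    println(xs.max == d.max)
    println(xs.min == d.min)
    println(bitAnd(xs) == bitAnd(d))
    println(bitOr(xs) == bitOr(d))
    // Counter-example: sum is sensitive to duplicates, so DISTINCT must stay.
    println(xs.sum == d.sum)
  }
}
```

Aggregates for which the two results can differ, such as sum or count, have to keep their DISTINCT qualifier.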

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests (UT).

github-actions bot added the SQL label Jan 3, 2021
@HyukjinKwon
Member

Looks reasonable otherwise. cc @gengliangwang and @cloud-fan FYI

@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Jan 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38204/

@SparkQA

SparkQA commented Jan 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38204/

@SparkQA

SparkQA commented Jan 5, 2021

Test build #133615 has finished for PR 30999 at commit d1cd01d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38459/

@SparkQA

SparkQA commented Jan 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38459/

@SparkQA

SparkQA commented Jan 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38461/

@SparkQA

SparkQA commented Jan 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38461/

@SparkQA

SparkQA commented Jan 9, 2021

Test build #133870 has finished for PR 30999 at commit 5f9df99.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 9, 2021

Test build #133872 has finished for PR 30999 at commit 4b9d04b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case _: Min => true
case _: BitAndAgg => true
case _: BitOrAgg => true
case _: First => true
Contributor

First and Last are non-deterministic. Does this matter?

Contributor Author

In another PR (#29810) I argue that they should be deterministic.
TL;DR: an analogous aggregate is sum over the float and double data types. Its result depends on the order of its inputs, yet it is deterministic.
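
The float/double analogy can be reproduced with a small sketch in plain Scala. This is an illustration under my own assumptions rather than Spark code; the object name and sample values are hypothetical. It shows that summing the same doubles in a different order gives a different result, while repeating the same order always gives the same result.

```scala
// Hypothetical illustration only: a sequential double sum, not Spark's Sum aggregate.
object FloatSumOrderDemo {
  // Left-to-right summation, mimicking rows arriving in a fixed order.
  def sum(xs: Seq[Double]): Double = xs.foldLeft(0.0)(_ + _)

  // The same three values in two different orders.
  val orderA: Seq[Double] = Seq(1e17, 1.0, -1e17)
  val orderB: Seq[Double] = Seq(1e17, -1e17, 1.0)

  def main(args: Array[String]): Unit = {
    // In orderA the 1.0 is absorbed by 1e17 (below its rounding granularity).
    println(sum(orderA)) // 0.0
    // In orderB the large values cancel first, so the 1.0 survives.
    println(sum(orderB)) // 1.0
    // Same input order, same output: order-dependent but repeatable.
    println(sum(orderA) == sum(orderA)) // true
  }
}
```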

Contributor

If we finally make First and Last deterministic, then I guess they need to be removed from EliminateDistinct?

Contributor

Let's exclude First/Last here until #29810 is merged.

Contributor Author

Done

@maropu
Member

maropu commented Jan 20, 2021

@tanelk ping.

@SparkQA

SparkQA commented Jan 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39099/

@SparkQA

SparkQA commented Jan 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39099/

@SparkQA

SparkQA commented Jan 26, 2021

Test build #134514 has finished for PR 30999 at commit f95fb8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Feb 24, 2021

retest this please

@SparkQA

SparkQA commented Feb 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40004/

@SparkQA

SparkQA commented Feb 24, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40004/

@SparkQA

SparkQA commented Feb 24, 2021

Test build #135424 has finished for PR 30999 at commit f95fb8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu closed this in 67ec4f7 Feb 26, 2021
@maropu
Member

maropu commented Feb 26, 2021

Thanks! Merged to master.
