[SPARK-37527][SQL] Translate more standard aggregate functions for pushdown #34799
Conversation
ping @cloud-fan
ANY/SOME will be replaced by MAX/MIN in Spark, so we can't really hit it at runtime, and there is no need to add a data source push down API for it.
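For the record, a minimal plain-Scala sketch of why that rewrite is sound (an illustration, not Spark internals): over booleans, with false < true, MIN behaves like EVERY and MAX behaves like ANY/SOME.

```scala
// Ordering[Boolean] has false < true, so:
//   min == "all values true"          (EVERY)
//   max == "at least one value true"  (ANY/SOME)
val bs = Seq(true, false, true)
assert(bs.min == bs.forall(identity)) // EVERY(b)    ~ MIN(b)
assert(bs.max == bs.exists(identity)) // ANY/SOME(b) ~ MAX(b)
```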
let's follow the other aggregate functions and use Scala style. Just name it x
Thank you for the reminder.
why not x and y?
or we can use left and right consistently
OK
ditto, please check if we can really see it in the physical plan
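One way to run that check (a hedged sketch; the JDBC URL, table, and column names are hypothetical): enable aggregate push down on a JDBC read and look at the scan node that explain() prints.

```scala
import org.apache.spark.sql.functions.max

// If the aggregate was pushed, the scan node should report something like
// "PushedAggregates: [MAX(AGE)]"; if not, a separate aggregate node remains.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:h2:mem:testdb")   // hypothetical URL
  .option("dbtable", "people")           // hypothetical table
  .option("pushDownAggregate", "true")
  .load()
  .agg(max("age"))

df.explain()
```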
```scala
case max: Max =>
  if (max.column.fieldNames.length != 1) return None
  Some(s"MAX(${quoteIdentifier(max.column.fieldNames.head)})")
case avg: Average =>
```
Is it right to push down avg? Should we use sum and count instead?
You mean push down sum/count to the data source? Why not use avg?
In Spark, partial aggregate output for avg is a sequence of (sum, count). If we want to push down partial aggregate of avg to data source, should we also use sum and count?
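A minimal plain-Scala model of that point (a toy, not Spark's actual classes): the partial state for avg is a (sum, count) pair per partition, pairs are merged, and the division happens only at the end.

```scala
// Toy model of AVG's partial aggregation.
final case class AvgBuffer(sum: Double, count: Long) {
  def add(v: Double): AvgBuffer = AvgBuffer(sum + v, count + 1)
  def merge(other: AvgBuffer): AvgBuffer =
    AvgBuffer(sum + other.sum, count + other.count)
  def result: Double = sum / count
}

val partitions = Seq(Seq(1.0, 2.0), Seq(3.0))
val avg = partitions
  .map(_.foldLeft(AvgBuffer(0.0, 0L))(_ add _)) // partial per partition
  .reduce(_ merge _)                            // merge the partials
  .result                                       // finish: sum / count
assert(avg == 2.0)
```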
But this PR uses First to replace Average, so we don't need to pay attention to sum and count.
ping @cloud-fan
#35101 merged.
What changes were proposed in this pull request?
Currently, Spark aggregate pushdown translates some standard aggregate functions so that they can be compiled into the SQL of a specific database.
After this change, users can override JdbcDialect.compileAggregate to implement aggregate functions supported by a particular database (a sketch of such an override follows the table below). Because some aggregate functions are rewritten by Spark as shown below, this PR does not need to match them.
Every → aggregate.BoolAnd → Min
Any → aggregate.BoolOr → Max
Some → aggregate.BoolOr → Max
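A hedged sketch of such a dialect override (not the PR's code: MyDialect, the URL prefix, and the MINIMUM spelling are hypothetical, and the API names follow the Spark 3.3-era DS v2 connector, so verify them against your Spark version):

```scala
import org.apache.spark.sql.connector.expressions.aggregate.{AggregateFunc, Min}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect for a database that spells MIN as MINIMUM.
case object MyDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")

  override def compileAggregate(aggFunction: AggregateFunc): Option[String] =
    // Reuse the built-in translations first, then handle the extras.
    super.compileAggregate(aggFunction).orElse(aggFunction match {
      case min: Min if min.column.fieldNames.length == 1 =>
        Some(s"MINIMUM(${quoteIdentifier(min.column.fieldNames.head)})")
      case _ => None
    })
}

// Registration makes Spark select the dialect based on the JDBC URL.
JdbcDialects.registerDialect(MyDialect)
```

Delegating to super.compileAggregate first keeps all built-in translations and only adds the extra case.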
Why are the changes needed?
Allow *Dialect implementations to extend the supported aggregate functions by overriding JdbcDialect.compileAggregate.
Does this PR introduce any user-facing change?
Yes. Users can push down more aggregate functions.
How was this patch tested?
Existing tests.