Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented Dec 3, 2021

What changes were proposed in this pull request?

Currently, Spark aggregate pushdown will translate some standard aggregate functions, so that compile these functions to adapt specify database.
After this job, users could override JdbcDialect.compileAggregate to implement some aggregate functions supported by some database.
Because some aggregate functions will be converted show below, this PR no need to match them.

Input Parsed Optimized
Every aggregate.BoolAnd Min
Any aggregate.BoolOr Max
Some aggregate.BoolOr Max

Why are the changes needed?

Make the implement of *Dialect could extends the aggregate functions by override JdbcDialect.compileAggregate.

Does this PR introduce any user-facing change?

Yes. Users could pushdown more aggregate functions.

How was this patch tested?

Exists tests.

@github-actions github-actions bot added the SQL label Dec 3, 2021
@SparkQA
Copy link

SparkQA commented Dec 3, 2021

Test build #145900 has finished for PR 34799 at commit 82469e4.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class AnyOrSome implements AggregateFunc
  • public final class Corr implements AggregateFunc
  • public final class CovarPop implements AggregateFunc
  • public final class CovarSamp implements AggregateFunc
  • public final class Every implements AggregateFunc
  • public final class StddevPop implements AggregateFunc
  • public final class StddevSamp implements AggregateFunc
  • public final class VarPop implements AggregateFunc
  • public final class VarSamp implements AggregateFunc

@SparkQA
Copy link

SparkQA commented Dec 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50375/

@SparkQA
Copy link

SparkQA commented Dec 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50376/

@SparkQA
Copy link

SparkQA commented Dec 3, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50375/

@SparkQA
Copy link

SparkQA commented Dec 3, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50376/

@SparkQA
Copy link

SparkQA commented Dec 3, 2021

Test build #145901 has finished for PR 34799 at commit 96d2c21.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-37527] Translate more standard aggregate functions for pushdown [SPARK-37527][SQL] Translate more standard aggregate functions for pushdown Dec 6, 2021
@beliefer
Copy link
Contributor Author

beliefer commented Dec 6, 2021

ping @cloud-fan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ANY/SOME will be replaced by MAX/MIN in spark, so we can't really hit in at runtime, and no need to add data source push down API for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's follow other aggregate functions and use scala style. Just name it x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reminder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not x and y?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we can use left and right consistently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, please check if we can really see it in the physical plan

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Test build #145950 has finished for PR 34799 at commit e012e74.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50425/

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50425/

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Test build #145951 has finished for PR 34799 at commit 6907b3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50458/

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50458/

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50459/

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50459/

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50463/

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Test build #145983 has finished for PR 34799 at commit d92eecf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class Average implements AggregateFunc

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50465/

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50463/

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50465/

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Test build #145984 has finished for PR 34799 at commit 651f40b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case max: Max =>
if (max.column.fieldNames.length != 1) return None
Some(s"MAX(${quoteIdentifier(max.column.fieldNames.head)})")
case avg: Average =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it right to push down avg? Should we use sum and count instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You means pushdown sum/count to data source ? Why not use avg ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Spark, partial aggregate output for avg is a sequence of (sum, count). If we want to push down partial aggregate of avg to data source, should we also use sum and count?

Copy link
Contributor Author

@beliefer beliefer Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this PR uses First to replace Average. So we no need to pay attention to sum and count.

@SparkQA
Copy link

SparkQA commented Dec 7, 2021

Test build #145989 has finished for PR 34799 at commit b9c7d96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

beliefer commented Dec 8, 2021

ping @cloud-fan

@beliefer
Copy link
Contributor Author

#35101 merged.

@beliefer beliefer closed this Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants