[SPARK-37527][SQL] Translate more standard aggregate functions for pushdown #34799
Conversation
ping @cloud-fan
ANY/SOME will be replaced by MAX/MIN in Spark, so we can't really hit it at runtime, and there is no need to add a data source push down API for it.
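For the record, a minimal plain-Scala sketch of why that rewrite is sound (an illustration, not Spark internals): over booleans, with false < true, MIN behaves like EVERY and MAX behaves like ANY/SOME.

```scala
// Ordering[Boolean] has false < true, so:
//   min == "all values true"          (EVERY)
//   max == "at least one value true"  (ANY/SOME)
val bs = Seq(true, false, true)
assert(bs.min == bs.forall(identity)) // EVERY(b)    ~ MIN(b)
assert(bs.max == bs.exists(identity)) // ANY/SOME(b) ~ MAX(b)
```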
let's follow the other aggregate functions and use Scala style. Just name it x
Thank you for the reminder.
why not x and y?
or we can use left and right consistently
OK
ditto, please check if we can really see it in the physical plan
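One way to run that check (a hedged sketch; the JDBC URL, table, and column names are hypothetical): enable aggregate push down on a JDBC read and look at the scan node that explain() prints.

```scala
import org.apache.spark.sql.functions.max

// If the aggregate was pushed, the scan node should report something like
// "PushedAggregates: [MAX(AGE)]"; if not, a separate aggregate node remains.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:h2:mem:testdb")   // hypothetical URL
  .option("dbtable", "people")           // hypothetical table
  .option("pushDownAggregate", "true")
  .load()
  .agg(max("age"))

df.explain()
```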
```scala
case max: Max =>
  if (max.column.fieldNames.length != 1) return None
  Some(s"MAX(${quoteIdentifier(max.column.fieldNames.head)})")
case avg: Average =>
```
Is it right to push down avg? Should we use sum and count instead?
You mean push down sum/count to the data source? Why not use avg?
In Spark, partial aggregate output for avg is a sequence of (sum, count). If we want to push down partial aggregate of avg to data source, should we also use sum and count?
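A minimal plain-Scala model of that point (a toy, not Spark's actual classes): the partial state for avg is a (sum, count) pair per partition, pairs are merged, and the division happens only at the end.

```scala
// Toy model of AVG's partial aggregation.
final case class AvgBuffer(sum: Double, count: Long) {
  def add(v: Double): AvgBuffer = AvgBuffer(sum + v, count + 1)
  def merge(other: AvgBuffer): AvgBuffer =
    AvgBuffer(sum + other.sum, count + other.count)
  def result: Double = sum / count
}

val partitions = Seq(Seq(1.0, 2.0), Seq(3.0))
val avg = partitions
  .map(_.foldLeft(AvgBuffer(0.0, 0L))(_ add _)) // partial per partition
  .reduce(_ merge _)                            // merge the partials
  .result                                       // finish: sum / count
assert(avg == 2.0)
```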
But this PR uses First to replace Average, so we don't need to pay attention to sum and count.
ping @cloud-fan
#35101 merged.
What changes were proposed in this pull request?
Currently, Spark aggregate pushdown translates some standard aggregate functions so that they can be compiled into the SQL of a specific database.
After this change, users can override JdbcDialect.compileAggregate to implement aggregate functions supported by a particular database (a sketch of such an override follows the table below). Because some aggregate functions are rewritten by Spark as shown below, this PR does not need to match them.
Every → aggregate.BoolAnd → Min
Any → aggregate.BoolOr → Max
Some → aggregate.BoolOr → Max
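A hedged sketch of such a dialect override (not the PR's code: MyDialect, the URL prefix, and the MINIMUM spelling are hypothetical, and the API names follow the Spark 3.3-era DS v2 connector, so verify them against your Spark version):

```scala
import org.apache.spark.sql.connector.expressions.aggregate.{AggregateFunc, Min}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect for a database that spells MIN as MINIMUM.
case object MyDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")

  override def compileAggregate(aggFunction: AggregateFunc): Option[String] =
    // Reuse the built-in translations first, then handle the extras.
    super.compileAggregate(aggFunction).orElse(aggFunction match {
      case min: Min if min.column.fieldNames.length == 1 =>
        Some(s"MINIMUM(${quoteIdentifier(min.column.fieldNames.head)})")
      case _ => None
    })
}

// Registration makes Spark select the dialect based on the JDBC URL.
JdbcDialects.registerDialect(MyDialect)
```

Delegating to super.compileAggregate first keeps all built-in translations and only adds the extra case.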
Why are the changes needed?
Allow *Dialect implementations to extend the supported aggregate functions by overriding JdbcDialect.compileAggregate.
Does this PR introduce any user-facing change?
Yes. Users can push down more aggregate functions.
How was this patch tested?
Existing tests.