[SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset #26681
Conversation
cc @cloud-fan @maropu @HyukjinKwon @wangyum, thanks for reviewing.
```diff
 colNames.map { colName =>
   val namedExpr = df.resolve(colName)
-  if (!namedExpr.dataType.isInstanceOf[NumericType]) {
+  if (!TypeCollection.NumericAndInterval.acceptsType(namedExpr.dataType)) {
```
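The widened check above can be sketched in isolation. The following is a minimal standalone model, not Spark's actual classes: `DataType`, `NumericAndInterval`, and `aggregatableColumns` here are simplified stand-ins for `org.apache.spark.sql.types.TypeCollection.NumericAndInterval` and the column filtering it enables.

```scala
// Standalone sketch (hypothetical types, not Spark's) of the widened check:
// aggregates now accept both numeric and calendar-interval columns.
sealed trait DataType
case object IntegerType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType
case object CalendarIntervalType extends DataType

// Mimics TypeCollection.NumericAndInterval.acceptsType from the diff above.
object NumericAndInterval {
  private val accepted: Set[DataType] =
    Set(IntegerType, DoubleType, CalendarIntervalType)
  def acceptsType(t: DataType): Boolean = accepted.contains(t)
}

// Columns eligible for min/max/sum/avg under the new rule.
def aggregatableColumns(schema: Map[String, DataType]): Seq[String] =
  schema.collect { case (name, t) if NumericAndInterval.acceptsType(t) => name }.toSeq
```

With the old `isInstanceOf[NumericType]` check, an interval column would have been rejected; the collection-based check lets it through.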
Can you update the comment to make it more general?
In `spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala` (line 82 at commit c2d513f):

```scala
 * Types that include numeric types and interval type. They are only used in unary_minus,
```
Thanks for your suggestion. Please check.
```scala
schema.fields.filter { f =>
  TypeCollection.NumericAndInterval.acceptsType(f.dataType)
}.map { n =>
  queryExecution.analyzed.resolveQuoted(n.name, sparkSession.sessionState.analyzer.resolver).get
}
```
nit: we can do

```scala
queryExecution.analyzed.output.filter { attr =>
  TypeCollection.NumericAndInterval.acceptsType(attr.dataType)
}
```
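The suggested simplification avoids re-resolving each column name against the plan: the analyzed plan's output attributes are already resolved, so they can be filtered directly. A standalone sketch of the two shapes, using a hypothetical `Attribute` model rather than Spark's catalyst classes:

```scala
// Hypothetical stand-in for a resolved catalyst attribute; for illustration only.
final case class Attribute(name: String, dataType: String)

// Stand-in for TypeCollection.NumericAndInterval.
val numericAndInterval = Set("int", "double", "interval")

// Stand-in for queryExecution.analyzed.output.
val output = Seq(
  Attribute("id", "int"),
  Attribute("name", "string"),
  Attribute("dur", "interval"))

// Original shape: filter schema fields by type, then resolve each name again.
val resolved: Map[String, Attribute] = output.map(a => a.name -> a).toMap
val viaResolve = output
  .filter(a => numericAndInterval.contains(a.dataType))
  .map(a => resolved(a.name))

// Suggested shape: filter the already-resolved output attributes directly.
val viaOutput = output.filter(a => numericAndInterval.contains(a.dataType))
```

Both yield the same attributes; the second skips the redundant name lookup.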
- Test build #114462 has finished for PR 26681 at commit
- Test build #114469 has finished for PR 26681 at commit
- Test build #114472 has finished for PR 26681 at commit
- Test build #114499 has finished for PR 26681 at commit
- Test build #114519 has finished for PR 26681 at commit
- Test build #114731 has finished for PR 26681 at commit
thanks, merging to master!
Merged commit message:

[SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset

### What changes were proposed in this pull request?
Now that min/max/sum/avg are supported for intervals, we should also enable them in RelationalGroupedDataset.

### Why are the changes needed?
API consistency improvement.

### Does this PR introduce any user-facing change?
Yes, Dataset supports min/max/sum/avg (mean) on intervals.

### How was this patch tested?
Added unit tests.

Closes apache#26681 from yaooqinn/SPARK-30048.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
As we are not going to follow ANSI to implement year-month and day-time interval types, it is weird to compare the year-month part to the day-time part in our current implementation of the interval type. Additionally, the current ordering logic comes from PostgreSQL, where the implementation of intervals is messy, and we are not aiming at PostgreSQL compliance at all. This PR reverts #26681 and #26337.

### Why are the changes needed?
Make the interval type more future-proof.

### Does this PR introduce any user-facing change?
No; these features are new in 3.0.

### How was this patch tested?
Existing unit tests.

Closes #27262 from yaooqinn/SPARK-30551.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>