[SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset #26681
Conversation
cc @cloud-fan @maropu @HyukjinKwon @wangyum, thanks for reviewing.
```diff
 colNames.map { colName =>
   val namedExpr = df.resolve(colName)
-  if (!namedExpr.dataType.isInstanceOf[NumericType]) {
+  if (!TypeCollection.NumericAndInterval.acceptsType(namedExpr.dataType)) {
```
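The widened check above can be sketched in isolation. The following is a minimal standalone model, not Spark's actual classes: `DataType`, `NumericAndInterval`, and `aggregatableColumns` here are simplified stand-ins for `org.apache.spark.sql.types.TypeCollection.NumericAndInterval` and the column filtering it enables.

```scala
// Standalone sketch (hypothetical types, not Spark's) of the widened check:
// aggregates now accept both numeric and calendar-interval columns.
sealed trait DataType
case object IntegerType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType
case object CalendarIntervalType extends DataType

// Mimics TypeCollection.NumericAndInterval.acceptsType from the diff above.
object NumericAndInterval {
  private val accepted: Set[DataType] =
    Set(IntegerType, DoubleType, CalendarIntervalType)
  def acceptsType(t: DataType): Boolean = accepted.contains(t)
}

// Columns eligible for min/max/sum/avg under the new rule.
def aggregatableColumns(schema: Map[String, DataType]): Seq[String] =
  schema.collect { case (name, t) if NumericAndInterval.acceptsType(t) => name }.toSeq
```

With the old `isInstanceOf[NumericType]` check, an interval column would have been rejected; the collection-based check lets it through.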
Can you update the comment to make it more general?
In `spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala` (line 82 at commit c2d513f):

```scala
 * Types that include numeric types and interval type. They are only used in unary_minus,
```
Thanks for your suggestion. Please check.
```scala
schema.fields.filter { f =>
  TypeCollection.NumericAndInterval.acceptsType(f.dataType)
}.map { n =>
  queryExecution.analyzed.resolveQuoted(n.name, sparkSession.sessionState.analyzer.resolver).get
}
```
nit: we can do

```scala
queryExecution.analyzed.output.filter { attr =>
  TypeCollection.NumericAndInterval.acceptsType(attr.dataType)
}
```
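The suggested simplification avoids re-resolving each column name against the plan: the analyzed plan's output attributes are already resolved, so they can be filtered directly. A standalone sketch of the two shapes, using a hypothetical `Attribute` model rather than Spark's catalyst classes:

```scala
// Hypothetical stand-in for a resolved catalyst attribute; for illustration only.
final case class Attribute(name: String, dataType: String)

// Stand-in for TypeCollection.NumericAndInterval.
val numericAndInterval = Set("int", "double", "interval")

// Stand-in for queryExecution.analyzed.output.
val output = Seq(
  Attribute("id", "int"),
  Attribute("name", "string"),
  Attribute("dur", "interval"))

// Original shape: filter schema fields by type, then resolve each name again.
val resolved: Map[String, Attribute] = output.map(a => a.name -> a).toMap
val viaResolve = output
  .filter(a => numericAndInterval.contains(a.dataType))
  .map(a => resolved(a.name))

// Suggested shape: filter the already-resolved output attributes directly.
val viaOutput = output.filter(a => numericAndInterval.contains(a.dataType))
```

Both yield the same attributes; the second skips the redundant name lookup.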
- Test build #114462 has finished for PR 26681 at commit
- Test build #114469 has finished for PR 26681 at commit
- Test build #114472 has finished for PR 26681 at commit
- Test build #114499 has finished for PR 26681 at commit
- Test build #114519 has finished for PR 26681 at commit
- Test build #114731 has finished for PR 26681 at commit
thanks, merging to master!
Merged commit message:

[SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset

### What changes were proposed in this pull request?
Now that min/max/sum/avg are supported for intervals, we should also enable them in RelationalGroupedDataset.

### Why are the changes needed?
API consistency improvement.

### Does this PR introduce any user-facing change?
Yes, Dataset supports min/max/sum/avg (mean) on intervals.

### How was this patch tested?
Added unit tests.

Closes apache#26681 from yaooqinn/SPARK-30048.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
As we are not going to follow ANSI to implement year-month and day-time interval types, it is weird to compare the year-month part to the day-time part in our current implementation of the interval type. Additionally, the current ordering logic comes from PostgreSQL, where the implementation of intervals is messy, and we are not aiming at PostgreSQL compliance at all. This PR reverts #26681 and #26337.

### Why are the changes needed?
Make the interval type more future-proof.

### Does this PR introduce any user-facing change?
No; these features are new in 3.0.

### How was this patch tested?
Existing unit tests.

Closes #27262 from yaooqinn/SPARK-30551.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>