[SPARK-21055][SQL] replace grouping__id with grouping_id() #18270

cenyuhai · 2017-06-11T12:09:13Z

What changes were proposed in this pull request?

Spark does not support grouping__id; it provides grouping_id() instead.
This makes it inconvenient for Hive users to migrate to Spark SQL,
so this PR replaces grouping__id with grouping_id().
Hive users then do not need to alter their scripts.

How was this patch tested?

Tested with SQLQuerySuite.scala.

SparkQA · 2017-06-11T16:20:08Z

Test build #77893 has finished for PR 18270 at commit 6fd567c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-06-12T05:23:23Z

Test build #77913 has started for PR 18270 at commit f532d9f.

cenyuhai · 2017-06-12T12:50:00Z

Why did it fail?

dongjoon-hyun · 2017-06-20T18:10:55Z

Retest this please.

SparkQA · 2017-06-20T18:12:38Z

Test build #78305 has started for PR 18270 at commit f532d9f.

shaneknapp · 2017-06-20T19:33:30Z

I will retrigger this once Jenkins restarts.

shaneknapp · 2017-06-20T19:49:26Z

test this please

dongjoon-hyun · 2017-06-20T19:56:30Z

Thank you, @shaneknapp .

SparkQA · 2017-06-20T21:33:04Z

Test build #78312 has finished for PR 18270 at commit f532d9f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-06-25T17:18:56Z

Test build #78582 has finished for PR 18270 at commit 3b361d7.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2017-07-31T18:15:42Z

Retest this please.

dongjoon-hyun · 2017-07-31T18:16:04Z

Ping, @cenyuhai .

SparkQA · 2017-07-31T20:02:37Z

Test build #80086 has finished for PR 18270 at commit 3b361d7.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

jinxing64 · 2017-08-20T13:39:18Z

Jenkins, retest this please.

SparkQA · 2017-08-20T15:24:54Z

Test build #80905 has finished for PR 18270 at commit 3b361d7.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

jinxing64 · 2017-08-21T04:00:26Z

@cenyuhai
Are you still working on this? Could you please fix the test?

YannByron · 2017-08-21T08:08:10Z

I found that the UT failures happen because the expected query result has a fixed row order even though the SQL statement doesn't include ORDER BY, for example the output of query 16 in group-analytics.sql.out.
That order is not the same as the result you get when you run this SQL statement through spark-sql.
Just modifying the output order will work.
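
To illustrate the ordering point: without an ORDER BY, the order of result rows is not guaranteed, so a golden file that pins one particular order can fail even when the rows themselves are correct. A minimal plain-Python sketch (not Spark code; the rows and the helper name same_rows are made up for illustration):

```python
from collections import Counter


def same_rows(actual, expected):
    """Compare two result sets as multisets, ignoring row order."""
    return Counter(actual) == Counter(expected)


# Hypothetical rows: the same result produced in two different orders.
golden = [("Java", 2012, 0), ("Java", 2013, 0), ("Java", None, 1)]
actual = [("Java", None, 1), ("Java", 2012, 0), ("Java", 2013, 0)]

print(golden == actual)           # order-sensitive comparison
print(same_rows(actual, golden))  # order-insensitive comparison
```

The other way to make the expected output deterministic is to add an explicit ORDER BY to the test query, which appears to be what the later commits titled "Add order by" in this PR do.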

gatorsmile · 2017-08-22T22:30:16Z

@cenyuhai Could you update this PR? I will review it then.

gatorsmile · 2017-08-22T22:40:14Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Could we do all the changes you made in this file in the rule ResolveFunctions?

cenyuhai · 2017-08-30

I don't think I can do that, because ResolveFunctions runs after ResolveGroupingAnalytics.

cenyuhai · 2017-08-30T13:55:36Z

OK, I will update it.

SparkQA · 2017-08-30T14:43:17Z

Test build #81257 has finished for PR 18270 at commit 1423875.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-30T14:58:55Z

Test build #81258 has finished for PR 18270 at commit e49742b.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

jinxing64 · 2017-08-30T15:08:12Z

@gatorsmile
Could you please explain why the value of grouping_id() generated in Spark is different from grouping__id in Hive? Is it by design? A lot of our users use grouping__id in if(...) clauses. The incompatibility between Spark and Hive is making our migration very difficult.

cenyuhai · 2017-08-30T15:27:55Z

retest this please

gatorsmile · 2017-08-30T15:51:50Z

@jinxing64 #10677 made the changes. Hive generates a wrong result. See the JIRA opened by Davies: https://issues.apache.org/jira/browse/HIVE-12833
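
For reference, Spark's documentation defines grouping_id() over n grouping columns as (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn), where grouping(c) is 1 when column c is aggregated away in that row and 0 otherwise. A minimal plain-Python sketch of that formula follows. The function and the column names are illustrative, this is not Spark code, and it makes no claim about Hive's exact behavior:

```python
def grouping_id(group_by_cols, grouping_set):
    """Bit vector over group_by_cols, first column in the highest bit.

    A bit is 1 when the column is aggregated away, that is, when it is
    NOT part of the grouping set that produced the row.
    """
    gid = 0
    for col in group_by_cols:
        bit = 0 if col in grouping_set else 1
        gid = (gid << 1) | bit
    return gid


# Illustrative column names for a query like GROUP BY CUBE(course, year).
cols = ["course", "year"]
for grouping_set in [{"course", "year"}, {"course"}, {"year"}, set()]:
    print(sorted(grouping_set), grouping_id(cols, grouping_set))
```

Under this definition a row grouped by every column gets id 0, so scripts that branch on specific grouping__id values inside if(...) may need their constants reviewed when they move between engines.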

jinxing64 · 2017-08-30T15:59:24Z

Thank you so much !

cenyuhai · 2017-09-02T12:46:45Z

@gatorsmile I already tried to resolve grouping__id in ResolveFunctions, but ResolveFunctions runs after ResolveGroupingAnalytics, and grouping__id may be changed in ResolveGroupingAnalytics.

SparkQA · 2017-09-02T14:37:38Z

Test build #81342 has finished for PR 18270 at commit 059d486.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cenyuhai · 2017-09-02T15:38:46Z

@jinxing64 I think you could revert the changes in Spark and use the same grouping__id logic as Hive, keeping the result consistent with Hive's wrong output.

SparkQA · 2017-09-02T17:25:41Z

Test build #81345 has finished for PR 18270 at commit e4d6d48.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

jinxing64 · 2017-09-03T01:28:24Z

Thanks for the notification. Actually, we implemented the same logic as Hive, even though it has a bug ...

viirya · 2017-09-03T04:45:21Z

sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out

 -- !query 16 output
-org.apache.spark.sql.AnalysisException
-grouping__id is deprecated; use grouping_id() instead;
+Java    2012    0


Are you manually editing this group-analytics.sql.out? The test failure is due to a mismatch between spaces and tabs. Please generate the output file with the instructions in SQLQueryTestSuite and don't edit it manually.

viirya · 2017-09-03T05:25:38Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

    try {
      expr transformUp {
        case GetColumnByOrdinal(ordinal, _) => plan.output(ordinal)
+        case u @ UnresolvedAttribute(nameParts)


This change looks suspicious. Doesn't ResolveMissingReferences resolve grouping_id used in order by?

gatorsmile · 2017-09-03T05:37:10Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+                  VirtualColumn.hiveGroupingIdName)()
+              }
+            }
          case u @ UnresolvedAttribute(nameParts) =>


just need to add if !resolver(u.name, VirtualColumn.hiveGroupingIdName) here

gatorsmile · 2017-09-03T05:37:21Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+                VirtualColumn.hiveGroupingIdName)()
+            }
+          }
        case u @ UnresolvedAttribute(nameParts) =>


and here too.
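
The guard suggested in the two comments above can be sketched in plain Python. This is an illustration only, not the Analyzer code: it assumes the resolver is a case-insensitive name comparison and that VirtualColumn.hiveGroupingIdName holds the string grouping__id. The helper names are hypothetical.

```python
# Assumed value of VirtualColumn.hiveGroupingIdName.
HIVE_GROUPING_ID_NAME = "grouping__id"


def case_insensitive_resolver(a: str, b: str) -> bool:
    """Stand-in for the analyzer's resolver: compares names ignoring case."""
    return a.lower() == b.lower()


def resolve_as_plain_attribute(name: str, resolver=case_insensitive_resolver) -> bool:
    """True when the name should go through ordinary column resolution.

    Mirrors the shape of the suggested guard:
    if !resolver(u.name, VirtualColumn.hiveGroupingIdName)
    """
    return not resolver(name, HIVE_GROUPING_ID_NAME)


print(resolve_as_plain_attribute("year"))          # ordinary column
print(resolve_as_plain_attribute("GROUPING__ID"))  # left to the grouping-id handling
```

Putting the check in a guard keeps the existing UnresolvedAttribute case intact and only excludes the one virtual column name from it.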

SparkQA · 2017-10-08T10:50:20Z

Test build #82540 has finished for PR 18270 at commit 1202bfa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-08T17:12:46Z

Test build #82543 has finished for PR 18270 at commit eac37f0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cenyuhai · 2017-10-09T01:53:42Z

@gatorsmile

gatorsmile · 2017-10-09T04:30:08Z

@cenyuhai Could you also address this comment: https://github.com/apache/spark/pull/18270/files#r136121931?

gatorsmile · 2017-10-20T06:59:14Z

retest this please

SparkQA · 2017-10-20T09:50:30Z

Test build #82925 has finished for PR 18270 at commit eac37f0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-10-20T16:27:02Z

LGTM

Let us resolve the issue in a follow-up PR.

gatorsmile · 2017-10-20T16:28:06Z

Thanks! Merged to master.

What changes were proposed in this pull request?

Simplifies the test cases that were added in the PR #18270.

How was this patch tested?

N/A

Author: gatorsmile <[email protected]>

Closes #19546 from gatorsmile/backportSPARK-21055.

gatorsmile reviewed Aug 22, 2017

cenyuhai closed this Aug 30, 2017

cenyuhai force-pushed the SPARK-21055 branch from e49742b to 823f1ee Compare August 30, 2017 15:21

replace grouping__id with grouping_id()

36ff72a

cenyuhai reopened this Aug 30, 2017

Add order by

059d486

Add order by

e4d6d48

viirya reviewed Sep 3, 2017

gatorsmile reviewed Sep 3, 2017

generate group-analytics result

1202bfa

change code as xiaoli said

eac37f0

asfgit closed this in 16c9cc6 Oct 20, 2017

gatorsmile mentioned this pull request Oct 21, 2017

[SPARK-21055][SQL][FOLLOW-UP] replace grouping__id with grouping_id() #19546

Closed

jinxing64 mentioned this pull request Oct 26, 2017

[SPARK-22350][SQL] select grouping__id from subquery #19573

Closed
