[SPARK-32131][SQL] Fix AnalysisException messages at UNION/EXCEPT/MINUS operations #28951

GuoPhilipse · 2020-06-29T16:52:58Z

What changes were proposed in this pull request?

Fix the error messages of the AnalysisException raised by UNION and set operations so that they report the correct column index.

Why are the changes needed?

UNION and set operations can only be performed on tables with compatible column types. However, when the tables have more than two columns and the type mismatch is in the third column or later, the exception message reports the wrong column index.
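To make the indexing concrete, here is a minimal Scala sketch of how a set operation's type check can locate the first mismatching column. It is a simplified model, not Spark's code: column types are plain strings, "compatible" is modelled as equality (Spark's analyzer instead checks whether a wider common type exists), and the names `SetOpTypeCheck` and `firstMismatch` are hypothetical. Both returned indexes are zero-based, so the third column reaches the message formatter as index 2.

```scala
// Hypothetical, simplified model of the column type check run for
// UNION/EXCEPT/INTERSECT. Every child after the first is compared
// position by position against the first child. Indexes are zero-based.
object SetOpTypeCheck {
  // Assumption: compatibility is plain equality of type names.
  def compatible(a: String, b: String): Boolean = a == b

  /** Returns (tableIndex, columnIndex, childType, refType) of the first mismatch. */
  def firstMismatch(tables: Seq[Seq[String]]): Option[(Int, Int, String, String)] = {
    val ref = tables.head
    val mismatches = for {
      (child, ti) <- tables.tail.zipWithIndex
      ((dt1, dt2), ci) <- child.zip(ref).zipWithIndex
      if !compatible(dt1, dt2)
    } yield (ti + 1, ci, dt1, dt2) // ti + 1: table index counted over all tables
    mismatches.headOption
  }

  def main(args: Array[String]): Unit = {
    val test1 = Seq("int", "int", "timestamp")
    val test3 = Seq("int", "int", "int")
    // Column index 2 is the third column: the case rendered as "2th" before the fix.
    println(firstMismatch(Seq(test1, test3)))
  }
}
```

For test1 vs test3 this yields `Some((1, 2, "int", "timestamp"))`, i.e. the second table and the third column.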

Steps to reproduce:

drop table if exists test1; 
drop table if exists test2; 
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01'; 
insert into test3 select 1,3,4;

Query1:

select * from test1 except select * from test2;

Result1:

Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. timestamp <> int at the second column of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] +- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- HiveTableRelation `default`.`test2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] (state=,code=0)

Query2:

select * from test1 except select * from test3;

Result2:

Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types
 int <> timestamp at the 2th column of the second table;

Query1 produces the correct exception message.
Query2 reports the wrong column index ("2th" instead of "third"); the message should probably read as follows:

Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types.
int <> timestamp at the third column of the second table
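A minimal sketch of how the corrected message could be assembled from zero-based indexes. The wording copies the error text quoted above; the names `SetOpMessage`, `ordinal` and `message` are hypothetical, and `ordinal` mirrors the fixed match cases from this PR's CheckAnalysis diff.

```scala
// Hypothetical sketch of building the corrected AnalysisException text.
object SetOpMessage {
  // Mirrors the fixed match cases: zero-based index -> English ordinal.
  def ordinal(i: Int): String = i match {
    case 0 => "first"
    case 1 => "second"
    case 2 => "third"
    case n => s"${n + 1}th"
  }

  // ci: zero-based column index; ti: zero-based table index over all tables
  // (the second table is 1).
  def message(op: String, childType: String, refType: String, ci: Int, ti: Int): String =
    s"$op can only be performed on tables with the compatible column types. " +
      s"$childType <> $refType at the ${ordinal(ci)} column of the ${ordinal(ti)} table"

  def main(args: Array[String]): Unit = {
    println(message("Except", "timestamp", "int", ci = 1, ti = 1)) // Query1
    println(message("Except", "int", "timestamp", ci = 2, ti = 1)) // Query2
  }
}
```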

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Added a unit test to AnalysisSuite.


holdenk · 2020-06-29T23:12:29Z

Good catch. LGTM, but I'll leave it open for a bit in case a SQL committer has any thoughts.

HyukjinKwon · 2020-06-30T03:52:23Z

ok to test

dilipbiswal · 2020-06-30T05:54:20Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

+      AttributeReference("c", IntegerType)(),
+      AttributeReference("d", TimestampType)())
+
+    val a1 = firstTable.output(0)


@GuoPhilipse Variables a1, b1, c1, d1 are not used? Were you planning to use them in the test later?

Hmm, let me remove them.

dilipbiswal · 2020-06-30T05:54:29Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

+    val c1 = firstTable.output(2)
+    val d1 = firstTable.output(3)
+
+    val a2 = secondTable.output(0)


dilipbiswal · 2020-06-30T05:54:38Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

+    val c2 = secondTable.output(2)
+    val d2 = secondTable.output(3)
+
+    val a3 = thirdTable.output(0)


dilipbiswal · 2020-06-30T05:54:46Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

+    val c3 = thirdTable.output(2)
+    val d3 = thirdTable.output(3)
+
+    val a4 = fourthTable.output(0)


Thanks @dilipbiswal for your review.

maropu · 2020-06-30T07:40:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala

              case 1 => "second"
-              case i => s"${i}th"
+              case 2 => "third"
+              case i => s"${i + 1}th"


hehehe, nice catch ;)
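For comparison, a side-by-side sketch of the ordinal helper before and after the change. Only the match cases come from the diff above; the wrapper object and the method names `before` and `after` are made up for illustration.

```scala
// Sketch of the ordinal helper before and after this PR.
// The argument is a zero-based column (or table) index.
object OrdinalFix {
  // Before: the fallback printed the zero-based index itself,
  // so index 2 (the third column) became "2th".
  def before(i: Int): String = i match {
    case 0 => "first"
    case 1 => "second"
    case n => s"${n}th"
  }

  // After: "third" is spelled out and the fallback converts to one-based.
  def after(i: Int): String = i match {
    case 0 => "first"
    case 1 => "second"
    case 2 => "third"
    case n => s"${n + 1}th"
  }

  def main(args: Array[String]): Unit =
    (0 to 4).foreach(i => println(s"$i: ${before(i)} -> ${after(i)}"))
}
```

The only behavioral difference is in the fallback. Like the patched code, this sketch still produces forms such as "21th" for very wide schemas; the fix targets the off-by-one, not English suffixes.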

maropu · 2020-06-30T07:40:47Z

nit: test2 not used in the PR description?

maropu · 2020-06-30T07:41:07Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

    }
  }
+
+  test("SPARK-32131 Fix wrong column index when we have more than two columns" +


nit: SPARK-32131 -> SPARK-32131:

Thanks @maropu. By the way, I did not see the test build start; do you know why?

GuoPhilipse · 2020-06-30T08:15:18Z

nit: test2 not used in the PR description?

I have updated the PR description. test2 is the normal case; it is there to compare the normal and abnormal cases.

dilipbiswal · 2020-06-30T08:41:55Z

LGTM

GuoPhilipse · 2020-06-30T09:58:05Z

retest this please.

GuoPhilipse · 2020-06-30T12:27:57Z

It seems I cannot trigger the test build. @dilipbiswal @maropu @HyukjinKwon @holdenk, do you have any ideas?

holdenk · 2020-06-30T16:01:33Z

Jenkins ok to test

holdenk · 2020-06-30T16:01:45Z

Jenkins retest this please

maropu · 2020-06-30T23:58:26Z

ok to test

dongjoon-hyun

+1, LGTM.

…US operations

Closes #28951 from GuoPhilipse/32131-correct-error-messages.

Lead-authored-by: GuoPhilipse <[email protected]>
Co-authored-by: GuoPhilipse <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 02f3b80)
Signed-off-by: Dongjoon Hyun <[email protected]>

SparkQA · 2020-07-01T06:38:14Z

Test build #124704 has finished for PR 28951 at commit 984c652.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.


dongjoon-hyun · 2020-07-01T06:41:14Z

Thank you all! This lands at master/3.0/2.4.

GuoPhilipse and others added 19 commits May 21, 2020 07:36


fix error messages

2865e45

probot-autolabeler bot added the SQL label Jun 29, 2020

dilipbiswal reviewed Jun 30, 2020


remove useless variable

1aa614d

maropu reviewed Jun 30, 2020


maropu approved these changes Jun 30, 2020


improve test style

984c652

dongjoon-hyun changed the title ~~[SPARK-32131][SQL] union and set operations have wrong exception infomation~~ [SPARK-32131][SQL] Fix AnalysisException messages at UNION/SET operations Jul 1, 2020

dongjoon-hyun approved these changes Jul 1, 2020


dongjoon-hyun changed the title ~~[SPARK-32131][SQL] Fix AnalysisException messages at UNION/SET operations~~ [SPARK-32131][SQL] Fix AnalysisException messages at UNION/EXCEPT/MINUS operations Jul 1, 2020

dongjoon-hyun closed this in 02f3b80 Jul 1, 2020


7 participants
