[SPARK-32131][SQL] Fix AnalysisException messages at UNION/EXCEPT/MINUS operations #28951
update masterbranch
|
Good catch. LGTM, but I'll leave it open for a bit in case a SQL committer has any thoughts. |
|
ok to test |
| AttributeReference("c", IntegerType)(), | ||
| AttributeReference("d", TimestampType)()) | ||
|
|
||
| val a1 = firstTable.output(0) |
@GuoPhilipse Variables a1, b1, c1, d1 are not used? Were you planning to use them in the test later?
Hmm, let me remove them.
| val c1 = firstTable.output(2) | ||
| val d1 = firstTable.output(3) | ||
|
|
||
| val a2 = secondTable.output(0) |
ditto
| val c2 = secondTable.output(2) | ||
| val d2 = secondTable.output(3) | ||
|
|
||
| val a3 = thirdTable.output(0) |
ditto
| val c3 = thirdTable.output(2) | ||
| val d3 = thirdTable.output(3) | ||
|
|
||
| val a4 = fourthTable.output(0) |
ditto
Thanks @dilipbiswal for your review.
| case 1 => "second" | ||
| case i => s"${i}th" | ||
| case 2 => "third" | ||
| case i => s"${i + 1}th" |
hehehe, nice catch ;)
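For readers following the diff fragment above, here is a minimal sketch of the ordinal helper before and after the change. The object and method names (`OrdinalSketch`, `ordinalBefore`, `ordinalAfter`) are hypothetical, and the `case 0 => "first"` branch is an assumption, since it is not visible in the excerpt. It only illustrates why a 0-based index of 2 used to render as "2th".

```scala
object OrdinalSketch {
  // Pre-fix shape suggested by the removed diff lines: any 0-based index
  // above 1 fell through to the generic case and was printed unshifted.
  def ordinalBefore(i: Int): String = i match {
    case 0 => "first" // assumption: this branch is not shown in the excerpt
    case 1 => "second"
    case i => s"${i}th"
  }

  // Post-fix shape suggested by the added diff lines: index 2 gets its own
  // word, and the generic case converts the 0-based index to 1-based.
  def ordinalAfter(i: Int): String = i match {
    case 0 => "first" // assumption: this branch is not shown in the excerpt
    case 1 => "second"
    case 2 => "third"
    case i => s"${i + 1}th"
  }
}
```

With this sketch, `OrdinalSketch.ordinalBefore(2)` yields "2th", matching the wrong message in the report, while `OrdinalSketch.ordinalAfter(2)` yields "third".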
|
nit: |
| } | ||
| } | ||
|
|
||
| test("SPARK-32131 Fix wrong column index when we have more than two columns" + |
nit: SPARK-32131 -> SPARK-32131:
Thanks, maropu. By the way, I did not see the test build start; do you know why?
I have updated the PR. It is a normal case, used for comparing the normal and abnormal cases. |
|
LGTM |
|
retest this please. |
|
It seems I cannot trigger the test build. @dilipbiswal @maropu @HyukjinKwon @holdenk, do you have any ideas? |
|
Jenkins ok to test |
|
Jenkins retest this please |
|
ok to test |
dongjoon-hyun
left a comment
+1, LGTM.
…US operations

### What changes were proposed in this pull request?
Fix the exception messages raised for Union and set operations.

### Why are the changes needed?
Union and set operations can only be performed on tables with compatible column types. When the tables have more than two columns, the exception message reports the wrong column index.

Steps to reproduce:
```
drop table if exists test1;
drop table if exists test2;
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01';
insert into test3 select 1,3,4;
```

Query1:
```sql
select * from test1 except select * from test2;
```

Result1:
```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. timestamp <> int at the second column of the second table;;
'Except false
:- Project [id#620, age#621, name#622]
:  +- SubqueryAlias `default`.`test1`
:     +- HiveTableRelation `default`.`test1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622]
+- Project [id#623, age#624, name#625]
   +- SubqueryAlias `default`.`test2`
      +- HiveTableRelation `default`.`test2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] (state=,code=0)
```

Query2:
```sql
select * from test1 except select * from test3;
```

Result2:
```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types int <> timestamp at the 2th column of the second table;
```

Query1 above produces the correct exception message. Query2 above produces the wrong message; it should be changed to the following:
```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. int <> timestamp at the third column of the second table
```

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
unit test

Closes #28951 from GuoPhilipse/32131-correct-error-messages.

Lead-authored-by: GuoPhilipse <[email protected]>
Co-authored-by: GuoPhilipse <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 02f3b80)
Signed-off-by: Dongjoon Hyun <[email protected]>
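The reproduction above can also be modelled without a Spark session. The following is a simplified sketch, not Spark's analyzer code: `SetOpCheckSketch` and `firstMismatch` are hypothetical names, column types are plain strings, and the sketch assumes at least one table and that every table has the same number of columns. It compares each later table against the first one, column by column, and builds a message in the wording quoted in the commit message.

```scala
object SetOpCheckSketch {
  // Convert a 0-based index into an ordinal word (post-fix behaviour).
  private def ordinal(i: Int): String = i match {
    case 0 => "first"
    case 1 => "second"
    case 2 => "third"
    case n => s"${n + 1}th"
  }

  // Each table is modelled as its list of column type names (assumption:
  // non-empty input, equal column counts). Returns the message for the
  // first column whose type differs from the first table, if any.
  def firstMismatch(operator: String, tables: Seq[Seq[String]]): Option[String] = {
    val reference = tables.head
    val mismatches = for {
      (table, tableIdx) <- tables.zipWithIndex.drop(1)
      ((childType, refType), colIdx) <- table.zip(reference).zipWithIndex
      if childType != refType
    } yield s"$operator can only be performed on tables with the compatible column types. " +
      s"$childType <> $refType at the ${ordinal(colIdx)} column of the ${ordinal(tableIdx)} table"
    mismatches.headOption
  }
}
```

Modelling `test1` as `Seq("int", "int", "timestamp")` and `test3` as `Seq("int", "int", "int")`, a call such as `SetOpCheckSketch.firstMismatch("Except", Seq(test1, test3))` returns a message ending in "int <> timestamp at the third column of the second table", which is the corrected wording the commit message asks for.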
|
Test build #124704 has finished for PR 28951 at commit
|
|
Thank you all! This lands at |
What changes were proposed in this pull request?
Fix the exception messages raised for Union and set operations.
Why are the changes needed?
Union and set operations can only be performed on tables with compatible column types. When the tables have more than two columns, the exception message reports the wrong column index.
Steps to reproduce:
Query1:
Result1:
Query2:
Result2:
Query1 above produces the correct exception message.
Query2 above reports the wrong column index ("the 2th column of the second table"); it should report "the third column of the second table".
Does this PR introduce any user-facing change?
NO
How was this patch tested?
unit test