Skip to content

Conversation

@GuoPhilipse
Copy link
Member

@GuoPhilipse GuoPhilipse commented Jun 29, 2020

What changes were proposed in this pull request?

fix error exception messages during exceptions on Union and set operations

Why are the changes needed?

Union and set operations can only be performed on tables with the compatible column types,while when we have more than two column, the exception messages will have wrong column index.

Steps to reproduce:

drop table if exists test1; 
drop table if exists test2; 
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01'; 
insert into test3 select 1,3,4;

Query1:

select * from test1 except select * from test2;

Result1:

Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. timestamp <> int at the second column of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] +- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- HiveTableRelation `default`.`test2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] (state=,code=0)

Query2:

select * from test1 except select * from test3;

Result2:

Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types
 int <> timestamp at the 2th column of the second table;

the above query1 has the right exception message
the above query2 have the wrong errors information, it may need to change to the following

Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types.
int <> timestamp at the  third column of the second table

Does this PR introduce any user-facing change?

NO

How was this patch tested?

unit test

@holdenk
Copy link
Contributor

holdenk commented Jun 29, 2020

Good catch. LGTM but I'll leave it for a bit of a SQL committer has any thoughts.

@HyukjinKwon
Copy link
Member

ok to test

AttributeReference("c", IntegerType)(),
AttributeReference("d", TimestampType)())

val a1 = firstTable.output(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuoPhilipse Variables a1, b1, c1, d1 not used ? Were you planning to use it in the test later ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

em,,,let me remove it

val c1 = firstTable.output(2)
val d1 = firstTable.output(3)

val a2 = secondTable.output(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

val c2 = secondTable.output(2)
val d2 = secondTable.output(3)

val a3 = thirdTable.output(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

val c3 = thirdTable.output(2)
val d3 = thirdTable.output(3)

val a4 = fourthTable.output(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dilipbiswal for your review.

case 1 => "second"
case i => s"${i}th"
case 2 => "third"
case i => s"${i + 1}th"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehehe, nice catch ;)

@maropu
Copy link
Member

maropu commented Jun 30, 2020

nit: test2 not used in the PR description?

}
}

test("SPARK-32131 Fix wrong column index when we have more than two columns" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SPARK-32131 -> SPARK-32131:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks maropu ,btw, I did not see the test build start to build, do you know why?

@GuoPhilipse
Copy link
Member Author

nit: test2 not used in the PR description?

have updated in the PR, it is a normal case ,it is used for comparing between the normal and abnormal cases.

@dilipbiswal
Copy link
Contributor

LGTM

@GuoPhilipse
Copy link
Member Author

retest this please.

@GuoPhilipse
Copy link
Member Author

It seesm i cannot trigger the test build, @dilipbiswal @maropu @HyukjinKwon @holdenk ,do you have any ideas?

@holdenk
Copy link
Contributor

holdenk commented Jun 30, 2020

Jenkins ok to test

@holdenk
Copy link
Contributor

holdenk commented Jun 30, 2020

Jenkins retest this please

@maropu
Copy link
Member

maropu commented Jun 30, 2020

ok to test

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32131][SQL] union and set operations have wrong exception infomation [SPARK-32131][SQL] Fix AnalysisException messages at UNION/SET operations Jul 1, 2020
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32131][SQL] Fix AnalysisException messages at UNION/SET operations [SPARK-32131][SQL] Fix AnalysisException messages at UNION/EXCEPT/MINUS operations Jul 1, 2020
dongjoon-hyun pushed a commit that referenced this pull request Jul 1, 2020
…US operations

### What changes were proposed in this pull request?
fix error exception messages during exceptions on Union and set operations

### Why are the changes needed?
Union and set operations can only be performed on tables with the compatible column types,while when we have more than two column, the exception messages will have wrong column index.

Steps to reproduce:

```
drop table if exists test1;
drop table if exists test2;
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01';
insert into test3 select 1,3,4;
```

Query1:
```sql
select * from test1 except select * from test2;
```
Result1:
```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. timestamp <> int at the second column of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] +- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- HiveTableRelation `default`.`test2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] (state=,code=0)
```

Query2:

```sql
select * from test1 except select * from test3;
```

Result2:

```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types
 int <> timestamp at the 2th column of the second table;
```

the above query1 has the right exception message
the above query2 have the wrong errors information, it may need to change to the following

```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types.
int <> timestamp at the  third column of the second table
```

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
unit test

Closes #28951 from GuoPhilipse/32131-correct-error-messages.

Lead-authored-by: GuoPhilipse <[email protected]>
Co-authored-by: GuoPhilipse <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 02f3b80)
Signed-off-by: Dongjoon Hyun <[email protected]>
@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124704 has finished for PR 28951 at commit 984c652.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Jul 1, 2020
…US operations

fix error exception messages during exceptions on Union and set operations

Union and set operations can only be performed on tables with the compatible column types,while when we have more than two column, the exception messages will have wrong column index.

Steps to reproduce:

```
drop table if exists test1;
drop table if exists test2;
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01';
insert into test3 select 1,3,4;
```

Query1:
```sql
select * from test1 except select * from test2;
```
Result1:
```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. timestamp <> int at the second column of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] +- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- HiveTableRelation `default`.`test2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] (state=,code=0)
```

Query2:

```sql
select * from test1 except select * from test3;
```

Result2:

```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types
 int <> timestamp at the 2th column of the second table;
```

the above query1 has the right exception message
the above query2 have the wrong errors information, it may need to change to the following

```
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types.
int <> timestamp at the  third column of the second table
```

NO

unit test

Closes #28951 from GuoPhilipse/32131-correct-error-messages.

Lead-authored-by: GuoPhilipse <[email protected]>
Co-authored-by: GuoPhilipse <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 02f3b80)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

Thank you all! This lands at master/3.0/2.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants