Conversation

@turboFei
Member

@turboFei turboFei commented Nov 12, 2019

What changes were proposed in this pull request?

There is an issue with the InSubquery expression.
For example, suppose two tables ta and tb are created by the statements below.

 sql("create table ta(id Decimal(18,0)) using parquet")
 sql("create table tb(id Decimal(19,0)) using parquet")

The statement below throws a data type mismatch exception.

 sql("select * from ta where id in (select id from tb)").show()

However, this similar statement executes successfully.

 sql("select * from ta where id in ((select id from tb))").show()

The root cause is that the InSubquery expression does not find a common type for two DecimalTypes, as the In expression does.
Besides that, the InSubquery expression also does not find a common type for DecimalType and double/float/bigint.
In this PR, I fix this issue by finding the wider type for the InSubquery expression when DecimalType is involved.

Why are the changes needed?

Some InSubquery expressions throw a data type mismatch exception.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@maropu
Member

maropu commented Nov 13, 2019

ok to test

@SparkQA

SparkQA commented Nov 13, 2019

Test build #113659 has finished for PR 26485 at commit d39b092.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 13, 2019

Test build #113665 has finished for PR 26485 at commit cea1b78.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@amanomer
Contributor

@maropu Kindly review this PR #26317

@turboFei turboFei force-pushed the SPARK-29860-in-subquery branch from cea1b78 to 312621a Compare November 13, 2019 07:46
@SparkQA

SparkQA commented Nov 13, 2019

Test build #113678 has finished for PR 26485 at commit 312621a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei
Member Author

cc @wangyum

@turboFei turboFei force-pushed the SPARK-29860-in-subquery branch from 48ec228 to bb68aac Compare November 13, 2019 08:41
@SparkQA

SparkQA commented Nov 13, 2019

Test build #113689 has finished for PR 26485 at commit 48ec228.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 13, 2019

Test build #113691 has finished for PR 26485 at commit bb68aac.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 13, 2019

Test build #113699 has finished for PR 26485 at commit f9fbdf6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  Some(widerType)
} else {
  None
}
Member

This code looks suspicious... I personally think this issue should be fixed only in InConversion instead of findTightestCommonType, because a change to findTightestCommonType can affect type coercion in other operations... cc: @mgaido91 @cloud-fan

Member Author

@turboFei turboFei Nov 14, 2019

I think that for two DecimalTypes, such as Decimal(3,0) and Decimal(3,2), the tightest common type should be Decimal(5,2), which is consistent with the method name findTightestCommonType.
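The widening claimed here can be checked numerically. Below is a hypothetical helper (not Spark source code) that applies the rule under discussion: the scale and the integral digits are widened independently, so Decimal(3,0) and Decimal(3,2) combine into Decimal(5,2).

```python
# Hypothetical illustration (not Spark's implementation) of widening
# two decimal types: keep the larger scale and the larger integral range.
def tightest_common_decimal(p1, s1, p2, s2):
    scale = max(s1, s2)               # fractional digits needed
    integral = max(p1 - s1, p2 - s2)  # integral digits needed
    return (integral + scale, scale)  # (precision, scale)

# Decimal(3,0) vs Decimal(3,2): 3 integral digits + 2 fractional -> Decimal(5,2)
print(tightest_common_decimal(3, 0, 3, 2))  # (5, 2)
```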

Contributor

is this related to the specific bug? If not let's open another PR to do it.

Contributor

It's more reasonable to fix InConversion. I think it's wrong that In and InSubquery have different type coercion logic.

Member Author

@turboFei turboFei Nov 14, 2019

Thanks @cloud-fan I will make a deep check.

Member

Actually InConversion and BinaryComparison also have different type coercion logic: #22038

Member Author

@turboFei turboFei Nov 14, 2019

I found a similar implementation:

case (t1: DecimalType, t2: DecimalType) =>
  val scale = math.max(t1.scale, t2.scale)
  val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
  if (range + scale > 38) {
    // DecimalType can't support precision > 38
    DoubleType
  } else {
    DecimalType(range + scale, scale)
  }
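For reference, a Python transcription of that snippet (an illustration, not Spark's source) shows how the rule handles both the types from the repro and the overflow fallback to double:

```python
# Illustration of the quoted Scala snippet: widen two decimals, falling
# back to double when the required precision exceeds the maximum of 38.
MAX_PRECISION = 38

def find_wider_decimal(p1, s1, p2, s2):
    scale = max(s1, s2)
    integral = max(p1 - s1, p2 - s2)
    if integral + scale > MAX_PRECISION:
        # DecimalType can't support precision > 38
        return "double"
    return ("decimal", integral + scale, scale)

# Types from the repro: Decimal(18,0) vs Decimal(19,0) -> Decimal(19,0)
print(find_wider_decimal(18, 0, 19, 0))   # ('decimal', 19, 0)
# Decimal(38,0) vs Decimal(10,10) would need 48 digits -> double
print(find_wider_decimal(38, 0, 10, 10))  # 'double'
```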

Contributor

I think it's wrong that In and InSubquery have different type coercion logic.

I agree on this. Please see #19635, where I tried to fix this....

@maropu
Member

maropu commented Nov 14, 2019

I think the PR description above should be self-descriptive, so could you please make it clearer? What's the root cause of this issue, how it is fixed, and so on...

@turboFei
Member Author

I think the PR description above should be self-descriptive, so could you please make it clearer? What's the root cause of this issue, how it is fixed, and so on...

Thanks, updated.

@SparkQA

SparkQA commented Nov 14, 2019

Test build #113745 has finished for PR 26485 at commit 1a67c8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei turboFei force-pushed the SPARK-29860-in-subquery branch from 0c3109e to c5857d1 Compare November 14, 2019 12:27
@turboFei
Member Author

turboFei commented Nov 14, 2019

I have rebased this PR and modified the logic of InSubquery in InConversion.
I think PR #22038, created by @wangyum, will unify the logic of In and InSubquery.

@SparkQA

SparkQA commented Nov 14, 2019

Test build #113782 has finished for PR 26485 at commit c5857d1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 14, 2019

Test build #113788 has finished for PR 26485 at commit 9f3fdd0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 14, 2019

Test build #113780 has finished for PR 26485 at commit 0c3109e.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 14, 2019

Test build #113791 has finished for PR 26485 at commit 6120d50.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 14, 2019

Test build #113794 has finished for PR 26485 at commit 4ac52bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei turboFei force-pushed the SPARK-29860-in-subquery branch from 4ac52bd to e7d7b61 Compare November 15, 2019 12:17
@SparkQA

SparkQA commented Nov 15, 2019

Test build #113879 has finished for PR 26485 at commit e7d7b61.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei
Member Author

retest this please.

@maropu
Member

maropu commented Nov 15, 2019

retest this please

@SparkQA

SparkQA commented Nov 16, 2019

Test build #113899 has finished for PR 26485 at commit e7d7b61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@HyukjinKwon
Member

cc @liancheng as well

@SparkQA

SparkQA commented Dec 4, 2019

Test build #114820 has finished for PR 26485 at commit e7d7b61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-- !query 9 output
org.apache.spark.sql.AnalysisException
cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', t4.`t4c`) IN (listquery()))' due to data type mismatch:
cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', t4.`t4c`) IN (listquery()))' due to data type mismatch:
Member Author

In fact, I do not know why a space is involved; I have tried to remove it but failed. It should not matter.

@SparkQA

SparkQA commented Dec 4, 2019

Test build #114835 has finished for PR 26485 at commit a89b3b4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM. Can we check the behavior in other databases like pgsql? It's better to know whether Spark follows the SQL standard or not.

@turboFei
Member Author

turboFei commented Dec 4, 2019

Will check the behavior later.

@SparkQA

SparkQA commented Dec 4, 2019

Test build #114847 has finished for PR 26485 at commit 05ffba1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei turboFei force-pushed the SPARK-29860-in-subquery branch from 05ffba1 to c8f93d8 Compare December 4, 2019 13:46
@turboFei
Member Author

turboFei commented Dec 4, 2019

It seems that pgsql does not support decimal in string.

@cloud-fan
Contributor

How about the other way around (string in decimal)? Anyway, this is already the behavior of In; we should fix them together later for ANSI SQL compliance.

@turboFei
Member Author

turboFei commented Dec 4, 2019

The result is similar: ERROR: operator does not exist: text = numeric.
I agree that we can fix them later for ANSI SQL compliance.

@SparkQA

SparkQA commented Dec 4, 2019

Test build #114862 has finished for PR 26485 at commit c8f93d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in 0ab922c Dec 5, 2019
@cloud-fan
Contributor

thanks, merging to master!

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Dec 6, 2019
### What changes were proposed in this pull request?
There is an issue for InSubquery expression.
For example, there are two tables `ta` and `tb` created by the below statements.
```
 sql("create table ta(id Decimal(18,0)) using parquet")
 sql("create table tb(id Decimal(19,0)) using parquet")
```
The statement below throws a data type mismatch exception.

```
 sql("select * from ta where id in (select id from tb)").show()
```
However, this similar statement executes successfully.

```
 sql("select * from ta where id in ((select id from tb))").show()
```
The root cause is that the `InSubquery` expression does not find a common type for two DecimalTypes, as the `In` expression does.
Besides that, the `InSubquery` expression also does not find a common type for DecimalType and double/float/bigint.
In this PR, I fix this issue by finding the wider type for the `InSubquery` expression when DecimalType is involved.

### Why are the changes needed?
Some InSubquery expressions throw a data type mismatch exception.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Unit test.

Closes apache#26485 from turboFei/SPARK-29860-in-subquery.

Authored-by: turbofei <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>