[SPARK-11447][SQL] change NullType to StringType during binaryComparison between NullType and StringType #9720
kevinyu98 wants to merge 3 commits into apache:master
ok to test
The goal here was to mimic Hive's type coercion rules. I think if you create a compatibility test like SELECT "0001" = 1, this rule is required (if it's not, then we could consider changing this).
Hive does support SELECT "0001" = 1. However, I think this rule is too simple. How about using findTightestCommonTypeToString?
I think so; this rule will fire first and change the type to DoubleType.
By the way, I think it's a bad smell to have conflicting rules. We should improve it and make sure it only handles the cases that are missed by ImplicitTypeCasts.
@cloud-fan: do you want me to open a new JIRA to look into this? The new JIRA/PR would focus on the rules in PromoteStrings and ImplicitTypeCasts and, as you suggested, reduce the redundant rules in PromoteStrings.
@kevinyu98 I do not think that is really a problem for now. I think we do not need a jira for that right now.
@kevinyu98 please hold off until you find something that is broken by this and that we have to fix.
@yhuai @cloud-fan: sure, I will not do that. I will try to run more tests to see if anything is broken.
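For readers following the `SELECT "0001" = 1` compatibility case discussed in this thread, here is a minimal sketch of why promoting the string side to a double changes the result. It is a plain-Scala model, not Spark or Hive code; `StringPromotionSketch` and `castToDouble` are hypothetical names used only for illustration.

```scala
object StringPromotionSketch {
  // Models CAST(s AS DOUBLE): a non-numeric string becomes SQL NULL (None here).
  def castToDouble(s: String): Option[Double] =
    scala.util.Try(s.trim.toDouble).toOption

  // Compared as strings, "0001" and "1" differ.
  def equalAsStrings(s: String, n: Int): Boolean = s == n.toString

  // Compared after promoting the string to a double, they are equal.
  def equalAsDoubles(s: String, n: Int): Boolean =
    castToDouble(s) == Some(n.toDouble)

  def main(args: Array[String]): Unit = {
    println(equalAsStrings("0001", 1)) // false
    println(equalAsDoubles("0001", 1)) // true
  }
}
```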
Test build #45967 has finished for PR 9720 at commit
Test build #45974 has finished for PR 9720 at commit
LGTM, thanks for working on this @kevinyu98
@cloud-fan and @marmbrus @yhuai @nongli @liancheng: thanks for reviewing the fix.
Merging to master and branch 1.6.
…son between NullType and StringType
While executing the PromoteStrings rule, if one side of a binaryComparison is StringType and the other side is not StringType, the current code promotes (casts) the StringType to DoubleType; if the string does not contain a number, the cast yields a null value. So when the comparison is <=> (NULL-safe equal) against Null, it does not filter anything, which causes the problem reported in this JIRA.
I propose the changes in this PR; can you review my code changes?
This problem only happens for <=>; other operators work fine.
scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> df.registerTempTable("DF")
scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]
scala> res27.show
+------+
|column|
+------+
+------+
Author: Kevin Yu <qyu@us.ibm.com>
Closes #9720 from kevinyu98/working_on_spark-11447.
(cherry picked from commit e01865a)
Signed-off-by: Yin Huai <yhuai@databricks.com>
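The behaviour described in the commit message can be sketched in plain Scala. This is a simplified model under stated assumptions, not Spark's implementation: SQL NULL is modelled as `None`, and `NullSafeEqualSketch`, `castToDouble`, `nullSafeEq`, `filterWithDoublePromotion` and `filterWithStringNull` are hypothetical names introduced only for this illustration.

```scala
object NullSafeEqualSketch {
  // Models CAST(s AS DOUBLE): a non-numeric string becomes SQL NULL (None here).
  def castToDouble(s: Option[String]): Option[Double] =
    s.flatMap(v => scala.util.Try(v.trim.toDouble).toOption)

  // Models <=>: NULL <=> NULL is true, NULL <=> non-NULL is false.
  def nullSafeEq[A](left: Option[A], right: Option[A]): Boolean = left == right

  // Old behaviour: the string side is promoted to double before it is
  // compared with the NULL literal.
  def filterWithDoublePromotion(rows: Seq[Option[String]]): Seq[Option[String]] =
    rows.filter(r => nullSafeEq(castToDouble(r), Option.empty[Double]))

  // Behaviour after the change: the NULL literal is treated as a string,
  // so the column is compared without a cast.
  def filterWithStringNull(rows: Seq[Option[String]]): Seq[Option[String]] =
    rows.filter(r => nullSafeEq(r, Option.empty[String]))

  def main(args: Array[String]): Unit = {
    val rows = Seq(Some("abc"), Some("1"), None)
    println(filterWithDoublePromotion(rows)) // List(Some(abc), None)
    println(filterWithStringNull(rows))      // List(None)
  }
}
```

In this model the non-numeric row "abc" wrongly passes the `<=> NULL` filter when the column is promoted to double, while only the genuinely null row passes once the null side is treated as a string.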