-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column #30380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131109 has finished for PR 30380 at commit
|
| test("getPartitionsByFilter: chunk in ('ab', 'ba') and ((cast(ds as string)>'20170102')") { | ||
| val day = (20170101 to 20170103, 0 to 4, Seq("ab", "ba")) | ||
| testMetastorePartitionFiltering( | ||
| attr("chunk").in("ab", "ba") && (attr("ds").cast(StringType) > "20170102"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for 20170102.1234?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same because we didn't pruning it:
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala
index 7e10d49..6b976d9 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala
@@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog._
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.types.{BooleanType, IntegerType, LongType, StructType}
+import org.apache.spark.sql.types.{BooleanType, IntegerType, LongType, StringType, StructType}
import org.apache.spark.util.Utils
class HivePartitionFilteringSuite(version: String)
@@ -272,6 +272,15 @@ class HivePartitionFilteringSuite(version: String)
day1 :: day2 :: Nil)
}
+
+ test("getPartitionsByFilter: chunk in ('ab', 'ba') and " +
+ "((cast(ds as string)='20170101') or (cast(ds as string)='20170102'))") {
+ val day = (20170101 to 20170103, 0 to 4, Seq("ab", "ba"))
+ testMetastorePartitionFiltering(attr("chunk").in("ab", "ba") &&
+ (attr("ds").cast(StringType) > "20170102.1234"),
+ day :: Nil)
+ }
+|
cc @cloud-fan |
|
BTW, is this a subset of the existing PR from @bersprockets ? cc @sunchao |
| def unapply(expr: Expression): Option[Attribute] = { | ||
| expr match { | ||
| case attr: Attribute => Some(attr) | ||
| case Cast(IntegralType(), StringType, _) => None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! I'm thinking if we should be more conservative here. How about
case Cast(child @ IntegralType(), dt: IntegralType, _) => if Cast.canUpCast...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
# Conflicts: # sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131164 has finished for PR 30380 at commit
|
cloud-fan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. How far do we need to backport?
|
I think we need to backport to branch-2.4. |
…g.String when pruning partition column
### What changes were proposed in this pull request?
This pr fix filter for int column and value class java.lang.String when pruning partition column.
How to reproduce this issue:
```scala
spark.sql("CREATE table test (name STRING) partitioned by (id int) STORED AS PARQUET")
spark.sql("CREATE VIEW test_view as select cast(id as string) as id, name from test")
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
```
```
20/11/15 06:19:01 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_partitions_by_filter : db=default tbl=test
20/11/15 06:19:01 INFO MetaStoreDirectSql: Unable to push down SQL filter: Cannot push down filter for int column and value class java.lang.String
20/11/15 06:19:01 ERROR SparkSQLDriver: Failed in [SELECT * FROM test_view WHERE id = '0']
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:745)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:743)
```
### Why are the changes needed?
Fix bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #30380 from wangyum/SPARK-27421.
Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
(cherry picked from commit 014e1fb)
Signed-off-by: Yuming Wang <[email protected]>
|
@wangyum, can you resolve the JIRA, and comment here which branchs you merged into? |
|
Merged to master and branch-3.0. |
…a.lang.String when pruning partition column This pr backport #30380 to branch-2.4. ### What changes were proposed in this pull request? This pr fix filter for int column and value class java.lang.String when pruning partition column. How to reproduce this issue: ```scala spark.sql("CREATE table test (name STRING) partitioned by (id int) STORED AS PARQUET") spark.sql("CREATE VIEW test_view as select cast(id as string) as id, name from test") spark.sql("SELECT * FROM test_view WHERE id = '0'").explain ``` ``` 20/11/15 06:19:01 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_partitions_by_filter : db=default tbl=test 20/11/15 06:19:01 INFO MetaStoreDirectSql: Unable to push down SQL filter: Cannot push down filter for int column and value class java.lang.String 20/11/15 06:19:01 ERROR SparkSQLDriver: Failed in [SELECT * FROM test_view WHERE id = '0'] java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:745) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276) at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:743) ``` ### Why are the changes needed? Fix bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test. Closes #30422 from wangyum/SPARK-27421-2.4. Authored-by: Yuming Wang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
…hen pruning partition column refer to apache#30380
What changes were proposed in this pull request?
This pr fix filter for int column and value class java.lang.String when pruning partition column.
How to reproduce this issue:
Why are the changes needed?
Fix bug.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.