[SQL][SPARK-39528] Use V2 Filter in SupportsRuntimeFiltering #36918
huaxingao wants to merge 10 commits into apache:master
cc @cloud-fan, could you please take a look when you have a moment? Thanks!
```scala
scan match {
  case _: SupportsRuntimeFiltering =>
    DataSourceStrategy.translateRuntimeFilter(e)
  case _: SupportsRuntimeV2Filtering =>
```
Shall we give SupportsRuntimeV2Filtering higher priority than SupportsRuntimeFiltering? We also need to document the behavior when a source implements both of them.
It seems unlikely to me that a data source would implement both SupportsRuntimeV2Filtering and SupportsRuntimeFiltering, though.
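A minimal sketch of the priority question, using hypothetical marker traits named after the two interfaces (not Spark's real ones). In a Scala `match`, cases are tried top to bottom, so listing the V2 case first means a scan that mixes in both is routed through the V2 path.

```scala
// Hypothetical marker traits standing in for the real connector interfaces.
trait Scan
trait SupportsRuntimeFiltering extends Scan
trait SupportsRuntimeV2Filtering extends Scan

object RuntimeFilterDispatch {
  // Cases are checked in order, so the V2 interface wins when a scan
  // implements both; the V1 case only sees scans without the V2 mix-in.
  def chooseApi(scan: Scan): String = scan match {
    case _: SupportsRuntimeV2Filtering => "v2 Predicate"
    case _: SupportsRuntimeFiltering   => "v1 Filter"
    case _                             => "none"
  }
}
```

For example, `RuntimeFilterDispatch.chooseApi(new SupportsRuntimeFiltering with SupportsRuntimeV2Filtering {})` returns `"v2 Predicate"`. Whichever order is chosen, the reviewer's second point still applies: the behavior for a source implementing both should be documented.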
```scala
}
val literals = values.map { value =>
  val literal = Literal(value)
  LiteralValue(literal.value, literal.dataType)
```
We don't need to infer the data type by creating a Catalyst Literal. The type must be `in.child.dataType`.
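A minimal sketch of the reviewer's point, with hypothetical stand-ins for the data types and `LiteralValue` (not Spark's classes). It assumes the IN-list values are in Catalyst's internal representation, where a `DateType` value is stored as an `Int` count of days since the epoch. Inferring a type from such a value yields `IntegerType`, whereas taking the type from the IN expression's child keeps `DateType`.

```scala
// Hypothetical, simplified stand-ins for Spark's data types and LiteralValue.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object DateType extends DataType

final case class LiteralValue(value: Any, dataType: DataType)

object InListLiterals {
  // Infers a type from the value's runtime class, similar to building a
  // Catalyst Literal from a bare value: an Int is always IntegerType.
  private def inferType(value: Any): DataType = value match {
    case _: Int    => IntegerType
    case _: String => StringType
    case other     => throw new IllegalArgumentException(s"cannot infer type of $other")
  }

  def inferred(values: Seq[Any]): Seq[LiteralValue] =
    values.map(v => LiteralValue(v, inferType(v)))

  // The suggested approach: every IN-list value shares the type of the IN
  // expression's child, so pass that type through instead of inferring it.
  def fromChildType(values: Seq[Any], childType: DataType): Seq[LiteralValue] =
    values.map(v => LiteralValue(v, childType))
}
```

For example, `InListLiterals.inferred(Seq(19000))` types the epoch-day value as `IntegerType`, while `InListLiterals.fromChildType(Seq(19000), DateType)` keeps `DateType`.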
sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
```scala
if (partitioning.length == 1 && partitioning.head.references().length == 1) {
  val ref = partitioning.head.references().head
  filters.foreach {
    case p: Predicate if p.name().equals("IN") =>
```
It feels like an unapply method that extracts what you want would be preferable.
Predicate is a Java class, so I don't think unapply can be used.
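A sketch of the reviewer's unapply suggestion, using hypothetical stand-in classes rather than Spark's connector types. One detail worth noting: a Scala extractor is a standalone object with an `unapply` method, so it does not have to be defined on the matched class and can in principle target a class compiled from Java. The stand-in `Predicate` below is deliberately a plain class with no `unapply` of its own.

```scala
// Hypothetical stand-ins for the connector expression classes. Like a class
// compiled from Java, Predicate is a plain class with no unapply of its own.
class NamedRef(val fieldNames: Array[String])
class LiteralVal(val value: Any)
class Predicate(val name: String, val children: Array[AnyRef])

// An extractor is an ordinary object with an unapply method, so it can be
// written for a class it does not own.
object InPredicate {
  // Matches IN(ref, lit1, lit2, ...) and returns the column and literal values.
  def unapply(p: Predicate): Option[(NamedRef, Seq[Any])] =
    if (p.name != "IN" || p.children.isEmpty) None
    else p.children.head match {
      case ref: NamedRef =>
        val values = p.children.tail.toSeq.collect { case lit: LiteralVal => lit.value }
        // Reject the predicate if any child after the column is not a literal.
        if (values.length == p.children.length - 1) Some((ref, values)) else None
      case _ => None
    }
}
```

With an extractor like this, the loop in the snippet above could match `case InPredicate(ref, values) =>` directly instead of checking `p.name().equals("IN")` and unpacking the children by hand.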
The test failure is unrelated.
sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsRuntimeFiltering.java
sql/catalyst/src/main/scala/org/apache/spark/sql/util/PredicateUtils.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/PredicateUtils.scala
The GA failure is unrelated. Merging to master, thanks!
Thanks @cloud-fan @zinking
```scala
    with EnableAdaptiveExecutionSuite

abstract class DynamicPartitionPruningV2FilterSuite
  extends DynamicPartitionPruningDataSourceSuiteBase {
```
Shall we extend DynamicPartitionPruningV2Suite here? Then we can drop `override protected def runAnalyzeColumnCommands: Boolean = false`, and the catalog configs will be overridden.
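A sketch of the inheritance the reviewer suggests, using hypothetical, stripped-down classes named after the suites. The catalog values are placeholders, not the real test catalogs, and the public `settings` method exists only so the example can inspect the result. Extending `DynamicPartitionPruningV2Suite` inherits its `runAnalyzeColumnCommands = false`, so the V2-filter suite only has to override the catalog setting.

```scala
// Hypothetical, heavily simplified stand-ins for the suites named in the
// comment; only the override structure is meant to be illustrative.
abstract class DynamicPartitionPruningDataSourceSuiteBase {
  protected def runAnalyzeColumnCommands: Boolean = true
  protected def catalogImpl: String = "placeholder-default-catalog"
  // Exposes the effective settings so they can be inspected.
  def settings: (Boolean, String) = (runAnalyzeColumnCommands, catalogImpl)
}

abstract class DynamicPartitionPruningV2Suite extends DynamicPartitionPruningDataSourceSuiteBase {
  override protected def runAnalyzeColumnCommands: Boolean = false
  override protected def catalogImpl: String = "placeholder-v2-catalog"
}

// Inherits runAnalyzeColumnCommands = false from the V2 suite; only the
// catalog setting is overridden.
abstract class DynamicPartitionPruningV2FilterSuite extends DynamicPartitionPruningV2Suite {
  override protected def catalogImpl: String = "placeholder-v2-filter-catalog"
}
```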
Hi @huaxingao. We are trying to use Spark DataSource V2 and noticed that the built-in Spark V2 data sources (e.g. the Parquet one, looking at …) don't implement runtime filtering. Is there a plan to have them support it? It would be really beneficial for file scans to be able to do this, and given that they already benefit from some pushdowns, we were wondering why runtime filtering is not implemented. Or maybe I am missing something? In that case, it would be great to understand how Spark file sources can take advantage of DPP. Thanks!
What changes were proposed in this pull request?
Use V2 Filter in runtime filtering for V2 tables.
Why are the changes needed?
We should use V2 Filter in DS V2.
#32921 (comment)
Does this PR introduce any user-facing change?
Yes.
New interface: SupportsRuntimeV2Filtering
How was this patch tested?
new test suite
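To make the new interface concrete, here is a toy scan that prunes its planned partitions with a runtime IN predicate. All types are hypothetical stand-ins, not Spark's classes. The two methods are assumed to mirror the shape of `SupportsRuntimeV2Filtering` (`filterAttributes()` and `filter(Predicate[])`), and pruning on an IN list is just one way a source could use them.

```scala
// Hypothetical stand-ins for the connector expression and partition types.
final case class NamedRef(fieldNames: Seq[String])
final case class LiteralVal(value: Any)
final case class Predicate(name: String, children: Seq[Any])
final case class InputPartition(partValue: Any)

// Assumed to mirror the shape of SupportsRuntimeV2Filtering: report the
// filterable columns, then accept runtime predicates on them.
trait RuntimeV2Filtering {
  def filterAttributes(): Array[NamedRef]
  def filter(predicates: Array[Predicate]): Unit
}

// A toy scan over a table partitioned by a single column.
class PartitionedScan(partCol: String, initial: Seq[InputPartition]) extends RuntimeV2Filtering {
  private var partitions: Seq[InputPartition] = initial

  def plannedPartitions: Seq[InputPartition] = partitions

  override def filterAttributes(): Array[NamedRef] = Array(NamedRef(Seq(partCol)))

  // Keeps only partitions whose value appears in an all-literal IN predicate
  // on partCol. Other predicates are skipped; this assumes runtime filters are
  // purely an optimization, so skipping one only means less pruning.
  override def filter(predicates: Array[Predicate]): Unit =
    predicates.foreach {
      case Predicate("IN", NamedRef(Seq(`partCol`)) +: rest)
          if rest.forall(_.isInstanceOf[LiteralVal]) =>
        val allowed = rest.collect { case LiteralVal(v) => v }.toSet
        partitions = partitions.filter(p => allowed.contains(p.partValue))
      case _ => ()
    }
}
```

For example, a scan on `dt` with three date partitions keeps only `InputPartition("2022-06-02")` after `filter(Array(Predicate("IN", Seq(NamedRef(Seq("dt")), LiteralVal("2022-06-02")))))`. Ignoring predicates on other columns keeps the sketch conservative.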