[SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions#14425
ericl wants to merge 2 commits into apache:master
def getPlan(df: DataFrame): SparkPlan = {
  df.queryExecution.executedPlan
}
assert(getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 2"))))
did you verify this would fail without your patch?

LGTM (assuming the test case would fail without the fix)

Yep, both fail prior to the fix. (In reply to Reynold Xin, Sat, Jul 30, 2016, 3:32 PM.)

Test build #63047 has finished for PR 14425 at commit

Merging in master/2.0.

@ericl there is a conflict with branch-2.0. Can you create a pull request for branch-2.0?

…sets of partitions

This fixes a bug where the file scan operator does not take into account partition pruning in its implementation of `sameResult()`. As a result, executions may be incorrect on self-joins over the same base file relation.

The patch here is minimal, but we should reconsider relying on `metadata` for implementing sameResult() in the future, as string representations may not be uniquely identifying.

cc rxin

Unit tests.

Author: Eric Liang <[email protected]>

Closes apache#14425 from ericl/spark-16818.

Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala

Done, see #14427

…sets of partitions

#14425 rebased for branch-2.0

Author: Eric Liang <[email protected]>

Closes #14427 from ericl/spark-16818-br-2.

What changes were proposed in this pull request?
This fixes a bug where the file scan operator does not take into account partition pruning in its implementation of `sameResult()`. As a result, executions may be incorrect on self-joins over the same base file relation.

The patch here is minimal, but we should reconsider relying on `metadata` for implementing `sameResult()` in the future, as string representations may not be uniquely identifying.

cc @rxin
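
To make the failure mode concrete, here is a minimal, Spark-free sketch of the idea. It is a toy model, not the actual patch: `FileScan`, `keyWithoutPartitions`, `keyWithPartitions`, the map keys, and the `/data/t` path are all hypothetical names chosen for illustration. It only shows why a metadata-based comparison that omits partition filters treats two differently pruned scans as interchangeable.

```scala
// Toy model only: these names are hypothetical and do not mirror Spark's classes.
object SameResultSketch {
  // A scan over a partitioned file relation, pruned by partition filters.
  final case class FileScan(
      location: String,
      format: String,
      partitionFilters: Seq[String])

  // Comparison key that ignores partition pruning (the buggy behaviour described above).
  def keyWithoutPartitions(s: FileScan): Map[String, String] =
    Map("Format" -> s.format, "InputPaths" -> s.location)

  // Comparison key that also records the partition filters as a string.
  def keyWithPartitions(s: FileScan): Map[String, String] =
    keyWithoutPartitions(s) +
      ("PartitionFilters" -> s.partitionFilters.mkString("[", ", ", "]"))

  // Two scans "produce the same result" when their comparison keys are equal.
  def sameResult(
      a: FileScan,
      b: FileScan,
      key: FileScan => Map[String, String]): Boolean =
    key(a) == key(b)

  // Two sides of a self-join over the same base relation, pruned differently.
  val scanId2: FileScan = FileScan("/data/t", "parquet", Seq("id = 2"))
  val scanId3: FileScan = FileScan("/data/t", "parquet", Seq("id = 3"))

  def main(args: Array[String]): Unit = {
    // Without partition filters in the key, the scans look identical,
    // so a reuse rule could wrongly substitute one for the other.
    println(sameResult(scanId2, scanId3, keyWithoutPartitions)) // true
    // With partition filters in the key, they are told apart.
    println(sameResult(scanId2, scanId3, keyWithPartitions)) // false
    // Identical pruning still compares equal, so legitimate reuse is kept.
    println(sameResult(scanId2, scanId2.copy(), keyWithPartitions)) // true
  }
}
```

The string-based key in this sketch also hints at the caveat raised above: in this toy encoding, `Seq("a, b")` and `Seq("a", "b")` both render as `[a, b]`, so a string representation is not guaranteed to identify a scan uniquely.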
How was this patch tested?
Unit tests.