Skip to content

Conversation

@henryr
Copy link
Contributor

@henryr henryr commented Nov 3, 2017

What changes were proposed in this pull request?

It's not safe in all cases to push down a LIMIT below a FULL OUTER
JOIN. If the limit is pushed to one side of the FOJ, the physical
join operator can not tell if a row in the non-limited side would have a
match in the other side.

If the join operator guarantees that unmatched tuples from the limited
side are emitted before any unmatched tuples from the other side,
pushing down the limit is safe. But this is impractical for some join
implementations, e.g. SortMergeJoin.

For now, disable limit pushdown through a FULL OUTER JOIN, and we can
evaluate whether a more complicated solution is necessary in the future.

How was this patch tested?

Ran org.apache.spark.sql.* tests. Altered full outer join tests in
LimitPushdownSuite.

## What changes were proposed in this pull request?

It's not safe in all cases to push down a LIMIT below a FULL OUTER
JOIN. If the limit is pushed to one side of the FOJ, the physical
join operator can not tell if a row in the non-limited side would have a
match in the other side.

*If* the join operator guarantees that unmatched tuples from the limited
side are emitted before any unmatched tuples from the other side,
pushing down the limit is safe. But this is impractical for some join
implementations, e.g. SortMergeJoin.

For now, disable limit pushdown through a FULL OUTER JOIN, and we can
evaluate whether a more complicated solution is necessary in the future.

## How was this patch tested?

Ran org.apache.spark.sql.* tests. Altered full outer join tests in
LimitPushdownSuite.
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
case FullOuter =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on it! We should still keep it.

Let me fix it based on my original PR: #10454

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decide to disable it and merge it to master and the previous release

@gatorsmile
Copy link
Member

retest this please

1 similar comment
@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 3, 2017

Test build #83417 has started for PR 19647 at commit 0e0123b.

@henryr
Copy link
Contributor Author

henryr commented Nov 4, 2017

retest this please

1 similar comment
@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 4, 2017

Test build #83432 has finished for PR 19647 at commit 0e0123b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

@gatorsmile
Copy link
Member

ok to test

@gatorsmile
Copy link
Member

test this please

@SparkQA
Copy link

SparkQA commented Nov 5, 2017

Test build #83454 has finished for PR 19647 at commit 0e0123b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 6c66266 Nov 5, 2017
asfgit pushed a commit that referenced this pull request Nov 5, 2017
It's not safe in all cases to push down a LIMIT below a FULL OUTER
JOIN. If the limit is pushed to one side of the FOJ, the physical
join operator can not tell if a row in the non-limited side would have a
match in the other side.

*If* the join operator guarantees that unmatched tuples from the limited
side are emitted before any unmatched tuples from the other side,
pushing down the limit is safe. But this is impractical for some join
implementations, e.g. SortMergeJoin.

For now, disable limit pushdown through a FULL OUTER JOIN, and we can
evaluate whether a more complicated solution is necessary in the future.

Ran org.apache.spark.sql.* tests. Altered full outer join tests in
LimitPushdownSuite.

Author: Henry Robinson <henry@cloudera.com>

Closes #19647 from henryr/spark-22211.

(cherry picked from commit 6c66266)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
@gatorsmile
Copy link
Member

Thanks! Merged to master/2.2

comparePlans(optimized, correctAnswer)
}

test("full outer join where neither side is limited and both sides have same statistics") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @gatorsmile and @henryr .
This breaks branch-2.2.

[info] - full outer join where neither side is limited and both sides have same statistics *** FAILED *** (43 milliseconds)
[info] - full outer join where neither side is limited and left side has larger statistics *** FAILED *** (12 milliseconds)
[info] - full outer join where neither side is limited and right side has larger statistics *** FAILED *** (9 milliseconds)

cc @felixcheung for 2.2.1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you submit a fix to resolve the test failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm taking a look now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, all!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19701. I think something got dropped in the backport to branch-2.2, so thankfully it's an easy fix.

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
It's not safe in all cases to push down a LIMIT below a FULL OUTER
JOIN. If the limit is pushed to one side of the FOJ, the physical
join operator can not tell if a row in the non-limited side would have a
match in the other side.

*If* the join operator guarantees that unmatched tuples from the limited
side are emitted before any unmatched tuples from the other side,
pushing down the limit is safe. But this is impractical for some join
implementations, e.g. SortMergeJoin.

For now, disable limit pushdown through a FULL OUTER JOIN, and we can
evaluate whether a more complicated solution is necessary in the future.

Ran org.apache.spark.sql.* tests. Altered full outer join tests in
LimitPushdownSuite.

Author: Henry Robinson <henry@cloudera.com>

Closes apache#19647 from henryr/spark-22211.

(cherry picked from commit 6c66266)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants