WIP Add post-PPD rewrite #5582

Merged
merged 2 commits into from
Nov 5, 2020

kasiafi
Member

@kasiafi kasiafi commented Oct 17, 2020

No description provided.

@cla-bot cla-bot bot added the cla-signed label Oct 17, 2020
@kasiafi kasiafi force-pushed the 147SemiJoin branch 3 times, most recently from 4c12a45 to 5d8c2e4 Compare October 22, 2020 11:23
@kasiafi kasiafi force-pushed the 147SemiJoin branch 2 times, most recently from cb2664e to 7e04fb6 Compare October 29, 2020 16:37
if (exploreGroup(((GroupReference) child).getGroupId(), context)) {
Context childContext;
if (i == 0) {
// pass the context of Delete to the left branch of plan
Member

Maybe a more elegant way would be to add a boolean forDelete to TableScanNode. Then we could simply use PlanNodeSearcher within a rule to search for TableScans with that flag.

In the current approach, the delete logic (context goes to the left branch) becomes part of IterativeOptimizer itself.
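
A rough sketch of the flag-based idea, to make the suggestion concrete. Everything here is a simplified stand-in written for illustration: `Node`, `Scan`, `SemiJoin`, `Delete`, `findDeleteScans` and `canRewrite` are hypothetical names, not the real plan node classes or the PlanNodeSearcher API. It only assumes that a scan carries a boolean flag and that a rule can walk its subtree looking for flagged scans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: searches a plan subtree for scans flagged as delete targets.
final class DeleteScanSearch {
    // Depth-first walk collecting every scan with forDelete == true.
    static List<Scan> findDeleteScans(Node root) {
        List<Scan> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(Node node, List<Scan> found) {
        if (node instanceof Scan && ((Scan) node).forDelete) {
            found.add((Scan) node);
        }
        for (Node source : node.sources) {
            collect(source, found);
        }
    }

    // A rule could decline to fire when its subtree contains a delete-target scan.
    static boolean canRewrite(SemiJoin semiJoin) {
        return findDeleteScans(semiJoin).isEmpty();
    }

    public static void main(String[] args) {
        SemiJoin underDelete = new SemiJoin(new Scan("orders", true), new Scan("customers", false));
        Node plan = new Delete(underDelete);
        System.out.println("delete scans found: " + findDeleteScans(plan).size());
        System.out.println("rewrite allowed: " + canRewrite(underDelete));
    }
}

// Simplified stand-ins for plan nodes (assumptions, not the project's classes).
abstract class Node {
    final List<Node> sources;

    Node(List<Node> sources) {
        this.sources = sources;
    }
}

final class Scan extends Node {
    final String table;
    final boolean forDelete; // the suggested flag

    Scan(String table, boolean forDelete) {
        super(List.of());
        this.table = table;
        this.forDelete = forDelete;
    }
}

final class SemiJoin extends Node {
    SemiJoin(Node source, Node filteringSource) {
        super(List.of(source, filteringSource));
    }
}

final class Delete extends Node {
    Delete(Node source) {
        super(List.of(source));
    }
}
```

With the flag on the scan, the decision stays local to the rule, and the optimizer driver does not need to know anything about delete.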

@@ -38,6 +40,11 @@
public class TestDereferencePushDown
extends BasePlanTest
{
/* public TestDereferencePushDown()
Member

remove?

assertPlan(query, anyTree(
assertPlan(
query,
Session.builder(getQueryRunner().getDefaultSession())
Member

Do we care that it's not rewritten? In other tests it seems to make sense, but in this one not necessarily.

@@ -1555,4 +1557,11 @@ private Session automaticJoinDistribution()
.setSystemProperty(JOIN_DISTRIBUTION_TYPE, JoinDistributionType.AUTOMATIC.name())
.build();
}

Member

add a test case for semi join rewrite here.

@kasiafi
Member Author

kasiafi commented Oct 30, 2020

Applied comments.
@sopel39, according to your suggestion, the delete context is now a property of TableScanNode instead of being passed from DeleteNode.

Member

@sopel39 sopel39 left a comment

lgtm % small comments

@kasiafi kasiafi force-pushed the 147SemiJoin branch 3 times, most recently from 64b3bc8 to c8a9b52 Compare November 4, 2020 07:32
@sopel39
Member

sopel39 commented Nov 5, 2020

Benchmarks comparison-semi_part.pdf
Benchmarks comparison-semi_unpart.pdf

Significant improvements (for both partitioned and unpartitioned data) for tpcds/q83, tpcds/q95 and tpcds/q58; no regressions.

public class TestTransformFilteringSemiJoinToInnerJoin
extends BaseRuleTest
{
@Test
Member

Could we have a test that the rule does not fire with a forDelete TableScan?
Are there tests for a DELETE query with IN (semi-join)?

Member Author

There's testDelete() in AbstractTestDistributedQueries. It covers the case of SemiJoin under DeleteNode. It runs for Raptor.

@sopel39 sopel39 merged commit 97aadd1 into trinodb:master Nov 5, 2020
@sopel39
Member

sopel39 commented Nov 5, 2020

merged, thanks!

@sopel39 sopel39 mentioned this pull request Nov 5, 2020
10 tasks
@martint martint added this to the 346 milestone Nov 5, 2020
@sjx782392329
Contributor

What has this PR improved? I can't understand it without a relevant description.
Improve performance of queries with uncorrelated IN clauses

@kasiafi
Member Author

kasiafi commented Dec 18, 2020

@sjx782392329 uncorrelated IN clauses are planned as SemiJoinNode. This PR adds an optimizer rule which captures a SemiJoinNode and transforms it into a join. A JoinNode, unlike a SemiJoinNode, can be further optimized by the cost-based optimizer (CBO) and eventually produce a better plan.
Here are benchmarks which show that this change significantly improved performance in certain cases: #5582 (comment)
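
To illustrate why such a rewrite can preserve results, here is a minimal sketch over plain lists of keys. It assumes the filtering side is de-duplicated before the inner join and it ignores NULL handling entirely; the method names are made up for this example and the actual rule may differ in its details.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: compares a filtering semi join with an inner join on distinct keys.
final class SemiJoinRewriteSketch {
    // Semi join used as a filter: keep each source row whose key occurs in the filtering side.
    static List<Integer> semiJoin(List<Integer> source, List<Integer> filtering) {
        Set<Integer> keys = new HashSet<>(filtering);
        return source.stream().filter(keys::contains).collect(Collectors.toList());
    }

    // Plain inner join: one output row per matching (source, filtering) pair.
    static List<Integer> innerJoin(List<Integer> source, List<Integer> filtering) {
        List<Integer> out = new ArrayList<>();
        for (Integer s : source) {
            for (Integer f : filtering) {
                if (s.equals(f)) {
                    out.add(s);
                }
            }
        }
        return out;
    }

    // Inner join against the de-duplicated filtering side (the assumed shape of the rewrite).
    static List<Integer> innerJoinOnDistinct(List<Integer> source, List<Integer> filtering) {
        List<Integer> distinct = filtering.stream().distinct().collect(Collectors.toList());
        return innerJoin(source, distinct);
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 2, 3);
        List<Integer> filtering = List.of(2, 2, 4);
        System.out.println("semi join:               " + semiJoin(source, filtering));
        System.out.println("inner join on distinct:  " + innerJoinOnDistinct(source, filtering));
        System.out.println("inner join (duplicates): " + innerJoin(source, filtering));
    }
}
```

The plain inner join multiplies rows when the filtering side contains duplicates, which is why the sketch de-duplicates that side first. Once the plan is an ordinary join, decisions such as join order and distribution type become available to the optimizer.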

@guiyanakuang
Member

@kasiafi, I observe that uncorrelated NOT IN clauses are also planned as SemiJoinNode. This PR does not transform such a SemiJoinNode into a JoinNode. I wonder whether transforming the uncorrelated NOT IN clause into a JoinNode (type = LEFT) would eventually provide the same benefit?

5 participants