Pushdown dereference expression in query plan#1435
Pushdown dereference expression in query plan#1435qqibrow wants to merge 2 commits intotrinodb:masterfrom
Conversation
|
|
@qqibrow it's unfortunately flaky. Retriggered. |
martint
left a comment
There was a problem hiding this comment.
Some initial comments. I'm looking at the PushDownDereferences class now.
There was a problem hiding this comment.
Why is this placed here? Ideally, it should be closer to (or part of) the IterativeOptimizer that calls all the other push/prune/merge/etc rules (around line 300 in this file).
There was a problem hiding this comment.
It should be at least below TransformUncorrelatedInPredicateSubqueryToSemiJoin, where SimiJoinNode is first generated.
...-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PushDownDereferences.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
We need to make sure it won't "undo" decisions made by InlineProjections. Otherwise, it's possible to end up with an infinite optimization loop. It may be necessary to adjust the heuristics in InlineProjections.
There was a problem hiding this comment.
Place closing ) in previous line.
There was a problem hiding this comment.
Capitalize SQL keywords (FROM, CROSS JOIN, etc)
There was a problem hiding this comment.
The intermediate SELECT * is unnecessary. Also, please format the query like this for better readability:
WITH t(msg) AS (VALUES ROW(CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE))))
SELECT b.msg.x
FROM t a, t b
WHERE a.msg.y = b.msg.ySimilar comment applies to all the queries below.
There was a problem hiding this comment.
Implement this test or remove it.
There was a problem hiding this comment.
It's good to have these tests, but it would be ideal to also add rule-specific tests. See io.prestosql.sql.planner.iterative.rule.TestXXX for some examples.
There was a problem hiding this comment.
Relying inheritance to reuse behavior makes the code harder to follow and reason about (is it just about code reuse? Is there shared state that's important? etc). It'd be better to model it via some form of delegation. E.g., The rule can take another object that performs the job of pushDownDereferences.
|
I can also help with the review of this one, once it's updated with @martint 's comments. |
|
@phd3 sure. thanks. Working on this. |
a4f0734 to
f921a6e
Compare
Co-authored-by: Zhenxiao Luo <luoz@uber.com>
f921a6e to
a58f08e
Compare
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/InlineProjections.java
Outdated
Show resolved
Hide resolved
| // When nested child and parent dereferences both exist, Pushdown rule will be trigger one more time | ||
| // and lead to runtime error. E.g. [msg.foo, msg.foo.bar] => [exp, exp.bar] (should stop here but | ||
| // since there are still dereferences, pushdown rule will trigger again) | ||
| if (dereferences.stream().anyMatch(exp -> baseExists(exp, dereferences))) { |
There was a problem hiding this comment.
if input variable expressions is [x.y.z + f(x.y) + a.b] then this code will not pushdown a.b, right?
I think there is an alternative approach to doing this.
Given the expressions, we only create required symbols. i.e. for [x.y.z + f(x.y) + a.b], we can create symbols for x.y and a.b.
We can return ImmutableMap.of() in cases where all bases of all dereference expressions are symbolreferences. In the example you provided, the optimizer will stop the second time around there since the base exp is a symbolreference.
There was a problem hiding this comment.
Yes. it will not pushdown a.b. I don't know how to stop iteration in this case so this is just a walkaround, not final solution.
We can return ImmutableMap.of() in cases where all bases of all dereference expressions are symbolreferences
How does this work? It will stop the first time for [f(x.y) + a.b] because x and a are both symbol references.
There was a problem hiding this comment.
For example, let's say the variable expressions is a single element list [x.y.z + f(x.y) + a.b] . dereferences would come out to be {x.y.z, x.y, a.b}. Now here's where we can change the implementation. We can return ImmutableMap.of() if ALL elements in this set of expressions dereferences are 1-level dereferences. i.e. the bases are symbolreferences. It doesn't happen the first time, since x.y (the base of x.y.z) is not a symbol reference. In this new implementation, we also need to make sure that we only create symbol references for "superset" references. i.e. in this example, we only create symbols for x.y and a.b.
There was a problem hiding this comment.
I think optimizing such pushdowns here will impact the effectiveness of all individual pushdown rules, since this method is being used everywhere.
| private static boolean validPushDown(DereferenceExpression dereference) | ||
| { | ||
| Expression base = dereference.getBase(); | ||
| return (base instanceof SymbolReference) || (base instanceof DereferenceExpression); |
There was a problem hiding this comment.
Can you elaborate on why in other cases the pushdown should not be considered valid?
There was a problem hiding this comment.
like CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)).x
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PushDownDereferences.java
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PushDownDereferences.java
Outdated
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| static class PushDownDereferenceThroughUnnest |
There was a problem hiding this comment.
should we also implement ExtractFromUnnest rule since this can have an optional predicate based on recent changes?
There was a problem hiding this comment.
It depends on when we run PushdownDereferences I guess. If it runs before filter is empty then that's fine.
|
I've a concern about the placement of For a query like the following, current PR produces a plan that has
This happens because If we go with the current placement of |
Yes. We need to be careful about making sure rules don't undo each other's work, as this will lead to an infinite loop once the rules are placed in the same IterativeOptimizer. In particular, we'll need to adjust PredicatePushdown to not push filters below a dereference-expression-only projection. Ideally, we should migrate Predicate pushdown to a Rule-based implementation instead of a Visitor, but that's an orthogonal issue. We should also make sure InlineProjections doesn't undo a pushdown of dereference expressions into a separate projection. |
Your optimizer
That's done in current work. Please review. I am concerned that current |
phd3
left a comment
There was a problem hiding this comment.
I think it will help to add tests that cover more complex expressions and multiple level of dereferences.
| // When nested child and parent dereferences both exist, Pushdown rule will be trigger one more time | ||
| // and lead to runtime error. E.g. [msg.foo, msg.foo.bar] => [exp, exp.bar] (should stop here but | ||
| // since there are still dereferences, pushdown rule will trigger again) | ||
| if (dereferences.stream().anyMatch(exp -> baseExists(exp, dereferences))) { |
There was a problem hiding this comment.
For example, let's say the variable expressions is a single element list [x.y.z + f(x.y) + a.b] . dereferences would come out to be {x.y.z, x.y, a.b}. Now here's where we can change the implementation. We can return ImmutableMap.of() if ALL elements in this set of expressions dereferences are 1-level dereferences. i.e. the bases are symbolreferences. It doesn't happen the first time, since x.y (the base of x.y.z) is not a symbol reference. In this new implementation, we also need to make sure that we only create symbol references for "superset" references. i.e. in this example, we only create symbols for x.y and a.b.
| leftNode, | ||
| rightNode, | ||
| joinNode.getCriteria(), | ||
| ImmutableList.<Symbol>builder().addAll(leftNode.getOutputSymbols()).addAll(rightNode.getOutputSymbols()).build(), |
There was a problem hiding this comment.
This will add symbols being referenced in filters and equijoin clauses too as output symbols. (reverse of pruning). I think we should just add joinNode.getOutputSymbols() and pushdownDereferences.values() here.
| // When nested child and parent dereferences both exist, Pushdown rule will be trigger one more time | ||
| // and lead to runtime error. E.g. [msg.foo, msg.foo.bar] => [exp, exp.bar] (should stop here but | ||
| // since there are still dereferences, pushdown rule will trigger again) | ||
| if (dereferences.stream().anyMatch(exp -> baseExists(exp, dereferences))) { |
There was a problem hiding this comment.
I think optimizing such pushdowns here will impact the effectiveness of all individual pushdown rules, since this method is being used everywhere.
|
There is an interesting case of dereference expressions buried inside lambda expressions, which I believe we're missing here. For example, the following test case: The |
|
superceded by #2672 |
it aims to do:
pushdownProjectionToTableScanquery:
[field_6:row(x bigint, y double), expr_19:bigint]but in factfield_6containsexpr_19.