Dereference Projection Pushdown in Query Plan #2672
(1) @martint suggested that the dereference rules should go into the iterative optimizer, where most of the rules are. Since dereference pushdown is also projection pushdown, I moved the rules to be a part of projectionPushDownRules. The root cause for the failures is: after moving dereference pushdown to projectionPushDownRules, predicate pushdown does not follow dereference pushdown. For a query like
(2) Consider the following input for Input: If P1 is using a dereference expression on a symbol that is output from N (and not synthesized within P2), then the dereference expression should get pushed down. (3) (4)
Force-pushed from 45f1aad to 30a9a1c.
For 1), it's certainly possible that plans will be affected, but adding instances of PredicatePushdown should be benign. For 2), can you clarify what the actual shape of the plan is? The way it's written would seem to indicate that it's P1(P2(N)), but then the symbol references are broken. For 3), yes, that's correct. For 4), that seems reasonable. Make sure to add it to the same IterativeOptimizer, which will take care of running them if necessary.
@martint thanks. I've fixed the mix-up in example (2). Please let me know if it's still confusing. EDIT: just saw that
Thanks. That makes sense now. Unfortunately, UnaliasSymbolReferences is not implemented as a Rule (it's tricky to do so -- my plan is to eventually get rid of it anyway). But we might want to extend PushDereferenceThroughProject to be able to map identity renaming projections. That should handle this scenario without requiring the Unalias optimizer to run in between.
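To make the idea of mapping identity renaming projections concrete, here is a minimal sketch. It is not the PR's code: symbols and expressions are modeled as plain strings, and `RenameAwareDereference` and `rewriteThroughRenames` are hypothetical names chosen for illustration. It assumes a renaming projection is an assignment whose expression is a bare symbol reference.

```java
import java.util.Map;

// Hypothetical, simplified model: a child projection is a map from each
// output symbol to the expression text that produces it.
final class RenameAwareDereference {
    private RenameAwareDereference() {}

    // A dereference such as "y.f" is modeled as base symbol "y" plus field "f".
    // If the child projection merely renames x to y (y := x), the dereference
    // can be expressed on x directly, so it can move below the projection.
    static String rewriteThroughRenames(String base, String field, Map<String, String> childAssignments) {
        String expression = childAssignments.get(base);
        // Only a bare identifier counts as a rename here (assumption of this sketch).
        if (expression != null && expression.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            return expression + "." + field;
        }
        // The base is computed by a real expression or is not produced by the
        // child projection, so the dereference is left unchanged.
        return base + "." + field;
    }
}
```

With `y := x` in the child projection, `y.f` is rewritten to `x.f`; with `z := x + 1`, `z.f` is left alone because `z` is synthesized rather than renamed.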
Significant changes:
With #1720 and #2672 applied on top of 331, I get the exception below when running a query like this: Any ideas, @phd3?
I've tried to find the simplest query where this occurs, and I found this one: If the number of partitions is smaller, the error doesn't occur.
Here's a simplified version of the original query that reproduces the issue: It's not reproducible without the IS NOT NULL check. It's also not data-related: I can reproduce it with any partition.
Force-pushed from 4df46c7 to 1fefbb8.
Add a comment explaining this is here for that reason, so that when we revisit things in the future we know why we added it
Missing call to RemoveUnsupportedDynamicFilters here is causing this query to be 4x slower:
new RemoveUnsupportedDynamicFilters(metadata),
Upon closer examination, performance is the same without the RemoveUnsupportedDynamicFilters. Sorry for the noise.
Force-pushed from 1fefbb8 to 0224dfb.
…namicFilters call
Force-pushed from d1d0e2f to 7d8e3f0.
I think the following optimizers should be invoked (in order) for complete pushdown: (1) PushProjectionThroughX (plan projection pushdown) (1) and (2) are included in the projectionPushdown optimizer. I'm confused about this: should (3) and (4) be invoked every time (1) and (2) are invoked? In the first pass of this PR, I added (3), but not (4), since the Hive pushdown wasn't implemented yet. After adding I've copied tests from
If there's a constraint in the filter node that the connector can "enforce", In the case of non-partition pushdown for Hive, this is not true. The "compactEffectivePredicate" added to the table handle may or may not be satisfied by the HivePageSource (e.g. ORC vs. Avro), so we cannot get rid of those predicates from the FilterNode. But even though predicates on top-level columns still get pushed down to ORC/Parquet, because For this reason, I added a connector table handle comparison in arePlansSame. (That failed some tests initially because BucketFilter equality was based on object reference; they pass now.) But I'm not sure whether comparing connector table handles is okay, or whether we need a better approach to tell that there was "some level" of predicate pushdown, even if it's not guaranteed to be enforced. What do you think? I can move this to a separate PR if you think that'd be better. Edit: Extracted this to #3470
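The remark about reference-based equality can be illustrated with a small sketch. This is not the Hive connector's class: `BucketFilterSketch` and its `bucketsToKeep` field are hypothetical stand-ins. It only shows why comparing table handles needs value-based `equals`/`hashCode` on every object the handle contains, since two handles built from the same pushdown would otherwise always compare as different.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a filter object stored inside a table handle.
final class BucketFilterSketch {
    private final Set<Integer> bucketsToKeep;

    BucketFilterSketch(Set<Integer> bucketsToKeep) {
        // Defensive copy so later changes to the caller's set do not affect equality.
        this.bucketsToKeep = new TreeSet<>(bucketsToKeep);
    }

    // Value-based equality: two filters keeping the same buckets are equal,
    // even when they are distinct object instances.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof BucketFilterSketch)) {
            return false;
        }
        return bucketsToKeep.equals(((BucketFilterSketch) other).bucketsToKeep);
    }

    // Kept consistent with equals, as the Object contract requires.
    @Override
    public int hashCode() {
        return bucketsToKeep.hashCode();
    }
}
```

Without these overrides, `Object.equals` falls back to reference comparison, so a plan comparison that includes the handle would report a change on every optimizer pass.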
Force-pushed from 67388b3 to 4362b40.
Force-pushed from e03c41a to 5259429.
Force-pushed from 6258d4b to 3c54f93.
We can't do this generically. The meaning of how the outputs of a node map to its inputs is node-specific, and we can't make inferences based on the fact that symbols are named the same way. It is legal for symbols to be named the same as long as they have the same type, even if they don't represent the same concept.
Also, for nodes such as Union, there's an internal remapping of inputs to outputs that changes the symbol names associated with the columns, so this would fail to identify them.
Additionally, this won't work for nodes such as Apply or SemiJoin. "Source" generally just means "an input that the operation consumes to achieve its result". In the case of Apply or SemiJoin, the right side can be thought of as a subquery that gets applied for every row on the left side. In the case of Apply, there's some internal reduction operation that's done on top of the results of the subquery (semantically speaking). All this is to say that we can't generically map the dereferences in the project to the "sources" of a node -- it has to be done case-by-case.
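As a rough illustration of the Union point, the sketch below models a union's output-to-input mapping explicitly. The class and method names are hypothetical, and symbols are plain strings rather than planner objects. It shows that a dereference on a union output has to be translated separately for each source, because each source exposes the column under a different symbol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model: for each union output symbol, the list holds the input
// symbol that feeds it from each source, indexed by source position.
final class UnionMappingSketch {
    private UnionMappingSketch() {}

    // Translates the dereference base.field into one dereference per source.
    static List<String> translate(String base, String field, Map<String, List<String>> outputToInputs) {
        List<String> inputs = outputToInputs.get(base);
        if (inputs == null) {
            throw new IllegalArgumentException("Not a union output symbol: " + base);
        }
        List<String> translated = new ArrayList<>();
        for (String input : inputs) {
            translated.add(input + "." + field);
        }
        return translated;
    }
}
```

A name-based match would look for a source symbol called `u` and find none, whereas the explicit mapping `u -> [a, b]` yields `a.f` for the first source and `b.f` for the second.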
Thanks @martint, that makes sense. This has been incorporated now by adding separate rules for every node, rather than trying to overfit a generic pattern. I think this suggestion has also helped with code readability and reasoning.
All the rules now follow this algorithm at a high level:
ProjectNode P
    Node N
        sources S
1. Extract dereference projections (and create new symbols) from E1 U E2, where E1 is the set of projection assignment expressions and E2 is the set of expressions used in the node N (e.g. the predicate in a FilterNode or the function expressions in a WindowNode). The logic for extraction is in DereferencePushdown::validDereferences.
2. Exclude those dereference expressions whose base is used as-is in the node N itself. Say the remaining dereferences are E'.
3. Push down E' by creating project nodes between N and S. Rewrite the assignments in P and the expressions in N to replace the dereference expressions with the new symbols.
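The three steps above can be sketched in a few lines. This is a simplified model, not the code in this PR: dereferences are strings of the form `base.field`, the generated symbol names (`expr_0`, `expr_1`, ...) are invented for illustration, and `planPushdown` is a hypothetical helper. It returns the mapping from each pushed-down dereference to its new symbol, which is what the rewrite of P and N would then use.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DereferencePushdownSketch {
    private DereferencePushdownSketch() {}

    static Map<String, String> planPushdown(
            List<String> projectDereferences,   // E1: dereferences in P's assignments
            List<String> nodeDereferences,      // E2: dereferences in N's own expressions
            Set<String> symbolsUsedAsIsInNode)  // base symbols that N references directly
    {
        // Step 1: take E1 U E2, deduplicated, in encounter order.
        Set<String> candidates = new LinkedHashSet<>(projectDereferences);
        candidates.addAll(nodeDereferences);

        Map<String, String> pushedDown = new LinkedHashMap<>();
        for (String dereference : candidates) {
            int dot = dereference.indexOf('.');
            if (dot <= 0) {
                throw new IllegalArgumentException("Not a dereference: " + dereference);
            }
            String base = dereference.substring(0, dot);
            // Step 2: skip dereferences whose base N uses as-is; the whole
            // column has to flow through N anyway in that case.
            if (symbolsUsedAsIsInNode.contains(base)) {
                continue;
            }
            // Step 3: allocate a new symbol; a project node between N and S
            // would compute it, and P and N would refer to it afterwards.
            pushedDown.put(dereference, "expr_" + pushedDown.size());
        }
        return pushedDown;
    }
}
```

For example, with E1 = [a.x, b.y], E2 = [a.x, c.z] and `b` used as-is in N, only `a.x` and `c.z` receive new symbols, and `b.y` stays in P.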
Force-pushed from b35bcf5 to 86e5d86.
Force-pushed from 86e5d86 to b163819.
The PushLimitThroughProject and PushDereferenceThroughLimit rules undo each other's work, so I've modified both of them not to fire when there are only dereferences. It should work the following way: Initial Plan:
Another option is to actually split the ProjectNode in PushLimitThroughProject into two project nodes, where the lower one only has dereferences. The rule would then push the limit only through the first node. But I feel the current approach is simpler and has the same overall effect. PushLimitThroughProject avoids limit pushdown only when the dereferences are "exclusive", so that the limit goes below overlapping dereferences. For example, we still push the limit in the following case. I think that is better than not pushing the limit at all, since PushDereferenceThroughLimit will push the sufficient dereferences down.
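One possible reading of the "exclusive" check is sketched below. It is an interpretation, not the rule's actual code: expressions are modeled as strings, `exclusiveDereferences` and `prefixExists` are illustrative helper names, and treating plain symbol references as harmless pass-throughs is an assumption of this sketch. A projection set counts as exclusive when every expression is a symbol or a dereference chain and no chain has a prefix that is also in the set.

```java
import java.util.Set;

final class ExclusiveDereferencesSketch {
    private static final String IDENTIFIER = "[A-Za-z_][A-Za-z0-9_]*";
    // A bare symbol (a) or a dereference chain (a.b.c).
    private static final String SYMBOL_OR_CHAIN = IDENTIFIER + "(\\." + IDENTIFIER + ")*";

    private ExclusiveDereferencesSketch() {}

    // True when a limit-through-project rule would hold back under this
    // reading: only symbols and non-overlapping dereference chains are projected.
    static boolean exclusiveDereferences(Set<String> projections) {
        for (String expression : projections) {
            if (!expression.matches(SYMBOL_OR_CHAIN)) {
                return false; // some other kind of expression is present
            }
            if (prefixExists(expression, projections)) {
                return false; // overlapping dereferences, e.g. a.x and a.x.y
            }
        }
        return true;
    }

    // Walks up the chain (a.b.c -> a.b -> a) and looks for each prefix in the set.
    static boolean prefixExists(String expression, Set<String> projections) {
        int dot = expression.lastIndexOf('.');
        while (dot >= 0) {
            String prefix = expression.substring(0, dot);
            if (projections.contains(prefix)) {
                return true;
            }
            dot = prefix.lastIndexOf('.');
        }
        return false;
    }
}
```

Under this reading, `{a.x, b.y, c}` is exclusive, so the limit stays above the projection, while `{a.x, a.x.y}` overlaps and `{a.x, b + 1}` contains a non-dereference expression, so the limit is still pushed in both of those cases.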
Force-pushed from a590279 to e37684b.
Co-authored-by: qqibrow <qqibrow@gmail.com> Co-authored-by: Zhenxiao Luo <luoz@uber.com>
Force-pushed from e37684b to ff420b3.
Force-pushed from ff420b3 to f95f644.
Oracle failure is unrelated.
Supersedes #1435