Push down dereference expression#14637
Conversation
vkorukanti
left a comment
There was a problem hiding this comment.
Thanks @zhenxiao. I left some comments (some of them are questions I got while trying to understand the patch).
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/PushDownDereferences.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Why not just keep one dereference in this case msg.foo?
There was a problem hiding this comment.
Sorry I didn't understand this part. Why do we need an extra project on top? The project below the TargetNode should be enough right?
Is it for the following case?
FilterNode(predicate(a.x = 5), output=a)
-->
Project(row(a_x) as a)
FilterNode(predicate(a_x=5)
ProjectNode(deref(a.x) as a_x, output=a_x)
If this is the can we push the dereference through Filter and avoid the extra project on top?
There was a problem hiding this comment.
a projectNode on top is to keep all upper symbols unchanged
There was a problem hiding this comment.
For this case:
Filter(predicate(a.x) = 5, output = a.y)
Project(a)
-->
Filter(predicate(a_x) = 5, output = a_y)
Project(a, deref(a.x) as a_x, deref(a.y) as a_y)
Is the new project correct, given that we are adding the existing project symbols and the deref expressions as new projects?
presto-main/src/test/java/com/facebook/presto/sql/planner/TestDereferencePushDown.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Shouldn't we have a Project(msg.x as a_x, msg.y as a_y) in between the filter and values nodes?
mbasmanova
left a comment
There was a problem hiding this comment.
@zhenxiao I took a first pass and made some comments. I don't understand this change completely yet. Thanks for adding tests for the new optimizer rule. I think end-to-end correctness tests are needed as well. Also, would you update commit message to document the motivation for this change?
...-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PushDownDereferences.java
Outdated
Show resolved
Hide resolved
...-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PushDownDereferences.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I'm not sure I understand this TODO and the logic in the following if statement. Looks like [msg.foo, msg.foo.bar] can is not being processed at all. Is this intentional? Shouldn't this case result in pushing down msg.foo?
There was a problem hiding this comment.
bad. this is wrong comment
not a TODO, comment should be:
DereferenceExpression with the same base will cause unnecessary rewritten
I will fix
...-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PushDownDereferences.java
Outdated
Show resolved
Hide resolved
...-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PushDownDereferences.java
Outdated
Show resolved
Hide resolved
...-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PushDownDereferences.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I ran this test with coverage and I see some code paths in PushDownDereferences rule not covered. Would you add tests to ensure full coverage of the new rule?
presto-main/src/test/java/com/facebook/presto/sql/planner/TestDereferencePushDown.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/com/facebook/presto/sql/planner/assertions/RowExpressionVerifier.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/com/facebook/presto/sql/planner/assertions/RowExpressionVerifier.java
Outdated
Show resolved
Hide resolved
Reading all subfields in one column could not leverage lazy block efficiently. e.g. SELECT msg.a, msg.b, msg.c FROM table WHERE msg.a = 10 Currently, we are reading the entire msg.[a,b,c] as one Block. This doesn’t utilize the lazy block loading during filter evaluation for msg.a = 10. Ideally, we should read msg.a, msg.b and msg.c as separate blocks, so that only load msg.a first and then decide whether to load msg.b and msg.c. This is the first step to pushdown dereference expressions. Following steps are: add connector specific optimizer to generate separate columns for subfields optimize file scan for columnar formats, e.g. Parquet Co-authored-by: qqibrow qqibrow@gmail.com
zhenxiao
left a comment
There was a problem hiding this comment.
thank you @mbasmanova @vkorukanti
comments are addressed
could you please take another look?
There was a problem hiding this comment.
bad. this is wrong comment
not a TODO, comment should be:
DereferenceExpression with the same base will cause unnecessary rewritten
I will fix
|
to be continued in: #14829 |

Co-authored-by: qqibrow qqibrow@gmail.com
To fix #14517, we need to pushdown dereference expressions as a first step.
continue #5547 #13180