Conversation
kokosing
left a comment
There was a problem hiding this comment.
Just skimmed. Left some general comments.
There was a problem hiding this comment.
why is it behind the feature flag?
There was a problem hiding this comment.
The name is odd. What is the purpose of this and why com.facebook.presto.sql.planner.SymbolsExtractor is not good enough for you case?
There was a problem hiding this comment.
good point. I replaced that using SymbolsExtractor.
There was a problem hiding this comment.
It is a bad idea. Identifier belongs to AST and symbol to IR (plan node tree) model.
There was a problem hiding this comment.
the class is removed.
There was a problem hiding this comment.
InlineDereferenceExpressions Rule is removed. class PushdownDereferenceExpression and change in InlineProjections are enough.
There was a problem hiding this comment.
I am sorry but I do not see any test for that Rule.
There was a problem hiding this comment.
is there any point of having this added to all optimizers if you cannot take advantage of this in table scan operator today?
Maybe we could just add dead code until the moment you have all the pieces needed to prune unreferenced columns.
There was a problem hiding this comment.
I see. After this part, I will start working on part II and I will not merge this until that is finished. what do you think?
There was a problem hiding this comment.
what about other expression types which may contain Symbol in it.
There was a problem hiding this comment.
Please format this as:
Set<Symbol> dereferenceSymbols = parent.getAssignments().entrySet().stream()
.filter(entry -> isBaseSymbolInChild.apply(entry.getValue()))
.map(Map.Entry::getKey)
.collect(Collectors.toSet());
Use the same formatting for Assignments.builder()....build();
There was a problem hiding this comment.
To me it is a bit odd to create Function to use it as Predicate.
There was a problem hiding this comment.
Shall I add tests now or after refactoring using Rules, if refactoring is possible?
|
@kokosing thank you for reviewing! I refactored the code a bit and working on fixing several test errors. could you take a look at my replies first? thanks! |
* prestodb#10064 - Pushdown dereference expression * prestodb#10064 - Pushdown dereference expression
|
Sorry for the delay. I was waiting for the new parquet reader fixing and refactoring #9156. Will continue this patch from now. |
666c182 to
2f89f7e
Compare
I got this error using Hive connector. However, I can't reproduce the error using WITH...VALUES... statement. |
|
@oneonestar In previous approach, All nested columns are treated as single column handle to the parquet reader. Although this way simplified the design and code structure, it has two major drawbacks:
The refined explain looks like: before: after: Must of the code are done, now I am intensively testing the new version. Hopefully I can send diff by end of next week (might miss some unit test but functionality ready) |
Add PushDownDereferenceExpression to pushdown dereferences right above TableScan Add MergeNestedColumn to convert valid dereference into ColumnHandle in TableScan Add NestedColumn into HiveColumnHandle Change in ParquetReader to use NestedColumn in file reading
b4a4037 to
970e295
Compare
|
all functions are done. |
|
@martint could you take a look at the overall code structure? thanks! |
|
Hi @qqibrow -- if you resubmit the PR to https://github.com/prestosql/presto we'd be happy to take a look and move it forward. |
|
@qqibrow How ready is this PR? Can it be rebased to resolve conflicts? Is it possible to extent this to support subscripts for arrays and maps, e.g. |
|
I assume #13180 superceds this one, hence, closing. |
Following design:
Dereference expressions will be pushed down to the projection right above tableScan, which saves CPU/Memory/Network cost.
for query:
current plan:
enable dereference pushdown:
MergeNestedColumns
MergeNestedColumns will detect "project above tableScan" pattern and try to push nestedColumn metadata into TableScan. An new Metadata API
getNestedColumnHandlesis created to support custom nested column pushdown logic.getNestedColumnHandles in Hive
getNestedColumnHandlesin Hive return every dereference as independent HiveColumnHandle. AddedOptional<NestedColumn>in HiveColumnHandle.After all query plan will looks like:
before:
after: