Iceberg: Support applying equality deletes as a join#21605
Iceberg: Support applying equality deletes as a join#21605tdcmeehan merged 3 commits intoprestodb:masterfrom
Conversation
b27553d to
2bc5098
Compare
2bc5098 to
c85266a
Compare
tdcmeehan
left a comment
There was a problem hiding this comment.
Just leaving some generic comments on the plan optimizer, please generalize the feedback where possible. I will need more time to review.
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
tdcmeehan
left a comment
There was a problem hiding this comment.
Iceberg: Support metadata columns
Can we update the documentation as well noting these new columns (iceberg.rst)?
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergColumnHandle.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMetadataColumn.java
Outdated
Show resolved
Hide resolved
tdcmeehan
left a comment
There was a problem hiding this comment.
Allow connector optimizers to create joins
I think this approach is OK. I would move out Type and EquiJoinClause to top level classes because they're shared between the ConnectorJoinNode and the original JoinNode, and the inner class relationship these classes have to ConnectorJoinNode may imply that they're somehow specific to the connector plan optimizer somehow.
tdcmeehan
left a comment
There was a problem hiding this comment.
Iceberg: Support applying equality deletes as a join
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergPageSourceProvider.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/TableType.java
Outdated
Show resolved
Hide resolved
tdcmeehan
left a comment
There was a problem hiding this comment.
Iceberg: Support applying equality deletes as a join
I think broadcast join is a global property of the query. Is it expected that users will manually specify that they enable broadcast join?
...rg/src/main/java/com/facebook/presto/iceberg/equalitydeletes/EqualityDeletesSplitSource.java
Outdated
Show resolved
Hide resolved
...rg/src/main/java/com/facebook/presto/iceberg/equalitydeletes/EqualityDeletesSplitSource.java
Outdated
Show resolved
Hide resolved
...rg/src/main/java/com/facebook/presto/iceberg/equalitydeletes/EqualityDeletesSplitSource.java
Outdated
Show resolved
Hide resolved
...rg/src/main/java/com/facebook/presto/iceberg/equalitydeletes/EqualityDeletesSplitSource.java
Show resolved
Hide resolved
...rg/src/main/java/com/facebook/presto/iceberg/equalitydeletes/EqualityDeletesSplitSource.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
From what I understand the default is I think I corrected all other comments. |
6d98edd to
1c8e8e8
Compare
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 0907cd8...2df0c25.
|
1c8e8e8 to
b3821ae
Compare
hantangwangd
left a comment
There was a problem hiding this comment.
Overall looks good to me. Some little problems to figure out.
There was a problem hiding this comment.
Why not use something like java.util.Collections.emptyIterator, it seems to lead in some dependency problems.
There was a problem hiding this comment.
bad import by mistake, thanks for catching this.
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
...iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
Outdated
Show resolved
Hide resolved
steveburnett
left a comment
There was a problem hiding this comment.
Thanks for the documentation! Two small nits only.
|
@yingsu00 added |
steveburnett
left a comment
There was a problem hiding this comment.
Thanks for the quick reply and fixes! I found one more nit when I did a new local build of the docs for you to consider.
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pulled the branch again, new local build of the docs, everything looks good. Thanks!
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
yingsu00
left a comment
There was a problem hiding this comment.
@jasonf20 Thanks for uploading the query plan. Could you please add another actual plan for multiple equality delete files with different equality ids? You can use "explain analyze" to get an actual plan.
Also, I wonder if you could expand the tests to test its interplay with "iceberg.pushdown_filter_enabled" with ORC files (Java Parquet doesn't support filter pushdown). It'll be great if you can add an actual plan for that case too.
.../src/main/java/com/facebook/presto/sql/planner/optimizations/ApplyConnectorOptimization.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/sql/planner/optimizations/ApplyConnectorOptimization.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/sql/planner/optimizations/ApplyConnectorOptimization.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/plan/ConnectorJoinNode.java
Outdated
Show resolved
Hide resolved
|
@jasonf20 Will you be able to do a small comparative TPCDS run with this equality join turned on/off and with/without equality delete files to make sure there is no regression? Without any equality delete files in the data, the before/after plans should be the same and performance should also be same. With some equality delete files, the new plans have more joins and may affect the join reordering later on. If we can show there are no regressions I think the PR is good to go. Thank you! |
75778fb to
f1f3c4f
Compare
|
Hi @yingsu00 Thanks for the comments, Adding the explain plan to the end of this comment. Regarding
Regarding TPCDS testing, there is currently no API to create equality delete files in Presto so there isn't a way to run this test with TPCDS. The rewriter explictly returns the current node when there are no equality delete files. So this code shouldn't cause any change to the explain plan for queries without Equality Delete files. For tables with Equality Delete files (which can't be created using presto) the existing implementation is almost unusable for even small amounts of deletes (see initial perfomance boost in PR message). The possibility of a regression when Equality Delete files exist is mitigated by the fact that previously they were not really queryable. Given this information, how should we proceed? Explain plan with multiple delete schemas: |
|
@jasonf20 Thanks for your information. Regarding TPCDS testing, I thought you have other engines that can generate equality deletes, but just learnt that it's actually hard for you to generate them now. Prestissimo will support equality deletes but it's not there yet. So I'll just withdraw the ask of TPCDS tests. What I was concerned was that this may affect the overall query shape, since it's adding some new join nodes in the logical planning phase, and the join reordering rule will be performed later and may introduce some uncertainty. Sometimes the plan shape changes because of some seemingly unrelated issues and the result may not be ideal. That's why I was hoping you could run the TPCDS benchmark to rule out the possibility of some unexpected plan change. But if testing TPCDS is hard, we will test it ourselves in the future. |
|
Thanks @yingsu00. I did validate the query shape is unchanged when disabled for queries on tables that I had with equality deletes. But not TPCDS tables, that isn't something I have access to. In my tests, when there are no equality deletes and/or when the flag is disabled the explain plan I got was the unchanged. Of course this is limited to the queries I tested but it does have some coverage. I think static analysis of the start of the rewriter can also help validate this. |
f1f3c4f to
6ff947e
Compare
aaneja
left a comment
There was a problem hiding this comment.
Still reviewing, will come back to this
.../src/main/java/com/facebook/presto/sql/planner/optimizations/ApplyConnectorOptimization.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/sql/planner/optimizations/ApplyConnectorOptimization.java
Outdated
Show resolved
Hide resolved
6ff947e to
d691a77
Compare
There was a problem hiding this comment.
nit: There is a variant which has a Collection as its input, so Optional.of(LogicalRowExpressions.and(node.getFilters())) will suffice
There was a problem hiding this comment.
It ends up being a little more code since I need to check if the filters list is empty.
node.getFilters().isEmpty() ? Optional.empty() : Optional.of(LogicalRowExpressions.and(node.getFilters())),
Which do you prefer?
There was a problem hiding this comment.
I went with the second option so that it will create a flat Expression instead of a nested one.
d691a77 to
b5bc884
Compare
The current join nodes are in presto-main. Moving them requires moving a lot of objets. Since we don't need to allow the connectos to optimize the joins for now we can just add an adapter node that will be converted back to the internal node type when returned by connector optimzers.
Support for $path and $data_sequence_number hidden columns. The first is generally useful and the second is a requirement for implementing equality deletes as a join.
The current equality delete implementation applies deletes at the split level. Since equality deletes often apply to a lot of files, the current implementation ends up opening the delete files #splits * #delete_files times. This commit implements equality deletes as a join. A connector optimizer is added to apply the appropriate join(s).
b5bc884 to
2df0c25
Compare
feilong-liu
left a comment
There was a problem hiding this comment.
Only reviewed the first commit and skip the next two icerberg connector specific commits, lgtm
|
Thank you @jasonf20 for this very nice contribution, and thank you @feilong-liu @hantangwangd @steveburnett @aaneja @yingsu00 for the thorough reviews. |
|
@jasonf20 I saw you have another PR for Trino: cebergPlugin: Performance improvements for Equality Delete files which doesn't need to add joins. How about the comparisons of the two approches? cc @tdcmeehan |
Description
This PR adds support for applying equality deletes as a join instead of while reading the split data.
Since equality deletes often apply to a lot of files, the current
implementation ends up opening the delete files #splits * #delete_files
times which can be very large for cdc/upsert use cases.
This commit implements equality deletes as a join. A connector optimizer
is added to apply the appropriate join(s).
Motivation and Context
Currently, the use case of frequent updates with MoR + Eventual compaction doesn't work because querying tables with more than a couple of delete files is too slow. This enables new use cases to be done using Iceberg in Presto such as CDC.
Example
This example queries a table on a partition with ~5GB of data files + ~150 delete files. Cpu time of 39 seconds vs 5:30 hours

image
For each additional delete file the current implementation will need to load #splits more files. While the new method will only load 1 more file. So the difference will keep growing with more files.
Query Plan
Impact
New parameter to configure if this is enabled.
Test Plan
Contributor checklist
Release Notes