Allow executing optimize procedure for Iceberg v2 table #12351
ebyhr merged 1 commit into trinodb:master
Conversation
What would it take to extend this to support v2 tables with deletes? cc @alexjo2144
I tried committing delete-files beforehand like this, but it still failed with a "Cannot commit, found new delete for replaced data file" error.
DeleteFiles deleteFiles = transaction.newDelete();
for (DeleteFile deletedFile : deletedFiles) {
    deleteFiles.deleteFile(deletedFile.path());
}
deleteFiles.commit();
RewriteFiles rewriteFiles = transaction.newRewrite();
...
@ebyhr how do we know which delete files we can delete?
Could you please write a similar test, or modify this one, so that the deletion is not the last operation? So that the delete files are not part of the current snapshot?
I think you want the |
This could be tricky since delete-files may have deletes for multiple data-files. Some of which may be rewritten by
We skip scanning files during optimize if they're bigger than a given size, but we could change that to scan files if they are small or have any delete files. That way we ensure that when the procedure completes, all delete files can be removed.
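The selection rule proposed above can be sketched in plain Java. This is only an illustration under stated assumptions: the class, method, and parameter names are hypothetical and are not taken from Trino or the Iceberg library, and data files and delete files are represented by path strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: all names here are hypothetical,
// not Trino's or Iceberg's actual implementation.
final class OptimizeFileSelection
{
    private OptimizeFileSelection() {}

    // Scan a data file if it is not bigger than the threshold,
    // or if at least one delete file applies to it.
    static boolean shouldScan(long fileSizeInBytes, int attachedDeleteFiles, long thresholdInBytes)
    {
        return fileSizeInBytes <= thresholdInBytes || attachedDeleteFiles > 0;
    }

    // A delete file may reference several data files. It is safe to drop
    // only when every data file it references has been rewritten.
    static Set<String> removableDeleteFiles(
            Map<String, List<String>> dataFilesByDeleteFile,
            Set<String> rewrittenDataFiles)
    {
        Set<String> removable = new HashSet<>();
        for (Map.Entry<String, List<String>> entry : dataFilesByDeleteFile.entrySet()) {
            if (rewrittenDataFiles.containsAll(entry.getValue())) {
                removable.add(entry.getKey());
            }
        }
        return removable;
    }
}
```

If every data file with an attached delete file is scanned and rewritten, then `removableDeleteFiles` returns all delete files, which is the property the comment above is aiming for. With a size-only filter, a delete file that also references a large, skipped data file would have to stay.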
yes
no, because there may be additional filters
yes, tricky. What would happen if we don't delete any delete files? Will the Iceberg library prune them?
Right now
You won't be able to commit a snapshot in this state. The validations ensure that the data-files a delete-file references exist in the current snapshot.
Okay, but that requires reading all manifest files.
I was wrong about the validations that run here. You can commit a table in this state; it just has a dangling delete-file that is never used.
Test failure is related
Keep the condition, just remove this comment.
We don't know what Iceberg v3 brings us.
- // org.apache.iceberg.Table.snapshot method returns null if there is no matching snapshot
+ // Table.snapshot method returns null if there is no matching snapshot
Description
Allow optimizing an Iceberg v2 table if it has no delete files.
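A minimal sketch of the precondition described here, written in plain Java so that it runs without the Iceberg library. It is not the code from this PR. The class and method names are hypothetical, the snapshot summary is modeled as a plain map, the `total-delete-files` key is assumed to be the counter that Iceberg writes into snapshot summaries, and treating a missing snapshot or a missing counter as "no delete files" is a simplifying assumption.

```java
import java.util.Map;

// Illustrative sketch only: names are hypothetical and this is not
// the implementation merged in this PR.
final class OptimizePrecondition
{
    // Assumed snapshot summary key for the number of delete files.
    static final String TOTAL_DELETE_FILES = "total-delete-files";
    // Highest table format version this sketch accepts.
    static final int MAX_SUPPORTED_FORMAT_VERSION = 2;

    private OptimizePrecondition() {}

    static boolean canOptimize(int formatVersion, Map<String, String> snapshotSummary)
    {
        // Keep the version guard: later format versions may change semantics.
        if (formatVersion > MAX_SUPPORTED_FORMAT_VERSION) {
            return false;
        }
        // Assumption: no snapshot means an empty table, so nothing to reject.
        if (snapshotSummary == null) {
            return true;
        }
        // Assumption: a missing counter is treated as zero delete files.
        String totalDeleteFiles = snapshotSummary.getOrDefault(TOTAL_DELETE_FILES, "0");
        return Long.parseLong(totalDeleteFiles) == 0;
    }
}
```

A stricter variant could reject tables whose summary lacks the counter instead of assuming zero; which behavior is appropriate depends on which writers produced the table.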
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) Release notes entries required with the following suggested text: