Implement metadata_delete_after_commit_enabled and metadata_previous_versions_max for Iceberg table.
#20863
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Can you also review, potentially, @amogh-jahagirdar?
We need this feature badly. We can't really use Trino with Iceberg for anything serious if we don't have a way to remove old metadata files. In our test case, the data file is around 100KB, but the metadata files in total are over 200MB. We tried expire_snapshots and remove_orphan_files (retention_threshold set to 1d). No luck, the old metadata files refuse to go.
We also need this feature. Thanks for the implementation. :)
Can you rebase, @oneonestar? And can @alexjo2144 @cwsteinbach @ebyhr and others help with review?
Description
Implement metadata_delete_after_commit_enabled and metadata_previous_versions_max for Iceberg tables.
Without this feature, old metadata files (XXX.metadata.json files) remain on the file system indefinitely after each commit.
This corresponds to the write.metadata.delete-after-commit.enabled and write.metadata.previous-versions-max properties (see: https://iceberg.apache.org/docs/latest/maintenance/#remove-old-metadata-files).
The following has been implemented:
This PR depends on #20410, which implemented the $metadata_log_entries system table for Iceberg tables. This system table provides visibility into metadata files. Some of the test code uses $metadata_log_entries.
The first 2 commits are from #20410. Only the last commit is for this PR.
Fix #19582
Fix #14128
Supersede #20011
Implementation of metadata_previous_versions_max
metadata_previous_versions_max is implemented by Iceberg's TableMetadata, which Trino uses for commits. Trino only needs to set the Iceberg table's write.metadata.previous-versions-max property for this to work.
Below is the call stack of an INSERT into an Iceberg table.
Implementation of metadata_delete_after_commit_enabled
Trino implements TableOperations itself rather than relying on Iceberg's implementation, so the required logic has been added to *TableOperations.
Reference implementation in Iceberg:
https://github.com/apache/iceberg/blob/6a3b2d7c153412b01c746debb018c544516f2bbd/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L412-L440
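To illustrate how the two properties interact, here is a minimal, self-contained sketch. It is not the code in this PR or in Iceberg: the class MetadataLogSketch, its method names, and the use of plain strings for metadata file locations are all hypothetical, and the clamp to a minimum of one retained entry is an assumption of the sketch. The idea is that the metadata log is trimmed to at most metadata_previous_versions_max entries on commit, and, when metadata_delete_after_commit_enabled is set, files that fell out of the log are the ones to delete.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not Trino or Iceberg code.
final class MetadataLogSketch {
    private MetadataLogSketch() {}

    // Appends the metadata file being replaced to the log of previous versions,
    // then keeps at most maxPreviousVersions entries, dropping the oldest first.
    // Assumption of this sketch: at least one entry is always retained.
    static List<String> appendAndTrim(List<String> previousLog, String replacedFile, int maxPreviousVersions) {
        int max = Math.max(1, maxPreviousVersions);
        List<String> log = new ArrayList<>(previousLog);
        log.add(replacedFile);
        int excess = log.size() - max;
        if (excess > 0) {
            log = new ArrayList<>(log.subList(excess, log.size()));
        }
        return log;
    }

    // Returns the files tracked by the base log but no longer tracked after the
    // commit. Nothing is returned when delete-after-commit is disabled.
    static Set<String> filesToDelete(List<String> baseLog, List<String> committedLog, boolean deleteAfterCommitEnabled) {
        Set<String> removed = new LinkedHashSet<>();
        if (!deleteAfterCommitEnabled) {
            return removed;
        }
        removed.addAll(baseLog);
        removed.removeAll(committedLog);
        return removed;
    }
}
```

For example, with a base log of v1.metadata.json and v2.metadata.json (placeholder names) and a maximum of 2, committing a change that replaces v3.metadata.json leaves v2 and v3 in the log, and v1 is the only file reported for deletion. With the flag disabled, the log is still trimmed but no file is reported, which matches the behaviour described above where old metadata files stay on the file system.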
Release notes
(x) Release notes are required, with the following suggested text: