Improve performance of Iceberg snapshot expiration #13399
electrum merged 1 commit into trinodb:master
@electrum remove orphan files has the same pattern
long expireTimestampMillis = session.getStart().toEpochMilli() - retention.toMillis();
expireSnapshots(table, expireTimestampMillis, session, executeHandle.getSchemaTableName());
table.expireSnapshots()
So actually I did it the same way originally, but then I got this comment from @rdblue:
#10810 (comment)
The current implementation is based on expire_snapshots from Iceberg's Spark extension.
it is also based on the
findepi left a comment
Per https://github.com/trinodb/trino/pull/13399/files#r933007638 this is not the way to do this.
cc @alexjo2144
(comment still applicable, but no need to request changes)
@rdblue is using the Iceberg library in this manner ok? The implementation seems to cover cherry-picked commits, which the original code didn’t handle.
Description
Improve performance of the expire_snapshots operation when there are many snapshots that reference the same manifest files. Previously, the manifest files were read once for each snapshot.
We now use the Iceberg library code to compute the files to delete. The previous code may have correctness implications for cherry-picked commits or other scenarios.
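To illustrate why shared manifests matter here, the following is a rough, self-contained sketch and not the code from this PR. It models a hypothetical append-only table in which each new snapshot keeps all earlier manifests and adds one, then counts manifest reads when every snapshot re-reads its manifests versus when each distinct manifest is read once. The class name, method names, and manifest file names are all made up for illustration; none of them are Trino or Iceberg APIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only: these are not Trino or Iceberg classes.
class ManifestReadSketch {
    // Counts manifest reads when every snapshot re-reads all of its manifests.
    static int readsPerSnapshot(Map<Long, List<String>> manifestsBySnapshot) {
        int reads = 0;
        for (List<String> manifests : manifestsBySnapshot.values()) {
            reads += manifests.size();
        }
        return reads;
    }

    // Counts manifest reads when each distinct manifest is read only once,
    // no matter how many snapshots reference it.
    static int readsDeduplicated(Map<Long, List<String>> manifestsBySnapshot) {
        Set<String> seen = new HashSet<>();
        int reads = 0;
        for (List<String> manifests : manifestsBySnapshot.values()) {
            for (String manifest : manifests) {
                if (seen.add(manifest)) {
                    reads++;
                }
            }
        }
        return reads;
    }

    // Hypothetical append-only table: snapshot N references manifests 1..N.
    static Map<Long, List<String>> appendOnlyTable(int snapshotCount) {
        Map<Long, List<String>> result = new LinkedHashMap<>();
        List<String> manifests = new ArrayList<>();
        for (long id = 1; id <= snapshotCount; id++) {
            manifests.add("manifest-" + id + ".avro");
            result.put(id, new ArrayList<>(manifests));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> table = appendOnlyTable(100);
        System.out.println("per-snapshot reads: " + readsPerSnapshot(table));
        System.out.println("deduplicated reads: " + readsDeduplicated(table));
    }
}
```

Under this assumed table shape, the per-snapshot count grows quadratically with the number of snapshots while the deduplicated count grows linearly. Real tables rewrite and drop manifests, so the actual ratio will differ.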
Documentation
(x) No documentation is needed.
Release notes
(x) Release notes entries required with the following suggested text: