Skip to content

Improve performance of Iceberg snapshot expiration#13399

Merged
electrum merged 1 commit intotrinodb:masterfrom
electrum:iceberg-snapshot
Jul 29, 2022
Merged

Improve performance of Iceberg snapshot expiration#13399
electrum merged 1 commit intotrinodb:masterfrom
electrum:iceberg-snapshot

Conversation

@electrum
Copy link
Member

@electrum electrum commented Jul 28, 2022

Description

Improve performance of the expire_snapshots operation when there are many snapshots that reference the same manifest files. Previously, the manifest files were read once for each snapshot.

We now use the Iceberg library code to compute the files to delete. The previous code may have correctness implications for cherry-picked commits or other scenarios.

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Iceberg connector
* Improve performance of the `expire_snapshots` command for tables with many snapshots. ({issue}`13399`)

@cla-bot cla-bot bot added the cla-signed label Jul 28, 2022
Copy link
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@erichwang
Copy link
Contributor

@electrum remove orphan files has the same pattern

long expireTimestampMillis = session.getStart().toEpochMilli() - retention.toMillis();
expireSnapshots(table, expireTimestampMillis, session, executeHandle.getSchemaTableName());

table.expireSnapshots()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually I did it the same way originally but then I got this comment from @rdblue
#10810 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current implementation is based on iceberg's spark extention expire_snapshots

@homar
Copy link
Member

homar commented Jul 29, 2022

@electrum remove orphan files has the same pattern

it is also based on the remove_orphan_files from iceberg's spark extension

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi findepi dismissed their stale review July 29, 2022 09:19

(comment still applicable, but no need to request changes)

@electrum
Copy link
Member Author

@rdblue is using the Iceberg library in this manner ok? The implementation seems to cover cherry-picked commits, which the original code didn’t handle.

@electrum electrum merged commit bedbcf6 into trinodb:master Jul 29, 2022
@electrum electrum deleted the iceberg-snapshot branch July 29, 2022 20:02
@github-actions github-actions bot added this to the 392 milestone Jul 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants