Use batch deletes in IcebergMetadata #14908
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
3017ab6 to 2c87d6b
You may want to do the multi-file deletion in batches of DELETE_BATCH_SIZE.
I'll add it, but IMO it's not really necessary. In remove_orphan_files or expire_snapshots it's a good idea because we're loading the list of files to delete incrementally, and batching there prevents us from materializing the full list in memory. Here, we already have the list in memory, so we can rely on the file system to batch in sizes appropriate for the implementation.
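For readers unfamiliar with the pattern being referred to, here is a minimal sketch of batching while the file list is still being produced incrementally. It is an illustration only, not the code from this PR: the class name, the method signature, the `Consumer` standing in for the file system's batch delete call, and the batch size of 1000 are all assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class IncrementalBatchDelete {
    // Assumed value for illustration; the real constant's value is not shown in this thread.
    static final int DELETE_BATCH_SIZE = 1000;

    // Consumes file names one at a time and hands them to deleteFiles in bounded batches,
    // so at most DELETE_BATCH_SIZE paths are held in memory at once.
    // Returns the number of batches that were submitted.
    static int deleteInBatches(Iterator<String> fileNames, String location, Consumer<List<String>> deleteFiles) {
        List<String> deleteBatch = new ArrayList<>();
        int batches = 0;
        while (fileNames.hasNext()) {
            deleteBatch.add(location + "/" + fileNames.next());
            if (deleteBatch.size() >= DELETE_BATCH_SIZE) {
                // Pass a copy so the callee never observes the buffer being cleared.
                deleteFiles.accept(List.copyOf(deleteBatch));
                deleteBatch.clear();
                batches++;
            }
        }
        // Flush whatever is left after the listing is exhausted.
        if (!deleteBatch.isEmpty()) {
            deleteFiles.accept(List.copyOf(deleteBatch));
            batches++;
        }
        return batches;
    }
}
```

When the full list is already in memory, as in this PR, the buffering above saves nothing, which is the point being made in the comment.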
I agree that batching the way you did it here is not helpful. I actually expected you to partition it at a higher level, on
fullyDeletedFiles.values().stream()
.flatMap(Collection::stream)
.map(CommitTaskData::getPath)
without collecting it into a list first. But now that I have learned that Lists.partition() is not lazy, I am not sure whether this can be done...
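As a sketch of what partitioning a stream without collecting it first could look like, the helper below pulls elements from the stream's iterator only when the next batch is requested. The class and method names are hypothetical and this is not code from the PR; it only shows that the idea is expressible with the JDK alone.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Stream;

final class LazyBatches {
    // Wraps a stream in an iterator of batches. Each call to next() pulls at most
    // batchSize elements from the source, so the full sequence is never collected.
    static <T> Iterator<List<T>> partition(Stream<T> stream, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Iterator<T> source = stream.iterator();
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public List<T> next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> batch = new ArrayList<>(batchSize);
                while (source.hasNext() && batch.size() < batchSize) {
                    batch.add(source.next());
                }
                return batch;
            }
        };
    }
}
```

Guava's Iterators.partition provides similar behavior over an Iterator, which may be an option where Guava is already on the classpath.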
What's the benefit?
If the FS needs batches, why not let it do that on its own?
I agree with 2c87d6b#r1015558983 that the data list is already in memory (in some form).
keep deleteBatch.add(location + "/" + fileName); and if (deleteBatch.size() >= DELETE_BATCH_ lines together
de82ed2 to ee7b6ed
Build is green. Will squash and merge.
ee7b6ed to 241c2cf
Description
@homar recently added a method to the FileSystem API that deletes a batch of files in one call, if the underlying filesystem supports it. Leverage that method in places where many files are deleted at once.
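To illustrate the "if the underlying filesystem supports it" part, here is a minimal sketch of how a batch delete method can fall back to per-file deletes by default while letting an implementation override it with a single bulk request. Every name below (SketchFileSystem, RecordingBulkFileSystem, the method names on them) is hypothetical and chosen for the example; it is not the actual Trino interface.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class BatchDeleteSketch {
    // Hypothetical file system abstraction, for illustration only.
    interface SketchFileSystem {
        void deleteFile(String path);

        // Default behavior: no native bulk support, so delete one file at a time.
        default void deleteFiles(Collection<String> paths) {
            for (String path : paths) {
                deleteFile(path);
            }
        }
    }

    // Stand-in for a store with a native bulk delete; it only records what it was asked to do.
    static final class RecordingBulkFileSystem implements SketchFileSystem {
        final List<String> requests = new ArrayList<>();

        @Override
        public void deleteFile(String path) {
            requests.add("single:" + path);
        }

        @Override
        public void deleteFiles(Collection<String> paths) {
            // A real implementation would issue one bulk request here, splitting it
            // into whatever request size the store accepts.
            requests.add("bulk:" + paths.size());
        }
    }
}
```

With this shape, callers always pass the whole collection to deleteFiles and leave any splitting into store-appropriate request sizes to the implementation.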
Non-technical explanation
Leverage delete APIs in a more efficient way, if available.
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: