Skip to content

Conversation

@tbaeg
Copy link
Member

@tbaeg tbaeg commented Jul 17, 2025

Description

Improve file delete performance for expire_snapshots.

Crude benchmark in a test. 10 per deletes per request (manually set the delete batch size).

OLD:

2025-07-21T13:00:29.337-0600	INFO	ForkJoinPool-1-worker-1	stdout	5024 metadata files found
2025-07-21T13:00:29.338-0600	INFO	ForkJoinPool-1-worker-1	stdout	1002 data files found
2025-07-21T13:00:32.795-0600	INFO	ForkJoinPool-1-worker-1	stdout	Time for expire_snapshots time: 3457
2025-07-21T13:00:32.845-0600	INFO	ForkJoinPool-1-worker-1	stdout	2011 metadata files found after expire_snapshots
2025-07-21T13:00:32.845-0600	INFO	ForkJoinPool-1-worker-1	stdout	2 data files found after expire_snapshots

NEW:

2025-07-22T12:50:13.288-0600	INFO	ForkJoinPool-1-worker-1	stdout	5024 metadata files found
2025-07-22T12:50:13.289-0600	INFO	ForkJoinPool-1-worker-1	stdout	1002 data files found
2025-07-22T12:50:15.741-0600	INFO	ForkJoinPool-1-worker-1	stdout	Time for expire_snapshots time: 2452
2025-07-22T12:50:15.797-0600	INFO	ForkJoinPool-1-worker-1	stdout	2011 metadata files found after expire_snapshots
2025-07-22T12:50:15.797-0600	INFO	ForkJoinPool-1-worker-1	stdout	2 data files found after expire_snapshots

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Improve performance of expire_snapshots procedure. ({issue}`26230`)

@cla-bot cla-bot bot added the cla-signed label Jul 17, 2025
@github-actions github-actions bot added the iceberg Iceberg connector label Jul 17, 2025
@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from 9e77d23 to 9ffdd87 Compare July 17, 2025 16:42
@tbaeg tbaeg requested a review from raunaqmorarka July 18, 2025 20:17
@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch 2 times, most recently from 896fd0d to e0cd470 Compare July 19, 2025 14:16
@raunaqmorarka raunaqmorarka requested review from Copilot and sopel39 July 19, 2025 15:57
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the performance of the expire_snapshots procedure in Iceberg by introducing parallel file deletion capabilities and delegating file cleanup operations to Iceberg's native bulk delete functionality.

Key changes include:

  • Introduces a new configurable thread pool for file deletion operations
  • Replaces custom file deletion logic with Iceberg's native bulk delete operations
  • Updates thread pool sizing defaults with reasonable upper bounds

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
ForIcebergFileDelete.java Adds new binding annotation for file delete executor dependency injection
IcebergConfig.java Adds configuration for file delete thread pool size
IcebergExecutorModule.java Creates new executor service for file deletion operations
IcebergMetadata.java Replaces custom file deletion with Iceberg's native bulk delete operations
IcebergMetadataFactory.java Integrates file delete executor into metadata factory
TestIcebergConfig.java Updates tests for new configuration and thread pool defaults
BaseIcebergMinioConnectorSmokeTest.java Updates test expectations for bulk delete behavior
Various test files Adds file delete executor parameter to test constructors

@raunaqmorarka
Copy link
Member

@tbaeg are you able to share any benchmark results from trying this change ?

@tbaeg
Copy link
Member Author

tbaeg commented Jul 20, 2025

@tbaeg are you able to share any benchmark results from trying this change ?

At the moment, I do not have any benchmarks to share. For our testing environment I'd first need to back port the change. I can try to get something this week.

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for missing documentation.

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from e0cd470 to c0402c7 Compare July 21, 2025 13:01
@github-actions github-actions bot added the docs label Jul 21, 2025
@findinpath
Copy link
Contributor

findinpath commented Jul 21, 2025

@tbaeg if it is not much trouble, could you add in the description of the PR an informal benchmark with and without parallelism for expire_snapshot.
I'd be interested to see the order of acceleration in terms of time we can achieve when using parallelism.

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from c0402c7 to 4dc4d6c Compare July 21, 2025 17:43
@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from 4dc4d6c to 5bc97de Compare July 21, 2025 17:53
@tbaeg
Copy link
Member Author

tbaeg commented Jul 21, 2025

@tbaeg if it is not much trouble, could you add an informal benchmark with and without parallelism for expire_snapshot. I'd be interested to see the order of acceleration in terms of time we can achieve when using parallelism.

I was trying to get something informal together on a test cluster with more substantial data, but it looks like it might not be possible. I can write a simple test and profile it locally, but not sure how worthwhile that is.

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch 4 times, most recently from dc1c0d2 to 8845baa Compare July 22, 2025 16:44
@tbaeg
Copy link
Member Author

tbaeg commented Jul 22, 2025

Did a POC that parallelizes ForwardingFileIo.deleteFiles and introduced a new FileIoFactory so I didn't need to inject the ExecutorService every time a new ForwardingFileIo object was created. If you could take a look again, I'd appreciate it.

@raunaqmorarka @findinpath @chenjian2664 @ebyhr

Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch 3 times, most recently from d39a0a4 to 1be0c5a Compare July 23, 2025 15:02
Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall

Copy link
Contributor

@chenjian2664 chenjian2664 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good % comments

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch 3 times, most recently from 63d7f0d to 4bf7faf Compare July 26, 2025 01:42
- Add ForwardingFileIoFactory so delete specific ExecutorService is not injected in multiple places
- Remove direct usage of ForwardingFileIo in favor of a FileIoFactory implementation
- Add parallelized deletes in ForwardingFileIo
@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from 4bf7faf to 28a31e5 Compare July 26, 2025 12:17
@raunaqmorarka raunaqmorarka merged commit 9855d53 into trinodb:master Jul 28, 2025
46 checks passed
@github-actions github-actions bot added this to the 477 milestone Jul 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants