@aiborodin
Contributor

@aiborodin aiborodin commented Oct 13, 2025

This PR moves the WriteResult aggregation logic from DynamicCommitter to DynamicWriteResultAggregator, as described in this comment: #14182 (comment).

DynamicWriteResultAggregator currently produces multiple DynamicCommittables per (table, branch, checkpoint) triplet. This initially broke commit recovery in the dynamic Iceberg sink (see #14090) and was later addressed by a hotfix that aggregates WriteResults in the DynamicCommitter.

This PR refactors DynamicWriteResultAggregator to output only one committable per triplet and cleans up DynamicCommitter to remove the assumption of multiple commit requests per table, branch, and checkpoint. This requires serializing the aggregated WriteResult into multiple temporary manifests, grouped by unique partition spec, because the Iceberg manifest writer requires a single partition spec per file. We can improve this later by changing how we serialize DataFiles and DeleteFiles for Flink checkpoints in the DynamicSink.
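To illustrate the grouping described above, here is a minimal, self-contained sketch. It does not use the Iceberg or Flink APIs: `AggregationSketch`, `FileEntry`, `Result`, and `aggregate` are hypothetical stand-ins, not names from this PR. It only shows the idea of collapsing several write results into one entry per (table, branch, checkpoint) triplet, with the files inside that entry split by partition spec ID so that each group could be written to its own manifest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these types exist in Iceberg or Flink.
class AggregationSketch {

  /** Stand-in for a written file: only its path and partition spec ID. */
  static final class FileEntry {
    final String path;
    final int specId;

    FileEntry(String path, int specId) {
      this.path = path;
      this.specId = specId;
    }
  }

  /** Stand-in for one writer's result for a (table, branch, checkpoint) triplet. */
  static final class Result {
    final String table;
    final String branch;
    final long checkpointId;
    final List<FileEntry> files;

    Result(String table, String branch, long checkpointId, List<FileEntry> files) {
      this.table = table;
      this.branch = branch;
      this.checkpointId = checkpointId;
      this.files = files;
    }
  }

  /** Builds a simple string key for the triplet (an assumption made for brevity). */
  static String tripletKey(Result r) {
    return r.table + "/" + r.branch + "/" + r.checkpointId;
  }

  /**
   * Returns triplet -> (spec ID -> file paths). Each outer entry corresponds to
   * one aggregated committable; each inner entry corresponds to one group of
   * files that share a partition spec and could go into a single manifest.
   */
  static Map<String, Map<Integer, List<String>>> aggregate(List<Result> results) {
    Map<String, Map<Integer, List<String>>> out = new TreeMap<>();
    for (Result r : results) {
      Map<Integer, List<String>> bySpec =
          out.computeIfAbsent(tripletKey(r), k -> new TreeMap<>());
      for (FileEntry f : r.files) {
        bySpec.computeIfAbsent(f.specId, k -> new ArrayList<>()).add(f.path);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Result> results = List.of(
        new Result("db.t1", "main", 7L, List.of(new FileEntry("a.parquet", 0))),
        new Result("db.t1", "main", 7L,
            List.of(new FileEntry("b.parquet", 1), new FileEntry("c.parquet", 0))),
        new Result("db.t2", "main", 7L, List.of(new FileEntry("d.parquet", 0))));

    // Two results for db.t1 collapse into one entry with two spec groups.
    aggregate(results).forEach((triplet, bySpec) ->
        System.out.println(triplet + " -> " + bySpec));
  }
}
```

In this sketch the two results for `db.t1` end up under a single key, which mirrors the "one committable per triplet" goal, while the inner map keeps files with different spec IDs apart.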

@aiborodin aiborodin force-pushed the refactor-commit-aggregation branch from 5904b13 to 3b00257 on October 13, 2025 11:52
@aiborodin aiborodin changed the title from "Iceberg: Refactor WriteResult aggregation in DynamicIcebergSink" to "Flink: Refactor WriteResult aggregation in DynamicIcebergSink" Oct 13, 2025
@aiborodin aiborodin force-pushed the refactor-commit-aggregation branch 2 times, most recently from 9887fcb to 14f0c7b on October 20, 2025 08:02
DynamicWriteResultAggregator currently produces multiple committables
per (table, branch, checkpoint), which get aggregated in the downstream
committer. Refactor the commit aggregator to output only one committable
per triplet. Clean up DynamicCommitter to remove assumptions of multiple
commit requests per table, branch, and checkpoint.

This requires serializing the aggregated WriteResult using multiple
temporary manifest files for each unique partition spec because the
Iceberg manifest writer requires a single spec per manifest file. We
can improve this later by refactoring serialization in follow-up
changes.
@aiborodin aiborodin force-pushed the refactor-commit-aggregation branch from 14f0c7b to 479668f on October 23, 2025 07:48
@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Nov 23, 2025
@github-actions

github-actions bot commented Dec 1, 2025

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@aiborodin
Contributor Author

@pvary I re-opened this PR in #14810.
Could you please take a look?
