Flink: Refactor WriteResult aggregation in DynamicIcebergSink #14312
DynamicWriteResultAggregator currently produces multiple committables per (table, branch, checkpoint), which get aggregated in the downstream committer. Refactor the commit aggregator to output only one committable per triplet. Clean up DynamicCommitter to remove assumptions of multiple commit requests per table, branch, and checkpoint. This requires serializing the aggregated WriteResult using multiple temporary manifest files for each unique partition spec because the Iceberg manifest writer requires a single spec per manifest file. We can improve this later by refactoring serialization in follow-up changes.
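The "one committable per triplet" idea can be illustrated with a minimal sketch. It is not the code from this PR: `Triplet`, `WriterResult`, and `aggregate` are hypothetical stand-ins, and file paths stand in for the real WriteResult contents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TripletAggregationSketch {
  // Hypothetical stand-in for the (table, branch, checkpoint) key.
  record Triplet(String table, String branch, long checkpointId) {}

  // Hypothetical stand-in for one writer's result: only the data file paths it produced.
  record WriterResult(Triplet key, List<String> dataFiles) {}

  // Merge all writer results that share a triplet, so one entry per triplet is emitted.
  static Map<Triplet, List<String>> aggregate(List<WriterResult> results) {
    Map<Triplet, List<String>> merged = new LinkedHashMap<>();
    for (WriterResult result : results) {
      merged.computeIfAbsent(result.key(), k -> new ArrayList<>()).addAll(result.dataFiles());
    }
    return merged;
  }

  public static void main(String[] args) {
    Triplet events = new Triplet("db.events", "main", 7L);
    List<WriterResult> results = List.of(
        new WriterResult(events, List.of("a.parquet")),
        new WriterResult(events, List.of("b.parquet")),
        new WriterResult(new Triplet("db.users", "main", 7L), List.of("c.parquet")));
    aggregate(results).forEach((key, files) -> System.out.println(key + " -> " + files));
  }
}
```

With the merge done upstream, a downstream committer in this sketch would see a single entry per triplet and would not need to combine requests itself.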
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
This PR moves the WriteResult aggregation logic from DynamicCommitter to DynamicWriteResultAggregator, as described in this comment: #14182 (comment).
DynamicWriteResultAggregator currently produces multiple DynamicCommittables per (table, branch, checkpoint) triplet. This initially broke the commit recovery of the dynamic Iceberg sink (see #14090), and was later addressed by a hot fix to aggregate WriteResults in the DynamicCommitter.
Refactor the DynamicWriteResultAggregator to output only one committable per triplet. Clean up DynamicCommitter to remove assumptions of multiple commit requests per table, branch, and checkpoint. This requires serializing the aggregated WriteResult using multiple temporary manifests for each unique partition spec because the Iceberg manifest writer requires a single partition spec per file. We can improve this later by changing how we serialize DataFiles and DeleteFiles for Flink checkpoints in the DynamicSink.
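The single-spec-per-manifest constraint is why an aggregated result has to be split before serialization. Here is a rough sketch of that split, assuming each file carries the ID of its partition spec. `FileEntry` and `groupBySpec` are hypothetical names for illustration, not Iceberg or Flink APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ManifestPerSpecSketch {
  // Hypothetical stand-in for a data or delete file: a path plus its partition spec ID.
  record FileEntry(String path, int specId) {}

  // Bucket files by spec ID; each bucket could then be written to its own temporary manifest.
  static Map<Integer, List<FileEntry>> groupBySpec(List<FileEntry> files) {
    Map<Integer, List<FileEntry>> bySpec = new TreeMap<>();
    for (FileEntry file : files) {
      bySpec.computeIfAbsent(file.specId(), id -> new ArrayList<>()).add(file);
    }
    return bySpec;
  }

  public static void main(String[] args) {
    List<FileEntry> aggregated = List.of(
        new FileEntry("a.parquet", 0),
        new FileEntry("b.parquet", 1),
        new FileEntry("c.parquet", 0));
    groupBySpec(aggregated).forEach((specId, files) ->
        System.out.println("spec " + specId + ": " + files.size() + " file(s)"));
  }
}
```

In this sketch the number of buckets equals the number of distinct partition specs in the aggregated result, which is the reason a single committable may reference more than one temporary manifest.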