Implement Snapshot validation API for commits #14514
Change-Id: Ie7e2bc0920bd855188c5bcc2435ace947db9af21
@rdblue What do you think?
I just thought of another issue with this approach. This use of
Wouldn't both the history validation and other validations want to check the same set of parent snapshots?
I think that the starting snapshot ID is often based on what an operation read. This validation is likely going to be based on the current version when the transaction started. Since those are different, I would not reuse starting snapshot ID.
But would it be reasonable for an operation that has read a different version (not the current one) to run the custom validation from that read version? Why would it want to run the custom validation from the current version instead of the read version if it based all its update assumptions on that read version?
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
This change implements the ability to add custom `Snapshot` validations to the existing `SnapshotUpdate` API.
It focuses on reusing existing validation APIs in `SnapshotProducer` and removing code duplication, while providing enough flexibility for clients, such as Kafka Connect and Flink.
It is an alternative solution to #14509.
Why?
Custom `Snapshot` validation is necessary for non-idempotent table update operations, which rely on the existing state for correctness and exactly-once delivery. Applications like Flink and Kafka Connect use snapshot properties to store their idempotence keys, which identify the base state during recovery. Because Iceberg allows concurrent commits, these applications need to inspect the new base snapshots to detect idempotence violations. This change addresses this problem and allows clients to implement custom idempotence validations.
How?
This change achieves the following:
- Add a `validateWith(Consumer<Snapshot> snapshotValidator)` method to `SnapshotUpdate` to allow custom `Snapshot` validations.
- Move `validateFromSnapshot(long snapshotId)` from the child interfaces (`OverwriteFiles`, `ReplacePartitions`, `RowDelta`, etc.) to the parent `SnapshotUpdate` and remove the duplicate definitions.
- […] `Snapshot` for validation, which is used by the Kafka Connect PR (Validate concurrent commits in DynamicIcebergSink to prevent commit duplication #14517) and other validations.
- […] `SnapshotProducer::validate(TableMetadata currentMetadata, Snapshot snapshot)`, allowing child classes to inherit the validation functionality.
This approach maximises code reuse and aligns with the existing validation functionality.
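To make the idea of a custom validator more concrete, here is a minimal, self-contained sketch of how a client might reject a duplicate commit based on a snapshot property. It does not use the Iceberg library: `SnapshotInfo`, `PendingCommit`, `commitOnto`, `rejectIfAlreadyCommitted`, and the `sketch.commit-id` property name are all hypothetical stand-ins. The real method takes a `Consumer<Snapshot>`, and the assumption that validators run once against the new base snapshot is ours, not something this PR confirms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class SnapshotValidationSketch {

  // Hypothetical stand-in for Iceberg's Snapshot: only an ID and summary properties.
  static final class SnapshotInfo {
    final long snapshotId;
    final Map<String, String> summary;

    SnapshotInfo(long snapshotId, Map<String, String> summary) {
      this.snapshotId = snapshotId;
      this.summary = summary;
    }
  }

  // Hypothetical stand-in for an update that accepts custom validators.
  static final class PendingCommit {
    private final List<Consumer<SnapshotInfo>> validators = new ArrayList<>();

    PendingCommit validateWith(Consumer<SnapshotInfo> snapshotValidator) {
      validators.add(snapshotValidator);
      return this;
    }

    // Assumption: every validator sees the base snapshot the commit would build on.
    void commitOnto(SnapshotInfo base) {
      for (Consumer<SnapshotInfo> validator : validators) {
        validator.accept(base);
      }
      // A real implementation would write the new snapshot here.
    }
  }

  // Fails if the base snapshot already records a commit ID at or beyond the pending one.
  static Consumer<SnapshotInfo> rejectIfAlreadyCommitted(String key, long pendingId) {
    return base -> {
      String committed = base.summary.get(key);
      if (committed != null && Long.parseLong(committed) >= pendingId) {
        throw new IllegalStateException(
            "Commit " + pendingId + " already present in snapshot " + base.snapshotId);
      }
    };
  }

  public static void main(String[] args) {
    PendingCommit commit =
        new PendingCommit().validateWith(rejectIfAlreadyCommitted("sketch.commit-id", 7L));

    // Base snapshot is behind the pending commit ID: validation passes.
    commit.commitOnto(new SnapshotInfo(1L, Map.of("sketch.commit-id", "6")));

    // A concurrent writer already committed ID 7: validation fails.
    try {
      commit.commitOnto(new SnapshotInfo(2L, Map.of("sketch.commit-id", "7")));
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Throwing from the validator mirrors how a failed validation would abort the commit. Which exception type the actual API expects is not stated in this description.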
Impact?
This change is used as a parent in the following PRs: