-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Validate concurrently added delete files in OverwriteFiles #3199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -107,23 +107,28 @@ public interface OverwriteFiles extends SnapshotUpdate<OverwriteFiles> { | |
| OverwriteFiles caseSensitive(boolean caseSensitive); | ||
|
|
||
| /** | ||
| * Enables validation that files added concurrently do not conflict with this commit's operation. | ||
| * Enables validation that data files added concurrently do not conflict with this commit's operation. | ||
| * <p> | ||
| * This method should be called when the table is queried to determine which files to delete/append. | ||
| * This method should be called while committing non-idempotent overwrite operations. | ||
| * If a concurrent operation commits a new file after the data was read and that file might | ||
| * contain rows matching the specified conflict detection filter, the overwrite operation | ||
| * will detect this during retries and fail. | ||
| * will detect this and fail. | ||
| * <p> | ||
| * Calling this method with a correct conflict detection filter is required to maintain | ||
| * serializable isolation for eager update/delete operations. Otherwise, the isolation level | ||
| * serializable isolation for overwrite operations. Otherwise, the isolation level | ||
| * will be snapshot isolation. | ||
| * <p> | ||
| * Validation applies to files added to the table since the snapshot passed to {@link #validateFromSnapshot(long)}. | ||
| * | ||
| * @param conflictDetectionFilter an expression on rows in the table | ||
| * @return this for method chaining | ||
| * @deprecated since 0.13.0, will be removed in 0.14.0; use {@link #conflictDetectionFilter(Expression)} and | ||
| * {@link #validateNoConflictingData()} instead. | ||
| */ | ||
| OverwriteFiles validateNoConflictingAppends(Expression conflictDetectionFilter); | ||
| @Deprecated | ||
| default OverwriteFiles validateNoConflictingAppends(Expression conflictDetectionFilter) { | ||
| return conflictDetectionFilter(conflictDetectionFilter).validateNoConflictingData(); | ||
| } | ||
|
|
||
| /** | ||
| * Enables validation that files added concurrently do not conflict with this commit's operation. | ||
|
|
@@ -145,4 +150,52 @@ public interface OverwriteFiles extends SnapshotUpdate<OverwriteFiles> { | |
| */ | ||
| @Deprecated | ||
| OverwriteFiles validateNoConflictingAppends(Long readSnapshotId, Expression conflictDetectionFilter); | ||
|
|
||
| /** | ||
| * Sets a conflict detection filter used to validate concurrently added data and delete files. | ||
| * | ||
| * @param conflictDetectionFilter an expression on rows in the table | ||
| * @return this for method chaining | ||
| */ | ||
| OverwriteFiles conflictDetectionFilter(Expression conflictDetectionFilter); | ||
|
|
||
| /** | ||
| * Enables validation that data added concurrently does not conflict with this commit's operation. | ||
| * <p> | ||
| * This method should be called while committing non-idempotent overwrite operations. | ||
| * If a concurrent operation commits a new file after the data was read and that file might | ||
| * contain rows matching the specified conflict detection filter, the overwrite operation | ||
| * will detect this and fail. | ||
| * <p> | ||
| * Calling this method with a correct conflict detection filter is required to maintain | ||
| * isolation for non-idempotent overwrite operations. | ||
| * <p> | ||
| * Validation uses the conflict detection filter passed to {@link #conflictDetectionFilter(Expression)} and | ||
| * applies to operations that happened after the snapshot passed to {@link #validateFromSnapshot(long)}. | ||
| * If the conflict detection filter is not set, any new data added concurrently will fail this | ||
| * overwrite operation. | ||
| * | ||
| * @return this for method chaining | ||
| */ | ||
| OverwriteFiles validateNoConflictingData(); | ||
|
|
||
| /** | ||
| * Enables validation that deletes that happened concurrently do not conflict with this commit's operation. | ||
| * <p> | ||
| * Validating concurrent deletes is required during non-idempotent overwrite operations. | ||
| * If a concurrent operation deletes data in one of the files being overwritten, the overwrite | ||
| * operation must be aborted as it may undelete rows that were removed concurrently. | ||
| * <p> | ||
| * Calling this method with a correct conflict detection filter is required to maintain | ||
| * isolation for non-idempotent overwrite operations. | ||
| * <p> | ||
| * Validation uses the conflict detection filter passed to {@link #conflictDetectionFilter(Expression)} and | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Am I right to understand that we use rowFilter to do the detection, if conflictDetectionFilter is not set? Should we mention it?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it is not set, the validation is tricky. I added a few sentences. Let me know if that makes sense, @szehon-ho.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea its a lot better. |
||
| * applies to operations that happened after the snapshot passed to {@link #validateFromSnapshot(long)}. | ||
| * If the conflict detection filter is not set, this operation will use the row filter provided | ||
| * in {@link #overwriteByRowFilter(Expression)} to check for new delete files and will ensure | ||
| * there are no conflicting deletes for data files removed via {@link #deleteFile(DataFile)}. | ||
| * | ||
| * @return this for method chaining | ||
| */ | ||
| OverwriteFiles validateNoConflictingDeletes(); | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this as well? Looks like it was supposed to be removed already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clean up all interfaces in the api module in a separate PR.