-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Validate concurrently added delete files in OverwriteFiles #3199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Validate concurrently added delete files in OverwriteFiles #3199
Conversation
4eb4063 to
778c1e8
Compare
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, I couldn't think of other conflicts to test but maybe someone else can :)
| * use {@link #validateNoConflictingAppends(Expression)} and {@link #validateFromSnapshot(long)} instead | ||
| */ | ||
| @Deprecated | ||
| OverwriteFiles validateNoConflictingAppends(Long readSnapshotId, Expression conflictDetectionFilter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this as well? Looks like it was supposed to be removed already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clean up all interfaces in the api module in a separate PR.
| if (validateNewDeleteFiles && base.currentSnapshot() != null) { | ||
| if (rowFilter() != Expressions.alwaysFalse()) { | ||
| validateNoNewDeleteFiles(base, startingSnapshotId, conflictDetectionFilter(), caseSensitive); | ||
| } else if (deletedDataFiles.size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use deletedDataFiles.isEmpty for consistency with below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a big fan of negation inside conditions but it is purely a personal thing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK fair enough :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we all have some of these. Personally, I don't use ++ at all. When reviewing I ignore it unless the return value is used, which I think makes code harder to understand.
| * Calling this method with a correct conflict detection filter is required to maintain | ||
| * serializable isolation for overwrite operations. | ||
| * <p> | ||
| * Validation uses the conflict detection filter passed to {@link #conflictDetectionFilter(Expression)} and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right to understand that we use rowFilter to do the detection, if conflictDetectionFilter is not set? Should we mention it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not set, the validation is tricky. I added a few sentences. Let me know if that makes sense, @szehon-ho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea its a lot better.
|
|
||
| if (validateNewDeleteFiles && base.currentSnapshot() != null) { | ||
| if (rowFilter() != Expressions.alwaysFalse()) { | ||
| validateNoNewDeleteFiles(base, startingSnapshotId, conflictDetectionFilter(), caseSensitive); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Takes a bit of time to understand this if - else, especially as conflictDetectionFilter() itself returns conditionally. Wonder if it can be simplified any way.
Tracing through the many cases,I was wondering some cases, eg:
If (rowFilter != false && !deletedDataFiles.isEmpty && conflictDetectionFilter == true), seems we will call validateNoNewDeleteFiles with conflictDetectionFilter, is it intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the logic to behave a bit differently now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, this looks a lot better and easier to understand now
szehon-ho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to resolve the conversations, but looks good for me now, thanks @aokolnychyi
| overwriteFiles.validateNoConflictingAppends(conflictDetectionFilter); | ||
| overwriteFiles.conflictDetectionFilter(conflictDetectionFilter); | ||
| overwriteFiles.validateNoConflictingData(); | ||
| overwriteFiles.validateNoConflictingDeletes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aokolnychyi, shouldn't this check whether the operation is a delete? If this is invoked by DELETE FROM then we don't need to validate conflicting deletes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that can only be done for merge-on-read. If I delete file_A with copy-on-write and overwrite it with file_B, I should still check no deletes happened for file_A, otherwise I'll undelete records.
rdblue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple remaining questions, but overall I think this is correct and good to go!
|
Thanks for reviewing, @RussellSpitzer @rdblue @szehon-ho! |
This PR validates concurrently added delete files in
BaseOvewriteFiles.