@aokolnychyi
Contributor

This PR validates concurrently added delete files in BaseOverwriteFiles.

@aokolnychyi aokolnychyi force-pushed the validate-overwrite-files branch from 4eb4063 to 778c1e8 Compare September 28, 2021 20:28
Member

@RussellSpitzer RussellSpitzer left a comment

This looks good to me. I couldn't think of other conflicts to test, but maybe someone else can :)

   * use {@link #validateNoConflictingAppends(Expression)} and {@link #validateFromSnapshot(long)} instead
   */
  @Deprecated
  OverwriteFiles validateNoConflictingAppends(Long readSnapshotId, Expression conflictDetectionFilter);
Contributor

Can you remove this as well? Looks like it was supposed to be removed already.

Contributor Author

I'll clean up all interfaces in the api module in a separate PR.

if (validateNewDeleteFiles && base.currentSnapshot() != null) {
  if (rowFilter() != Expressions.alwaysFalse()) {
    validateNoNewDeleteFiles(base, startingSnapshotId, conflictDetectionFilter(), caseSensitive);
  } else if (deletedDataFiles.size() > 0) {
Member

Can we use !deletedDataFiles.isEmpty() for consistency with below?

Contributor Author

I am not a big fan of negation inside conditions, but that is purely a personal thing :)

Member

OK fair enough :)

Contributor

Yeah, I think we all have some of these. Personally, I don't use ++ at all. When reviewing I ignore it unless the return value is used, which I think makes code harder to understand.

   * Calling this method with a correct conflict detection filter is required to maintain
   * serializable isolation for overwrite operations.
   * <p>
   * Validation uses the conflict detection filter passed to {@link #conflictDetectionFilter(Expression)} and
Member

Am I right to understand that we use rowFilter to do the detection, if conflictDetectionFilter is not set? Should we mention it?

Contributor Author

If it is not set, the validation is tricky. I added a few sentences. Let me know if that makes sense, @szehon-ho.

Member

Yeah, it's a lot better.


if (validateNewDeleteFiles && base.currentSnapshot() != null) {
  if (rowFilter() != Expressions.alwaysFalse()) {
    validateNoNewDeleteFiles(base, startingSnapshotId, conflictDetectionFilter(), caseSensitive);
Member

@szehon-ho szehon-ho Sep 29, 2021

It takes a while to understand this if-else, especially since conflictDetectionFilter() itself returns conditionally. I wonder whether it can be simplified.

While tracing through the cases, I had a question about one of them:

If (rowFilter != false && !deletedDataFiles.isEmpty && conflictDetectionFilter == true), it seems we will call validateNoNewDeleteFiles with conflictDetectionFilter. Is that intended?
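
For readers tracing the same case, here is a minimal sketch of the branch structure in the hunk quoted above, as it stood before the author revised it. It is an illustration only: `OverwriteValidationTrace` and `branchTaken` are hypothetical names, the booleans stand in for the real Iceberg expressions, and the body of the else-if branch is not visible in the hunk, so it is not modeled.

```java
class OverwriteValidationTrace {

  /**
   * Returns a label for the branch the quoted hunk would take.
   * Hypothetical helper: it mirrors only the if / else-if shape shown in the review,
   * not the behavior of the merged code.
   */
  static String branchTaken(
      boolean validateNewDeleteFiles,
      boolean hasCurrentSnapshot,
      boolean rowFilterIsAlwaysFalse,
      int deletedDataFileCount) {
    if (validateNewDeleteFiles && hasCurrentSnapshot) {
      if (!rowFilterIsAlwaysFalse) {
        // First branch: taken whenever a row filter is set, even if data files were also deleted.
        return "validateNoNewDeleteFiles(conflictDetectionFilter())";
      } else if (deletedDataFileCount > 0) {
        // The body of this branch is cut off in the quoted hunk.
        return "else-if branch (body not shown in the hunk)";
      }
    }
    return "no delete-file validation";
  }

  public static void main(String[] args) {
    // The case raised in the comment: row filter set and deleted data files present.
    System.out.println(branchTaken(true, true, false, 3));
  }
}
```

With a row filter set, the first branch wins regardless of how many data files were deleted, which is the situation the question asks about.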

Contributor Author

Changed the logic to behave a bit differently now.

Member

Great, this looks a lot better and easier to understand now

Member

@szehon-ho szehon-ho left a comment

Not sure how to resolve the conversations, but this looks good to me now. Thanks, @aokolnychyi.

overwriteFiles.validateNoConflictingAppends(conflictDetectionFilter);
overwriteFiles.conflictDetectionFilter(conflictDetectionFilter);
overwriteFiles.validateNoConflictingData();
overwriteFiles.validateNoConflictingDeletes();
Contributor

@aokolnychyi, shouldn't this check whether the operation is a delete? If this is invoked by DELETE FROM then we don't need to validate conflicting deletes.

Contributor Author

I think that can only be done for merge-on-read. If I delete file_A with copy-on-write and overwrite it with file_B, I should still check no deletes happened for file_A, otherwise I'll undelete records.
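
The copy-on-write hazard described here can be sketched with a toy model. This is only an illustration under stated assumptions: `CopyOnWriteSketch`, `liveRows`, and `rewrite` are hypothetical and are not Iceberg classes or methods, rows are plain strings, and a delete file is reduced to a set of row positions in file_A.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CopyOnWriteSketch {

  /** Rows a reader sees: the rows of the data file minus the positions marked as deleted. */
  static List<String> liveRows(List<String> rows, Set<Integer> deletedPositions) {
    List<String> live = new ArrayList<>();
    for (int pos = 0; pos < rows.size(); pos++) {
      if (!deletedPositions.contains(pos)) {
        live.add(rows.get(pos));
      }
    }
    return live;
  }

  /**
   * Copy-on-write DELETE: rewrite file_A into file_B without rowToDrop.
   * deletesSeenAtRead is what the writer saw at its starting snapshot;
   * deletesAtCommit is what exists for file_A when the writer commits.
   */
  static List<String> rewrite(
      List<String> fileA,
      Set<Integer> deletesSeenAtRead,
      Set<Integer> deletesAtCommit,
      String rowToDrop,
      boolean validate) {
    if (validate && !deletesSeenAtRead.equals(deletesAtCommit)) {
      // Hypothetical message; the point is that the commit must fail.
      throw new IllegalStateException("New delete files were added for file_A since the read");
    }
    List<String> fileB = liveRows(fileA, deletesSeenAtRead);
    fileB.remove(rowToDrop);
    // Once file_B replaces file_A, deletes that targeted file_A no longer apply.
    return fileB;
  }

  public static void main(String[] args) {
    List<String> fileA = List.of("r1", "r2", "r3");
    // A concurrent writer deleted position 2 ("r3") after our read.
    System.out.println(rewrite(fileA, Set.of(), Set.of(2), "r1", false));
  }
}
```

Without validation, file_B still contains r3 even though a concurrent commit deleted it, so the row is undeleted. With validation enabled, the same call throws instead of committing.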

Contributor

@rdblue rdblue left a comment

I have a couple remaining questions, but overall I think this is correct and good to go!

@aokolnychyi aokolnychyi merged commit 2e82453 into apache:master Oct 1, 2021
@aokolnychyi
Contributor Author

Thanks for reviewing, @RussellSpitzer @rdblue @szehon-ho!

@rdblue rdblue changed the title Core: Validate concurrently added delete files in OvewriteFiles Core: Validate concurrently added delete files in OverwriteFiles Oct 1, 2021
@rdblue rdblue added this to the Java 0.12.1 Release milestone Oct 26, 2021
kbendick pushed a commit to kbendick/iceberg that referenced this pull request Oct 27, 2021
kbendick pushed a commit to kbendick/iceberg that referenced this pull request Oct 27, 2021
kbendick pushed a commit to kbendick/iceberg that referenced this pull request Nov 1, 2021
izchen pushed a commit to izchen/iceberg that referenced this pull request Dec 7, 2021
Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Dec 13, 2021
Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Dec 17, 2021
4 participants