Skip to content

Conversation

@aokolnychyi
Copy link
Contributor

@aokolnychyi aokolnychyi commented Mar 9, 2022

This PR fixes the deleted data files validation for overwrites by filter in OverwriteFiles. Right now, we allow removal of data files in partitions we overwrite, which, I think violates the assumption.

Found by @szehon-ho in #4293.

@github-actions github-actions bot added the core label Mar 9, 2022
@aokolnychyi aokolnychyi force-pushed the fix-overwrite-files branch from 1bda55e to ff4e42d Compare March 9, 2022 21:17
@aokolnychyi
Copy link
Contributor Author

@rdblue @szehon-ho @RussellSpitzer, could you take a look?

private Expression conflictDetectionFilter = null;
private boolean validateNewDataFiles = false;
private boolean validateNewDeleteFiles = false;
private boolean validateNewDeletes = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the rename here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cause we not only validate added delete files but also deleted data files. We have two types of deletes to cover.

if (rowFilter() != Expressions.alwaysFalse()) {
Expression filter = conflictDetectionFilter != null ? conflictDetectionFilter : rowFilter();
validateNoNewDeleteFiles(base, startingSnapshotId, filter);
validateDeletedDataFiles(base, startingSnapshotId, filter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

@Test
public void testConcurrentConflictingDataFileDeleteOverwriteByFilter() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would previously fail as the commit would succeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error here is because file 2 is both modified and deleted correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we concurrently removed data from a partition what we are trying to overwrite.


Snapshot firstSnapshot = table.currentSnapshot();

OverwriteFiles overwrite = table.newOverwrite()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this one is alright because file 2 is modified, but file 1 is deleted.

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@szehon-ho
Copy link
Member

+1 from me too, glad we found and fixed it!

@aokolnychyi aokolnychyi merged commit cceef9f into apache:master Mar 9, 2022
@aokolnychyi
Copy link
Contributor Author

Thanks, @RussellSpitzer @szehon-ho! Merged to unblock Szehon's PR.

sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants