Core: Expired Snapshot files in a transaction should be deleted. #9183
@amogh-jahagirdar can you please take a look when you have time as you implemented #6634
Thanks for the PR @bartash I'm taking a look.
Thanks for the fix, this does seem like a legitimate bug. Could we add a test to TestRemoveSnapshots? I think there's no harm in keeping the test in TestSequenceNumberForV2Table, but TestRemoveSnapshots would be a better place for tests around this. There's a deletedFiles that's visible for testing on the transaction, and I think it could be used for assertions on the files we expect to be deleted.
    .onFailure((file, exc) -> LOG.warn("Failed to delete uncommitted file: {}", file, exc))
    .run(
        path -> {
          if (committedFiles == null || !committedFiles.contains(path)) {
Ah yes, good catch. If there are no committed files (which would be expected for a transaction with just ExpireSnapshots) but there are files to clean up (which would again be expected for ExpireSnapshots), then we should proceed with the file removal.
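The guard discussed in this thread can be looked at in isolation. The following is a minimal sketch, not the BaseTransaction code: `CleanupSketch` and `filesToDelete` are hypothetical names, file paths are plain strings, and a `null` `committedFiles` is assumed to mean "nothing was committed", mirroring the condition in the diff line under review.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; these names do not exist in Iceberg.
class CleanupSketch {

  // Returns the paths from deletedFiles that are safe to remove after commit.
  // Assumption: committedFiles == null means no committed files are known,
  // so it must not block the clean-up of expired snapshot files.
  static Set<String> filesToDelete(Set<String> deletedFiles, Set<String> committedFiles) {
    Set<String> result = new LinkedHashSet<>();
    for (String path : deletedFiles) {
      if (committedFiles == null || !committedFiles.contains(path)) {
        result.add(path);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // A transaction that only expires snapshots: files to clean up, nothing committed.
    Set<String> deleted = new LinkedHashSet<>(Arrays.asList("snap-1.avro", "manifest-1.avro"));
    System.out.println(filesToDelete(deleted, null));

    // A transaction that also committed a snapshot that reuses manifest-1.avro.
    Set<String> committed = new LinkedHashSet<>(Arrays.asList("manifest-1.avro"));
    System.out.println(filesToDelete(deleted, committed));
  }
}
```

In the first call every path is returned for deletion; in the second, the reused manifest is kept because it is still referenced by a committed snapshot.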
I just realized that, alternatively, we could've returned an empty set from committedFiles instead of null and then removed the (committedFiles == null) check (in addition to the current top-level check). I'm not super opinionated on that though (it can be done in a follow-on).
I think this should actually change committedFiles to return ImmutableSet.of() if there are no new snapshot IDs. The logic is correct to warn in the other case where null is returned (a committed snapshot is missing). null signals that the output of the method is invalid, which we assumed was the case if there are no committed snapshots. But here we have a case where it's valid to have no committed snapshots and therefore no committed files.
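The contract proposed in this comment can be sketched as follows. This is a sketch under stated assumptions, not the Iceberg implementation: `CommittedFilesSketch` is a hypothetical class, the `snapshotFiles` map is a stand-in for loading snapshot metadata (a missing key models a committed snapshot that cannot be found), and `Collections.emptySet()` is used in place of Guava's `ImmutableSet.of()` to keep the example dependency-free.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; the real method works on table metadata.
class CommittedFilesSketch {

  // Returns the files referenced by the new snapshots.
  //  - empty set: valid result, the transaction committed no new snapshots
  //  - null: invalid result, a committed snapshot could not be found
  static Set<String> committedFiles(Set<Long> newSnapshotIds, Map<Long, Set<String>> snapshotFiles) {
    if (newSnapshotIds.isEmpty()) {
      return Collections.emptySet();
    }
    Set<String> committed = new HashSet<>();
    for (Long id : newSnapshotIds) {
      Set<String> files = snapshotFiles.get(id);
      if (files == null) {
        return null; // caller should warn and skip clean-up
      }
      committed.addAll(files);
    }
    return committed;
  }

  public static void main(String[] args) {
    Map<Long, Set<String>> snapshotFiles = new HashMap<>();
    snapshotFiles.put(1L, Collections.singleton("manifest-1.avro"));

    // ExpireSnapshots only: no new snapshot IDs, so the result is empty but valid.
    System.out.println(committedFiles(Collections.<Long>emptySet(), snapshotFiles));

    // A snapshot ID with no metadata: the result is null and clean-up should be skipped.
    System.out.println(committedFiles(Collections.singleton(2L), snapshotFiles));
  }
}
```

With this contract the caller only needs a single null check for the invalid case, and the per-path filter reduces to `!committedFiles.contains(path)`.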
I have a PR #9221 for addressing this.
c0f8e00 to b196590
@amogh-jahagirdar thanks for the helpful suggestions.
I'm reviewing this. Thanks for the contribution!
amogh-jahagirdar left a comment
Overall this looks good to me, just some naming nits. Thanks for the fix. cc @rdblue in case he has any comments.
When a snapshot is expired as part of a transaction, the manifest list file(s) should be deleted when the transaction commits. A recent change (apache#6634) ensured that these files are not deleted when they have also been committed as part of a transaction, but this breaks the simple case where no new files are committed. Fix this by not skipping deletion when the list of committed files is empty. TESTING: Extended a unit test to ensure that manifest list files are deleted. Ran the test without the fix on a branch where apache#6634 was reverted to show that this is a regression.
b196590 to 2ebd21a
Thanks @amogh-jahagirdar, I pushed changes for the nits.
Fokko left a comment
Once this gets in, can you create a PR against the 1.4.x branch?
Clean cherry-pick to 1.4.x is in #9223
…ed (#9223)
* Core: Expired Snapshot files in a transaction should be deleted. (#9183) When a snapshot is expired as part of a transaction, the manifest list file(s) should be deleted when the transaction commits. A recent change (#6634) ensured that these files are not deleted when they have also been committed as part of a transaction, but this breaks the simple case where no new files are committed. Fix this by not skipping deletion when the list of committed files is empty.
* Core: Fix logic for determining set of committed files in BaseTransaction when there are no new snapshots (#9221)
Co-authored-by: Amogh Jahagirdar
When a snapshot is expired as part of a transaction, the snapshot file(s) should be deleted when the transaction commits. A recent change (#6634) ensured that files are not deleted when they have also been committed as part of a transaction, but this breaks the simple case where no new files are committed. Fix this by not skipping deletion when the list of committed files is empty.
Closes #9182
TESTING:
Extended a unit test to ensure that snapshot files are deleted. Ran the test without the fix on a branch where #6634 was reverted to show that this is a regression.