Skip to content

Conversation

@amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Dec 5, 2023

This is a follow up to #9183. Null gets returned from committedFiles in 2 cases:

1.) The snapshot ID list is empty (e.g. ExpireSnapshots in a transaction). This snapshot ID is the set of snapshots produced as a result of the transaction

2.).The set of committed files could not exactly be determined (e.g. missing any snapshot metadata).

In the first case, the logic should perform file deletion in case there were no new snapshots (e.g. ExpireSnapshots) which #9183 pointed out and addressed.

In the second case, we do want to signal to callers that it's a unique situation and file deletions should not be performed.

What this change does is return an empty set in case 1 and null in case 2 to prevent deleting files when we could not determine the full set of committed files.

@github-actions github-actions bot added the core label Dec 5, 2023
@amogh-jahagirdar
Copy link
Contributor Author

#9183 @bartash already added tests for the ExpireSnapshots case, but based on the tests passing before that implies there weren't any tests for case 2 above (since those would have caught the behavior change in deletion in case snapshots couldn't be looked up). It may be difficult to test for that case (have to force the lookup of a committed snapshot to be null right after the transaction commits) , but I'll look into it.

@amogh-jahagirdar amogh-jahagirdar changed the title Core: Fix logic for determining set of committed files Core: Fix logic in BaseTransaction for determining set of committed files Dec 5, 2023
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Fix logic in BaseTransaction for determining set of committed files Core: Fix logic in BaseTransaction for determining set of committed files when there are no new snapshots Dec 5, 2023
Copy link
Member

@bartash bartash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement to my fix

@rdblue rdblue merged commit faa8b50 into apache:main Dec 5, 2023
bartash pushed a commit to bartash/iceberg that referenced this pull request Dec 6, 2023
amogh-jahagirdar added a commit that referenced this pull request Dec 19, 2023
…ed (#9223)

* Core: Expired Snapshot files in a transaction should be deleted. (#9183)

When a snapshot is expired as part of a transaction, the manifest list
file(s) should be deleted when the transaction commits. A recent change
(#6634) ensured that these files are not deleted when they have also
been committed as part of a transaction, but this breaks the simple
case where no new files are committed. Fix this by not skipping
deletion when the list of committed files is empty.

* Core: Fix logic for determining set of committed files in BaseTransaction when there are no new snapshots (#9221)

---------

Co-authored-by: Amogh Jahagirdar <[email protected]>
lisirrx pushed a commit to lisirrx/iceberg that referenced this pull request Jan 4, 2024
devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants