-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Deal with expired parent snapshot in MergingSnapshotProducer#validateDataFilesExist #2603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ca09b42 to
69524f7
Compare
…lidateDataFilesExist
69524f7 to
52be5bf
Compare
| // add all manifests of current snapshot and break | ||
| Long parentId = currentSnapshot.parentId(); | ||
| boolean shouldAddManifestsAndBreak = | ||
| startingSnapshotId == null && parentId != null && ops.current().snapshot(parentId) == null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the logic using shouldAddManifestsAndBreak is incorrect. The situation this is trying to account for is when the entire table history should be considered (startingSnapshotId is null), but only partial history is available (parentId is not null but the snapshot is missing). I think the idea here is that this can account for lost history by looking across all manifests for a delete.
I don't think that this is implemented correctly for a few reasons:
- If the oldest available snapshot is an append snapshot, then the outer check below for operation will prevent its manifests from being added as expected.
- The manifest group that scans for deletes filters entries to just the snapshot ids that were new since the starting id (
newSnapshots). When history is missing, that filter would need to be turned off or else the entries will be ignored even if you scan older manifests. - When reading the manifests to check for deletes, missing history could have removed those deletes in a manifest rewrite. If I delete file A, it will show up in a delete entry in a snapshot. But once I rewrite the manifest that encoded that delete, the delete is no longer kept. So expiring history can still cause a problem.
Because of that last flaw, I don't think that there is a way to do this safely by searching for deletes. The only way to do this validation without history is to scan the entire table for the required data files.
| } | ||
|
|
||
| @Test | ||
| public void testChangeLogOnIdKeyWithSnapshotExpire() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is updating a file in core, I would expect the tests to be more specific and located in core, not in the Flink integration.
|
@jshmchenxi, see my comments above for more detail, but this approach can't work for the situation you describe. If you want to support this situation, then you would need to scan all table metadata to check whether the files that need to exist still do using added and existing entries, not delete entries. Since delete entries can be lost when history is cleaned up, you have to check for the actual files. I'm going to close this PR since the approach won't work. Feel free to open another one with an alternative implementation that checks for added or existing entries for the required files. Thanks for working on this! |
For #2482
If parent snapshot is expired and starting snapshot is null, we should add current snapshot's all manifests and break. Thus we will not run into the exception "Cannot determine history ..."
Added some tests to cover Flink CDC write with expired snapshot.