-
Notifications
You must be signed in to change notification settings - Fork 3k
Handle the case that RewriteFiles and RowDelta commit the transaction… #3204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… at the same time
|
@openinx Can you help take a look? |
|
@yyanyy @stevenzwu @rdblue Can you help take a look? |
|
is this a duplicate of #3069 ? |
@jackye1995 it's not, this PR is focus on handle duplicate record when there is some delete file be commit during rewrite action. And i update description of this PR that describe how this PR handle this case. |
| } | ||
|
|
||
| @Test | ||
| public void testNewDeleteFile() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you removing tests? This is probably incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this is related to removing the validation in BaseRewriteFiles. Like I noted there, we want to keep that validation and use it in certain cases.
| if (replacedDataFiles.size() > 0) { | ||
| // if there are replaced data files, there cannot be any new row-level deletes for those data files | ||
| validateNoNewDeletesForDataFiles(base, startingSnapshotId, replacedDataFiles); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be removed. There is a valid use case for the operation to check whether there are conflicts instead of re-sequencing data files. If you want, you can add a configuration method to enable/disable the validation.
| * @param sequenceNumber a sequenceNumber | ||
| * @return this for method chaining | ||
| */ | ||
| RewriteFiles setSequenceNumber(long sequenceNumber); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need better documentation for what this method does and when you would call it, since this affects correctness.
| try (ManifestListWriter writer = ManifestLists.write( | ||
| ops.current().formatVersion(), manifestList, snapshotId(), parentSnapshotId, sequenceNumber)) { | ||
| ops.current().formatVersion(), manifestList, snapshotId(), parentSnapshotId, | ||
| operation().equals(DataOperations.REPLACE) && sequenceNumber() != null ? sequenceNumber() : sequenceNumber)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a clever way to fix the problem. What we had considered before is setting the sequence number of individual files rather than the sequence number used for the manifest list. This is an easier way to set the sequence number for all new data files in the snapshot.
I need to think about whether this is the right approach a bit more. As long as we have a static sequence number, using inheritance is only a convenience. We could set that static sequence number on the individual data or delete files that we add in the commit. I tend to lean toward that solution because it minimizes the places that use the overridden sequence number -- so we know that the manifest list and manifests all have the latest sequence number that is assigned to the snapshot rather than also being re-sequenced.
I'll think about the trade-off some more.
| * | ||
| * @return a string operation | ||
| */ | ||
| protected Long sequenceNumber() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a more specific name, like sequenceNumberOverride
|
|
||
| @Override | ||
| protected Long sequenceNumber() { | ||
| return replaceSequenceNumber; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this approach, I think we need a validation that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The override sequence number is init when compact action generate fileScanTasks in https://github.com/apache/iceberg/pull/3204/files#:~:text=long%20sequenceNumber%20%3D%20table.currentSnapshot().sequenceNumber()%3B, so it can guarantee that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number.
|
@rdblue I commit one new commit for your suggestion, can you help review one more time? The fix detail as follows:
The override sequence number is init when compact action generate fileScanTasks in https://github.com/apache/iceberg/pull/3204/files#:~:text=long%20sequenceNumber%20%3D%20table.currentSnapshot().sequenceNumber()%3B, so it can guarantee that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number. |
|
Running CI. |
|
@rdblue Hello, is there any progress ? |
|
I've been thinking about this case and I think that the right way to do this is to set the sequence number on individual files rather than at the snapshot level. I don't think that we should change the sequence number of the snapshot or manifest list. We should just set the sequence number of individual data files. Basically, I agree with @yyanyy's comment:
I think we need to set the file sequence numbers. That raises the question: what do we set them to? Ideally, we would use the latest, but the delete file commit has claimed that number so we need to go with the sequence number that is less than any other commits. That would be the sequence number that was current when rewrite operation started. It would be nice to find a way around reusing a sequence number from a different snapshot, but I don't see a good way to do that right now. We can possibly fix that up later by skipping sequence numbers. |
|
@rdblue I think import file sequence numbers will cause unnecessary complexity, and the sequence number of manifest list can mean file commit sequence, the data file sequence number just describe different sequence of snapshot and data file, but manifest list sequence numbers can do this too, because there is snapshotid in manifest list file ,so wo don't need hold same sequence numbers for one snapshot in the snapshot and manifest list. So even if we import file sequence numbers it just cause the same effect as the sequence number of manifest list. |
|
@hameizi, can you help review and test #3480? That's an alternative approach to what you're doing here that sets the sequence number per data file. I think that change is actually really important. While I was reviewing this, I thought that it was probably not a good idea to set the sequence number by reusing inheritance. Now that we've thought through the use case more, we've come up with a good reason not to: it makes it so we can't recover the sequence number where files were added, not just the sequence number where the data lives in time. |
This PR is handle the case that RewriteFiles and RowDelta commit the transaction at the same time what will cause duplicate record in table.
The approach is just like what discuss in #2308 (comment). We use the old seqnumber in manifest-list file same as the seqnumber of current snapshop when we start rewrite action. So when the rewrite action finished the replace snapshot will be set with one new seqnumber in metadata.json but one old seqnumber in manifest-list file. Then when we read the replace snapshot that these files what is commit during the rewrite action have bigger seqnumber than this replace snapshot, so if there is eq-delete file is commit they will apply into the replace snapshot.