Handle the case that RewriteFiles and RowDelta commit the transaction… #3204

hameizi · 2021-09-29T08:08:22Z

This PR handles the case where RewriteFiles and RowDelta commit at the same time, which causes duplicate records in the table.
The approach follows what was discussed in #2308 (comment). When the rewrite action starts, we record the sequence number of the current snapshot, and we later write that old sequence number into the manifest-list file. So when the rewrite action finishes, the replace snapshot is assigned a new sequence number in metadata.json but keeps the old sequence number in the manifest-list file. When we read the replace snapshot, any files committed during the rewrite action have a larger sequence number than the files of the replace snapshot, so any equality-delete files committed in that window are still applied to the rewritten data.
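To make the race concrete, here is a minimal sketch of the sequence-number rule the description relies on: an equality delete only applies to data files whose sequence number is strictly smaller than the delete's. The class and method names are hypothetical and are not Iceberg APIs; the sequence numbers 1, 2 and 3 are made-up example values.

```java
// Toy model only: this class is hypothetical and is not part of Iceberg.
final class SeqNumberRaceSketch {

  /** An equality delete applies only to data files with a strictly smaller sequence number. */
  static boolean equalityDeleteApplies(long dataFileSeq, long deleteFileSeq) {
    return dataFileSeq < deleteFileSeq;
  }

  public static void main(String[] args) {
    long startSeq = 1L;   // assumed: snapshot that was current when the rewrite started
    long deleteSeq = 2L;  // assumed: RowDelta commits an equality delete during the rewrite
    long replaceSeq = 3L; // assumed: sequence number assigned to the replace snapshot

    // Rewritten file stamped with the new number: the delete is skipped and old rows reappear.
    System.out.println(equalityDeleteApplies(replaceSeq, deleteSeq)); // false
    // Rewritten file stamped with the starting number: the delete still applies.
    System.out.println(equalityDeleteApplies(startSeq, deleteSeq));   // true
  }
}
```

In this model the duplicate appears in the first case: the rewritten file still contains the old row, the concurrent delete no longer covers it, and the new row written by the RowDelta is also visible.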

hameizi · 2021-09-29T08:08:51Z

@openinx Can you help take a look?

hameizi · 2021-09-29T08:19:22Z

@yyanyy @stevenzwu @rdblue Can you help take a look?

jackye1995 · 2021-09-30T00:50:49Z

is this a duplicate of #3069 ?

hameizi · 2021-09-30T02:24:27Z

> is this a duplicate of #3069 ?

@jackye1995 It's not. This PR focuses on handling duplicate records when delete files are committed during a rewrite action. I have updated the PR description to explain how this PR handles that case.

rdblue · 2021-09-30T20:18:40Z

core/src/test/java/org/apache/iceberg/TestRewriteFiles.java

  }

-  @Test
-  public void testNewDeleteFile() {


Why are you removing tests? This is probably incorrect.

I see that this is related to removing the validation in BaseRewriteFiles. Like I noted there, we want to keep that validation and use it in certain cases.

rdblue · 2021-09-30T20:30:03Z

core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java

-    if (replacedDataFiles.size() > 0) {
-      // if there are replaced data files, there cannot be any new row-level deletes for those data files
-      validateNoNewDeletesForDataFiles(base, startingSnapshotId, replacedDataFiles);
-    }


This should not be removed. There is a valid use case for the operation to check whether there are conflicts instead of re-sequencing data files. If you want, you can add a configuration method to enable/disable the validation.
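One possible shape for such a switch, as a rough sketch only: validation stays on by default, and a caller can opt out and fall back to re-sequencing. Everything here (class name, method names, the list of concurrent delete sequence numbers) is hypothetical and is not the code in this PR or an Iceberg API.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical toy model: none of these names are Iceberg APIs.
final class RewriteConflictSketch {
  private boolean validate = true; // validation stays on unless the caller opts out

  RewriteConflictSketch validateNoNewDeletes(boolean enabled) {
    this.validate = enabled;
    return this;
  }

  /** Returns the sequence number to stamp on rewritten files, or fails on a conflict. */
  long sequenceNumberForRewrittenFiles(
      long startingSeq, long newSnapshotSeq, List<Long> concurrentDeleteSeqs) {
    if (validate) {
      if (!concurrentDeleteSeqs.isEmpty()) {
        throw new IllegalStateException("Found new row-level deletes for replaced data files");
      }
      return newSnapshotSeq; // no conflict, so the new number is safe
    }
    return startingSeq; // re-sequence so that concurrent deletes still apply
  }

  public static void main(String[] args) {
    List<Long> none = Collections.emptyList();
    System.out.println(new RewriteConflictSketch()
        .sequenceNumberForRewrittenFiles(1L, 3L, none)); // 3
    System.out.println(new RewriteConflictSketch().validateNoNewDeletes(false)
        .sequenceNumberForRewrittenFiles(1L, 3L, Collections.singletonList(2L))); // 1
  }
}
```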

rdblue · 2021-09-30T20:30:38Z

api/src/main/java/org/apache/iceberg/RewriteFiles.java

+   * @param sequenceNumber a sequenceNumber
+   * @return this for method chaining
+   */
+  RewriteFiles setSequenceNumber(long sequenceNumber);


I think we need better documentation for what this method does and when you would call it, since this affects correctness.

rdblue · 2021-09-30T20:56:55Z

core/src/main/java/org/apache/iceberg/SnapshotProducer.java

      try (ManifestListWriter writer = ManifestLists.write(
-          ops.current().formatVersion(), manifestList, snapshotId(), parentSnapshotId, sequenceNumber)) {
+          ops.current().formatVersion(), manifestList, snapshotId(), parentSnapshotId,
+          operation().equals(DataOperations.REPLACE) && sequenceNumber() != null ? sequenceNumber() : sequenceNumber)) {


I think this is a clever way to fix the problem. What we had considered before is setting the sequence number of individual files rather than the sequence number used for the manifest list. This is an easier way to set the sequence number for all new data files in the snapshot.

I need to think about whether this is the right approach a bit more. As long as we have a static sequence number, using inheritance is only a convenience. We could set that static sequence number on the individual data or delete files that we add in the commit. I tend to lean toward that solution because it minimizes the places that use the overridden sequence number -- so we know that the manifest list and manifests all have the latest sequence number that is assigned to the snapshot rather than also being re-sequenced.
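The two options can be compared with a small sketch of sequence number inheritance, assuming the usual rule that an entry without an explicit sequence number takes the one it inherits. The class and method names are hypothetical, not Iceberg's classes, and the values 1 and 3 are made up.

```java
// Hypothetical toy model of sequence number inheritance; not Iceberg's classes.
final class SeqInheritanceSketch {

  /** An entry with no explicit sequence number inherits the one passed down to it. */
  static long effectiveSeq(Long explicitFileSeq, long inheritedSeq) {
    return explicitFileSeq != null ? explicitFileSeq : inheritedSeq;
  }

  public static void main(String[] args) {
    long startingSeq = 1L; // assumed: current when the rewrite started
    long snapshotSeq = 3L; // assumed: assigned to the replace snapshot

    // Option A (this PR): the manifest list carries the old number and files inherit it.
    long optionA = effectiveSeq(null, startingSeq);

    // Option B (per file): the manifest list keeps the snapshot's number,
    // and each rewritten file carries the old number explicitly.
    long optionB = effectiveSeq(startingSeq, snapshotSeq);

    System.out.println(optionA == optionB); // true: both give the rewritten files 1
  }
}
```

Both options give the rewritten files the same effective number; they differ only in which metadata layers see the overridden value.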

I'll think about the trade-off some more.

rdblue · 2021-09-30T20:57:15Z

core/src/main/java/org/apache/iceberg/SnapshotProducer.java

+   *
+   * @return a string operation
+   */
+  protected Long sequenceNumber() {


I think this needs a more specific name, like sequenceNumberOverride

rdblue · 2021-09-30T20:58:07Z

core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java


+  @Override
+  protected Long sequenceNumber() {
+    return replaceSequenceNumber;


With this approach, I think we need a validation that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number.

The override sequence number is initialized when the compaction action generates its fileScanTasks in https://github.com/apache/iceberg/pull/3204/files#:~:text=long%20sequenceNumber%20%3D%20table.currentSnapshot().sequenceNumber()%3B, so it is guaranteed that none of the data or delete files being replaced have sequence numbers newer than the override sequence number.
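For callers that set the override some other way, the requested validation could look roughly like the sketch below. The class and method names are hypothetical and the check operates on plain sequence numbers rather than on Iceberg file objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not an Iceberg API and not the code in this PR.
final class OverrideValidationSketch {

  /** Fails if any replaced file is newer than the override sequence number. */
  static void validateOverride(long overrideSeq, List<Long> replacedFileSeqs) {
    for (long seq : replacedFileSeqs) {
      if (seq > overrideSeq) {
        throw new IllegalArgumentException(
            "Replaced file has sequence number " + seq
                + ", which is newer than the override " + overrideSeq);
      }
    }
  }

  public static void main(String[] args) {
    validateOverride(5L, Arrays.asList(1L, 3L, 5L)); // passes: nothing is newer than 5
    try {
      validateOverride(5L, Arrays.asList(1L, 6L));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```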

hameizi · 2021-10-11T06:53:48Z

@rdblue I pushed a new commit that addresses your suggestions. Could you review it once more? The changes are:
1. Added a configuration option that controls whether to validate that no new delete files were added while files were being compacted.
2. Renamed sequenceNumber to sequenceNumberOverride.
3. Updated the documentation for the setSequenceNumber method.

> With this approach, I think we need a validation that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number.

The override sequence number is initialized when the compaction action generates its fileScanTasks in https://github.com/apache/iceberg/pull/3204/files#:~:text=long%20sequenceNumber%20%3D%20table.currentSnapshot().sequenceNumber()%3B, so it is guaranteed that none of the data or delete files being replaced have sequence numbers newer than the override sequence number.

hameizi · 2021-10-18T08:33:19Z

@rdblue @openinx Could you help take a look again?

rdblue · 2021-10-19T15:51:47Z

Running CI.

hameizi · 2021-10-25T03:27:05Z

@rdblue Hello, is there any progress ?

rdblue · 2021-10-25T15:38:59Z

I've been thinking about this case and I think that the right way to do this is to set the sequence number on individual files rather than at the snapshot level. I don't think that we should change the sequence number of the snapshot or manifest list. We should just set the sequence number of individual data files. Basically, I agree with @yyanyy's comment:

> I think there are two seqNum concepts here: seqNum for the table/commit and seqNum for the file. I think it's a reasonable approach to mark the rewritten files with the old seqNum, but I'm not sure if we necessarily need to use an old sequence number for the commit since they are stored as part of the snapshot, and suddenly having an old seqNum within the snapshot could be confusing even if there's no other implication.

I think we need to set the file sequence numbers. That raises the question: what do we set them to? Ideally, we would use the latest, but the delete file commit has claimed that number, so we need to go with a sequence number that is lower than that of any other commit. That would be the sequence number that was current when the rewrite operation started. It would be nice to find a way around reusing a sequence number from a different snapshot, but I don't see a good way to do that right now. We can possibly fix that up later by skipping sequence numbers.

hameizi · 2021-10-26T02:33:11Z

@rdblue I think introducing file-level sequence numbers adds unnecessary complexity. The sequence number in the manifest list can already represent the order in which files were committed. A data file sequence number only lets the order of the data files differ from the order of the snapshot, and the manifest list sequence number can do that too, because the manifest-list file also carries the snapshot ID. So we don't need to keep the same sequence number for a snapshot in both the snapshot metadata and the manifest list. Even if we introduce file sequence numbers, the effect is the same as using the sequence number of the manifest list.

rdblue · 2021-11-07T23:22:48Z

@hameizi, can you help review and test #3480? That's an alternative approach to what you're doing here that sets the sequence number per data file. I think that change is actually really important. While I was reviewing this, I thought that it was probably not a good idea to set the sequence number by reusing inheritance. Now that we've thought through the use case more, we've come up with a good reason not to: it makes it so we can't recover the sequence number where files were added, not just the sequence number where the data lives in time.
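The distinction in the last sentence can be sketched as an entry that tracks two numbers separately. This is an illustrative model only: the class and field names are hypothetical and do not describe how #3480 stores this information.

```java
// Hypothetical model: field and class names are illustrative, not Iceberg's.
final class FileEntrySketch {
  final String path;
  final long dataSeq;  // where the rows live in time; decides which deletes apply
  final long addedSeq; // sequence number of the snapshot that actually added the file

  FileEntrySketch(String path, long dataSeq, long addedSeq) {
    this.path = path;
    this.dataSeq = dataSeq;
    this.addedSeq = addedSeq;
  }

  /** Per-file approach: the rewritten file keeps the old data sequence number explicitly. */
  static FileEntrySketch perFile(String path, long startingSeq, long commitSeq) {
    return new FileEntrySketch(path, startingSeq, commitSeq);
  }

  /** Inheritance approach: one overridden number is all the file can inherit. */
  static FileEntrySketch inherited(String path, long overriddenSeq) {
    return new FileEntrySketch(path, overriddenSeq, overriddenSeq);
  }

  public static void main(String[] args) {
    FileEntrySketch a = perFile("rewritten.parquet", 1L, 3L);
    FileEntrySketch b = inherited("rewritten.parquet", 1L);
    System.out.println(a.dataSeq + " " + a.addedSeq); // 1 3
    System.out.println(b.dataSeq + " " + b.addedSeq); // 1 1 (the commit's number 3 is lost)
  }
}
```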

Handle the case that RewriteFiles and RowDelta commit the transaction…

2e3a1ec

… at the same time

github-actions bot added API core labels Sep 29, 2021

fix test NPE

39b5d42

hameizi added 2 commits September 29, 2021 16:38

fix test and code style

f68a5e5

fix test and code style

191cccb

rdblue reviewed Sep 30, 2021

View reviewed changes

hameizi added 4 commits October 11, 2021 11:27

some fix

ad660e5

some fix

d1d3da4

some fix

28aae83

fix test

f9d5bbb

fix new rewriteaction

79d6712

github-actions bot added the spark label Oct 20, 2021

guarantee rewrite snapshot's snapshotId and seqnumber from the same s…

3502646

…napshot

rdblue closed this Nov 7, 2021

hameizi mentioned this pull request Nov 8, 2021

Core: support rewrite data files with starting sequence number #3480

Merged
