Core: support rewrite data files with starting sequence number #3480

jackye1995 · 2021-11-06T02:43:36Z

Add a new hook in the RewriteFiles snapshot update that accepts a sequence number to use for all new data files. This forces the manifest writer to produce a ManifestFile with the provided sequence number instead of -1.

Also add a new option to RewriteDataFiles, use-starting-sequence-number, to enable this feature through configuration. When enabled, the table's sequence number at the time compaction starts is used when committing the rewritten data files.

This mechanism solves a current issue in CDC workloads, where compaction conflicts with newly committed equality delete files. With this change, RewriteFiles can succeed as long as no new position deletes have been added for the data files being rewritten.
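The reasoning can be checked with a small model. This is a minimal sketch, assuming the v2 spec rule that an equality delete applies only to data files with a strictly lower data sequence number, while a position delete applies to data files with a lower or equal one (and must also name the data file's path). The class, enum, and sequence numbers are hypothetical, not Iceberg's API.

```java
// Hypothetical model of delete applicability by sequence number; not Iceberg's API.
public class SequenceNumberModel {

  enum DeleteKind { EQUALITY, POSITION }

  // An equality delete applies to data files with a strictly lower data sequence number.
  // A position delete applies to data files with a lower or equal data sequence number.
  static boolean applies(DeleteKind kind, long deleteSeq, long dataFileSeq) {
    return kind == DeleteKind.EQUALITY ? dataFileSeq < deleteSeq : dataFileSeq <= deleteSeq;
  }

  public static void main(String[] args) {
    long startingSeq = 10;          // table sequence number when compaction starts
    long concurrentDeleteSeq = 11;  // equality delete committed while compaction runs
    long compactionCommitSeq = 12;  // sequence number assigned to the compaction commit

    // Rewritten file stamped with the commit's sequence number: the delete is skipped.
    System.out.println(applies(DeleteKind.EQUALITY, concurrentDeleteSeq, compactionCommitSeq)); // false
    // Rewritten file stamped with the starting sequence number: the delete still applies.
    System.out.println(applies(DeleteKind.EQUALITY, concurrentDeleteSeq, startingSeq)); // true
  }
}
```

A concurrent position delete still points at the old file path, so it cannot carry over to the rewritten file regardless of sequence number; that is why position deletes must still fail validation.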

jackye1995 · 2021-11-06T02:44:04Z

@rdblue @RussellSpitzer @aokolnychyi @puneetzaroo @Reo-LEI @openinx

rdblue · 2021-11-07T22:51:19Z

api/src/main/java/org/apache/iceberg/RewriteFiles.java

+   * @param sequenceNumber a sequence number
+   * @return this for method chaining
+   */
+  RewriteFiles overrideSequenceNumberForNewDataFiles(long sequenceNumber);


I think there is probably a shorter, more descriptive name for this. Something like commitAtSequenceNumber?

The reasons I used this long name are:

this does not actually commit with the given sequence number: the overall sequence number still increments, and the provided sequence number is only used in the manifest entries of the newly added data files.

I was thinking maybe we will need to do something similar for delete files, that's why there is a suffix of ForNewDataFiles.

Good points about naming, but this still seems long to me. Maybe we should pass the sequence number in with the data files then?

We could add rewriteFiles(Iterable<DataFile> toRemove, Iterable<DataFile> toAdd, long sequenceNumber). I think that's a bit more clear because you don't remove files at a sequence number. We also don't have the problem that it is confusing for delete files because it is the data file method only.

This looks like a fairly clean way to configure everything now that I'm looking at context. Up to you whether to use rewriteFiles to pass it or a new method, but I think we should come up with a shorter name if we go with the new method approach. What about dataFileSequenceNumber?

yes I think the new rewriteFiles method sounds good to me, let me update with that.
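To make the agreed shape concrete, here is a minimal in-memory sketch, assuming a table is reduced to a list of live files with data sequence numbers. `RewriteCommitModel`, `append`, and `dataSequenceNumberOf` are hypothetical; only the `rewriteFiles(toRemove, toAdd, sequenceNumber)` shape comes from the discussion. The snapshot still receives the next sequence number, while the added files carry the caller's value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the agreed API shape; not Iceberg's implementation.
public class RewriteCommitModel {

  static final class Entry {
    final String path;
    final long dataSequenceNumber;

    Entry(String path, long dataSequenceNumber) {
      this.path = path;
      this.dataSequenceNumber = dataSequenceNumber;
    }
  }

  private long lastSequenceNumber;
  private final List<Entry> liveFiles = new ArrayList<>();

  RewriteCommitModel(long lastSequenceNumber) {
    this.lastSequenceNumber = lastSequenceNumber;
  }

  // A normal append: the snapshot and its new files share a fresh sequence number.
  long append(String path) {
    long seq = ++lastSequenceNumber;
    liveFiles.add(new Entry(path, seq));
    return seq;
  }

  // Mirrors rewriteFiles(toRemove, toAdd, sequenceNumber): the snapshot still gets the
  // next sequence number, but the added data files carry the caller-provided one.
  long rewriteFiles(Set<String> toRemove, Set<String> toAdd, long dataSequenceNumber) {
    long snapshotSeq = ++lastSequenceNumber;
    liveFiles.removeIf(e -> toRemove.contains(e.path));
    for (String path : toAdd) {
      liveFiles.add(new Entry(path, dataSequenceNumber));
    }
    return snapshotSeq;
  }

  long dataSequenceNumberOf(String path) {
    for (Entry e : liveFiles) {
      if (e.path.equals(path)) {
        return e.dataSequenceNumber;
      }
    }
    throw new IllegalArgumentException("Unknown file: " + path);
  }

  public static void main(String[] args) {
    RewriteCommitModel table = new RewriteCommitModel(9);
    long startingSeq = table.append("data-1");   // 10: compaction starts here
    table.append("data-2");                       // 11: a concurrent commit
    long rewriteSeq = table.rewriteFiles(
        Collections.singleton("data-1"), Collections.singleton("data-1-compacted"), startingSeq);
    System.out.println(rewriteSeq);                                      // 12
    System.out.println(table.dataSequenceNumberOf("data-1-compacted")); // 10
  }
}
```

Passing the sequence number alongside the files keeps it scoped to the data files being added, which is the point made above about not removing files "at" a sequence number.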

rdblue · 2021-11-07T22:51:37Z

api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java

+   * Defaults to false.
+   */
+  String USE_STARTING_SEQUENCE_NUMBER = "use-starting-sequence-number";
+  boolean USE_STARTING_SEQUENCE_NUMBER_DEFAULT = false;


Is there a reason why we wouldn't use this as the default?

+1. I also have a concern: in what situation would users need the new sequence number when committing the rewritten data files, instead of the starting sequence number? Maybe this configuration is not necessary?

The reason I used false here is because otherwise it changes the existing behavior that has been released. Are we allowed to change this behavior?

I think it is fine to change this behavior. It doesn't affect correctness and it will avoid conflicts. That's a win. I also can't imagine people relying on the behavior of sequence numbers at this level.

sounds good, I will update

rdblue · 2021-11-07T22:52:39Z

core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java

        validateNoNewDeletesForDataFiles(
            base, startingSnapshotId, conflictDetectionFilter,
-            deletedDataFiles, caseSensitive);
+            deletedDataFiles, caseSensitive, false);


It isn't clear what's happening here. If you're introducing a new method argument, can you add an override so that this doesn't need to change? If you think that the boolean is needed, then can you add an inline comment to explain what you're setting?

rdblue · 2021-11-07T22:54:49Z

core/src/main/java/org/apache/iceberg/ManifestWriter.java

    return writer.length();
  }

+  void useSequenceNumber(long sequenceNumber) {


All of the other configuration is passed in when creating the writer. Is there a reason not to do that here? The benefit of doing that is that the sequence number would not be mutable. So you wouldn't be able to do something like this:

ManifestWriter<DataFile> writer = new ManifestWriter(...);
writer.add(dataFile1);
writer.setSequenceNumber(N);
writer.add(dataFile2);

It isn't really clear what the correct behavior for that code would be.

Also, if the sequence number was fixed at create time, then we could write the sequence number into every file that doesn't have a sequence number instead of using inheritance. I think that we prefer setting the sequence number on entries rather than inheriting. And also the manifest itself could have the correct sequence number when it was added.

If we always have the correct sequence number for manifests themselves, then I think we get what we wanted from the proposal to use two sequence numbers. The manifest sequence number always records when the ADDED files were actually added. The file sequence number always records where the file's data is in time.

I think the cleanest way to do this is to add a new add method:

/**
 * Add an added entry for a file at a specific sequence number.
 * <p>
 * The entry's snapshot ID will be this manifest's snapshot ID.
 *
 * @param addedFile a data file
 * @param sequenceNumber sequence number for the data file
 */
public void add(F addedFile, long sequenceNumber) {
  addEntry(reused.wrapAppend(snapshotId, sequenceNumber, addedFile));
}

sounds good, will do it through add
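A minimal sketch of what the per-call `add(F, long)` buys, assuming a writer reduced to a list of entries where a null sequence number means "inherit the manifest's number at commit time". `ManifestWriterSketch`, its `Entry` class, and `effectiveSequenceNumbers` are hypothetical and not Iceberg's `ManifestWriter`. Because the number travels with each `add` call, the writer holds no mutable sequence-number state, and the manifest itself can still receive a fresh number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a manifest writer; not Iceberg's ManifestWriter.
public class ManifestWriterSketch<F> {

  private static final class Entry<T> {
    final long snapshotId;
    final Long sequenceNumber; // null: inherit the manifest's sequence number at commit time
    final T file;

    Entry(long snapshotId, Long sequenceNumber, T file) {
      this.snapshotId = snapshotId;
      this.sequenceNumber = sequenceNumber;
      this.file = file;
    }
  }

  private final long snapshotId;
  private final List<Entry<F>> entries = new ArrayList<>();

  public ManifestWriterSketch(long snapshotId) {
    this.snapshotId = snapshotId;
  }

  // Existing style: the entry's sequence number is left unassigned and inherited later.
  public void add(F addedFile) {
    entries.add(new Entry<>(snapshotId, null, addedFile));
  }

  // Suggested overload: the entry carries an explicit sequence number of its own.
  public void add(F addedFile, long sequenceNumber) {
    entries.add(new Entry<>(snapshotId, sequenceNumber, addedFile));
  }

  // Effective per-entry sequence numbers once the manifest itself is assigned one.
  public List<Long> effectiveSequenceNumbers(long manifestSequenceNumber) {
    List<Long> result = new ArrayList<>();
    for (Entry<F> entry : entries) {
      result.add(entry.sequenceNumber != null ? entry.sequenceNumber : manifestSequenceNumber);
    }
    return result;
  }
}
```

For example, after `add("file-a")` and `add("file-b", 10L)`, committing the manifest at sequence number 12 yields effective numbers 12 and 10: only the explicitly stamped entry keeps the older value.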

rdblue · 2021-11-07T23:10:37Z

core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java

+      DeleteFile[] deleteFiles = deletes.forDataFile(startingSequenceNumber, dataFile);
+      if (ignoreEqualityDeletes) {
+        ValidationException.check(Arrays.stream(deleteFiles)
+                .noneMatch(deleteFile -> deleteFile.content() == FileContent.POSITION_DELETES),


Can you wrap the line after check? I think it looks strange to have a double indent because the first part of the line wasn't wrapped.

rdblue · 2021-11-07T23:17:57Z

core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java

  protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId,
                                                  Expression dataFilter, Iterable<DataFile> dataFiles,
-                                                  boolean caseSensitive) {
+                                                  boolean caseSensitive, boolean ignoreEqualityDeletes) {


I think this makes sense for the internal API, but I generally prefer new method names for booleans, like validateNoNewPositionDeletesForDataFiles and validateNoNewDeletesForDataFiles.

Maybe we should add a comment explaining why we can ignore equality deletes, and when we should or should not ignore them.

Added a new method and moved this to a private method, please let me know if that is enough
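A minimal sketch of that split, assuming stand-in `DeleteFile` and `FileContent` types and a `check(test, message, args)` helper shaped like the `ValidationException.check` call in the snippet above. Two named entry points replace a bare boolean at call sites, the reason for ignoring equality deletes is documented where the choice is made, and the condition is wrapped right after `check(`. The method names follow the review suggestion; the class name and error messages are made up.

```java
import java.util.Arrays;

// Hypothetical, self-contained sketch; the nested types are stand-ins, not Iceberg's classes.
public class DeleteValidationSketch {

  enum FileContent { POSITION_DELETES, EQUALITY_DELETES }

  static final class DeleteFile {
    private final FileContent content;

    DeleteFile(FileContent content) {
      this.content = content;
    }

    FileContent content() {
      return content;
    }
  }

  static final class ValidationException extends RuntimeException {
    ValidationException(String message) {
      super(message);
    }

    // Throws with a formatted message when the test fails.
    static void check(boolean test, String message, Object... args) {
      if (!test) {
        throw new ValidationException(String.format(message, args));
      }
    }
  }

  // Any new delete for a replaced data file fails validation.
  static void validateNoNewDeletesForDataFile(String dataFile, DeleteFile[] newDeletes) {
    validate(dataFile, newDeletes, false);
  }

  // Equality deletes can be ignored when the rewritten files keep the starting sequence
  // number: those deletes still apply to the new files. Position deletes reference the
  // old file path and would be lost, so they still fail validation.
  static void validateNoNewPositionDeletesForDataFile(String dataFile, DeleteFile[] newDeletes) {
    validate(dataFile, newDeletes, true);
  }

  private static void validate(String dataFile, DeleteFile[] newDeletes, boolean ignoreEqualityDeletes) {
    if (ignoreEqualityDeletes) {
      ValidationException.check(
          Arrays.stream(newDeletes).noneMatch(d -> d.content() == FileContent.POSITION_DELETES),
          "Cannot commit, found new position delete for replaced data file: %s", dataFile);
    } else {
      ValidationException.check(
          newDeletes.length == 0,
          "Cannot commit, found new delete for replaced data file: %s", dataFile);
    }
  }
}
```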

hameizi · 2021-11-08T02:34:08Z

core/src/main/java/org/apache/iceberg/ManifestWriter.java

    long minSeqNumber = minSequenceNumber != null ? minSequenceNumber : UNASSIGNED_SEQ;
    return new GenericManifestFile(file.location(), writer.length(), specId, content(),
-        UNASSIGNED_SEQ, minSeqNumber, snapshotId,
+        manifestSequenceNumber, minSeqNumber, snapshotId,


This also sets the sequence number on the manifest file, so I think it may be the same as #3204. As @rdblue said in #3204 (comment), maybe we need a new property in the manifest file, such as dataSeqnumber.

But I also think a dataSeqnumber in the manifest file would make reading data more complex, and it may cause version compatibility problems.

I think there is a difference between #3204 and this PR. In this PR, the specific seqNum is set on the manifest file, while the manifest list still gets a new seqNum when we commit the rewrite snapshot. As a result, we get an incremental seqNum from the snapshot and the specific seqNum from the data files, because data files inherit their seqNum from the manifest file rather than from the snapshot. We can then use the data files' seqNum to check whether any delete files modify the rewritten data files.

But in #3204, the snapshot's seqNum is overridden by the specific seqNum, so we would get two snapshots with the same seqNum. That breaks the monotonicity of snapshot sequence numbers, and we cannot tell which snapshot is newer because they have the same seqNum.

@rdblue @Reo-LEI Below is the manifest file entry produced by this PR. It also inherits the old sequence_number and sets it on the manifest file, which gives the same result as #3204.
{
  "manifest_path": "hdfs://ns/group/user/root/meta/hive-temp-table/iceberg_hive_catalog_test1.db/sample35/metadata/ee0fd321-8178-4631-b62b-50d6bb365bc2-m1.avro",
  "manifest_length": 6674,
  "partition_spec_id": 0,
  "content": 0,
  "sequence_number": 1837,
  "min_sequence_number": 1839,
  "added_snapshot_id": 5384193052566473584,
  "added_data_files_count": 1,
  "existing_data_files_count": 0,
  "deleted_data_files_count": 0,
  "added_rows_count": 14,
  "existing_rows_count": 0,
  "deleted_rows_count": 0,
  "partitions": {"array": []}
}

and we will get two snapshots which have the same seqNum

@Reo-LEI #3204 will not produce two snapshots with the same seqNum; like this PR, it just puts the old seqNum in the manifest file.

I think there may be some confusion here. This PR should be updated so that the manifest list and manifest files use a newly assigned sequence number and just the data file entries use the specified sequence number.

@rdblue I think @jackye1995 uses the same idea as mine, because Iceberg applies equality deletes based on the manifest file's sequence number, so this is an easy way to resolve the problem.

#3204 (comment)

Regarding the concern in this comment: the manifest file records its snapshot ID, and every snapshot has a distinct sequence number in the metadata JSON file, so we can recover everything we need. Can you give an example of the case you are worried about?

@hameizi, the snapshot and manifest list should be written with the latest sequence number so that we can track it. Re-sequenced files can override the sequence number, but we should have correct values in metadata. We will probably add another field in the manifests to inherit it.

This PR should be updated so that the manifest list and manifest files use a newly assigned sequence number and just the data file entries use the specified sequence number.

Yes I think that is the approach used in this PR

RussellSpitzer · 2021-11-08T22:24:11Z

core/src/test/java/org/apache/iceberg/TestRowDelta.java

+    Snapshot baseSnapshot = table.currentSnapshot();
+
+    // add an equality delete file
+    DeleteFile deleteFile1 = FileMetadata.deleteFileBuilder(table.spec())


We have some helpers for this (at least with data files)

newDeleteFile and newDataFile

In testTableBase

RussellSpitzer

I think I am good with this. I agree that if we want a true "this was added" ID, we can keep it at the manifest level and change the "sequence number" to mean "deletes up to this number have been applied".

rdblue · 2021-11-16T17:26:18Z

core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java

+        sequenceNumberForNewDataFiles != null);
+  }
+
+  protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId,


Could you copy the javadoc for the previous version? I think it's helpful to have it.

RussellSpitzer

I have one question about how this works with V1 but this looks good to me.

RussellSpitzer · 2021-11-17T21:41:37Z

core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java

+
+  /**
+   * Validates that no new delete files that must be applied to the given data files have been added to the table since
+   * a starting snapshot, with the option to ignore equality deletes during the validation.


Nbd, but I would add a note here about why we want to ignore equality deletes, just so future readers could understand.

RussellSpitzer · 2021-11-17T21:47:40Z

api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java

+   * Defaults to true.
+   */
+  String USE_STARTING_SEQUENCE_NUMBER = "use-starting-sequence-number";
+  boolean USE_STARTING_SEQUENCE_NUMBER_DEFAULT = true;


Now that this is true, do we have to ignore it with V1 Tables?

Yeah I was thinking about that right now. Technically I think it has no harm for v1 tables, because the sequence number is always 0, and it is not read or written anywhere. Let me add a unit test for v1. Do you see any place this might affect v1 table?

That's all I was thinking, V1 Tables don't have sequence numbers so I just wanted to make sure they don't break if we are trying to set them.

rdblue · 2021-11-17T23:23:21Z

core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java

  }

+  @Override
+  public RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd, long sequenceNumber) {


Slightly unrelated concern: should we be using Set here? It seems needlessly restrictive. Plus, DataFile is an interface, so you could easily pass files that don't implement equals/hashCode and are always considered unique. This is one thing that has always made me wonder about this API.

Not a blocker for this PR though!
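The identity concern is easy to reproduce in plain Java. In this sketch, `FileRef` and `PlainFileRef` are hypothetical stand-ins for an interface like DataFile and an implementation that does not override equals/hashCode; the path string is made up. Two objects describing the same path are both kept in a HashSet, because the set falls back to Object's identity-based equality.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical types illustrating Set semantics for interface implementations.
public class SetIdentitySketch {

  interface FileRef {
    String path();
  }

  static final class PlainFileRef implements FileRef {
    private final String path;

    PlainFileRef(String path) {
      this.path = path;
    }

    @Override
    public String path() {
      return path;
    }
    // No equals/hashCode override: Object's identity semantics apply.
  }

  static int distinctCount(FileRef... files) {
    Set<FileRef> set = new HashSet<>();
    for (FileRef f : files) {
      set.add(f);
    }
    return set.size();
  }

  public static void main(String[] args) {
    FileRef a = new PlainFileRef("s3://bucket/data/file-1.parquet");
    FileRef b = new PlainFileRef("s3://bucket/data/file-1.parquet");
    System.out.println(distinctCount(a, b)); // prints 2: same path, different objects
    System.out.println(distinctCount(a, a)); // prints 1: same object
  }
}
```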

rdblue · 2021-11-17T23:26:46Z

Thanks, @jackye1995! Great to have this work done!


github-actions bot added API core spark labels Nov 6, 2021

rdblue reviewed Nov 7, 2021


rdblue mentioned this pull request Nov 7, 2021

Handle the case that RewriteFiles and RowDelta commit the transaction… #3204

Closed

hameizi reviewed Nov 8, 2021


RussellSpitzer reviewed Nov 8, 2021


jackye1995 requested a review from rdblue November 10, 2021 18:10

Core: support rewrite data files with starting sequence number

7ba427d

rdblue reviewed Nov 16, 2021


Jack Ye added 3 commits November 17, 2021 13:34

update based on comments

bd51a63

use primitive long for manifest writer add

05ea4fa

add javadoc

fd84ddf

jackye1995 requested a review from rdblue November 17, 2021 21:47

RussellSpitzer approved these changes Nov 17, 2021


Jack Ye added 2 commits November 17, 2021 14:00

add test for v1 compatibility

f20d20f

add javadoc to explain ignoreEqualityDeletes

95d6a79

rdblue reviewed Nov 17, 2021


rdblue approved these changes Nov 17, 2021


rdblue merged commit d6cbca0 into apache:master Nov 17, 2021

Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Nov 23, 2021

Core: Support rewriting data files at a sequence number (apache#3480)

64a82e1

RussellSpitzer pushed a commit that referenced this pull request May 5, 2022

Spark 3.x: Support rewrite data files with starting sequence number (#…

bf582eb

…4701) Backports (#3480) Backport of #3480

szehon-ho mentioned this pull request Jun 22, 2022

Delete files not eventually removed if RewriteDataFile run right after delete (when using 'use-starting-sequence-number' default) #4127

Closed

singhpk234 mentioned this pull request May 19, 2025

Feature: Rollback compaction on conflict apache/polaris#1285

Merged
