Flink: Commit both data files and delete files to iceberg transaction. #1939
    } else {
      // To be compatible with iceberg format V2.
      for (Map.Entry<Long, WriteResult> e : pendingResults.entrySet()) {
        // We don't commit the merged result into a single transaction because for the sequential transaction txn1 and
I will provide a unit test to address it.
I've addressed this case in this unit test here.
    import org.apache.iceberg.ManifestFiles;
    import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
    import org.apache.iceberg.relocated.com.google.common.collect.Lists;
    import org.jetbrains.annotations.NotNull;
Is this used?
Em, this could be removed now.
      return this;
    }

    public Builder add(Iterable<WriteResult> results) {
Typically, we would follow the Java collection convention and use addAll.
OK, renaming it to addAll sounds great to me.
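The convention discussed above can be sketched as follows. This is a minimal, hypothetical builder (the names `add`/`addAll` follow the discussion; the field and class are illustrative, not Iceberg's actual code): the single-element method keeps the name `add`, while the `Iterable` variant is named `addAll`, matching `java.util.Collection`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative builder following the java.util collection naming convention.
public class BuilderNamingSketch {
  private final List<String> files = new ArrayList<>();

  // Single-element variant keeps the short name.
  public BuilderNamingSketch add(String file) {
    files.add(file);
    return this;
  }

  // Iterable variant is named addAll, like Collection#addAll.
  public BuilderNamingSketch addAll(Iterable<String> more) {
    for (String f : more) {
      files.add(f);
    }
    return this;
  }

  public int size() {
    return files.size();
  }
}
```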
    // The completed files cache for the current checkpoint. Once the snapshot barrier is received, it will be flushed
    // to the 'dataFilesPerCheckpoint'.
    private final List<DataFile> dataFilesOfCurrentCheckpoint = Lists.newArrayList();
    private final List<WriteResult> writeResultsOfCurrentCkpt = Lists.newArrayList();
Is it correct for this to be a list of write results if a write result keeps track of a list of data files and a list of delete files?
Yes, it's correct here. If there are, say, 5 IcebergStreamWriter instances, each writer will emit a WriteResult. The single-parallelism IcebergFilesCommitter collects all the WriteResults in this writeResultsOfCurrentCkpt cache and then merges them into a single WriteResult. Finally, it writes those files into delete + data manifests and updates the Flink state backend.
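The merging step described above can be sketched with a simplified model. The real `WriteResult` lives in `org.apache.iceberg.io` and also tracks referenced data files; this stand-in class is illustrative only and shows just the shape of the merge the committer performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of merging per-writer WriteResults.
public class WriteResultMergeSketch {

  // Stand-in for WriteResult: each stream writer emits one per checkpoint.
  public static class WriteResult {
    final List<String> dataFiles = new ArrayList<>();
    final List<String> deleteFiles = new ArrayList<>();
  }

  // The single-parallelism committer collects every writer's result and
  // merges them into one WriteResult before writing the manifests.
  public static WriteResult merge(List<WriteResult> pending) {
    WriteResult merged = new WriteResult();
    for (WriteResult r : pending) {
      merged.dataFiles.addAll(r.dataFiles);
      merged.deleteFiles.addAll(r.deleteFiles);
    }
    return merged;
  }
}
```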
Looks good overall, though I didn't look into the tests very thoroughly. Since this is getting into quite a bit of Flink logic, I'd appreciate it if @JingsongLi and @stevenzwu could also take a look and review.
    ManifestFile manifestFile =
        SimpleVersionedSerialization.readVersionAndDeSerialize(FlinkManifestSerializer.INSTANCE, manifestData);
    DeltaManifests deltaManifests =
        SimpleVersionedSerialization.readVersionAndDeSerialize(DeltaManifestsSerializer.INSTANCE, e.getValue());
We will need to maintain the Flink state's compatibility. If the encoding version is 1, then we should read the byte[] the FlinkManifestSerializer way.
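The backward-compatibility concern above can be sketched as a version switch in the deserializer. Everything here is illustrative (the method name, the string payloads, and the version numbers are assumptions based on the discussion, not the actual Iceberg Flink code): version 1 state holds only the old single data manifest, while the new version carries data plus delete manifests.

```java
// Hedged sketch of backward-compatible state decoding.
public class StateCompatSketch {

  public static String decodeDeltaManifests(int version, byte[] serialized) {
    switch (version) {
      case 1:
        // Old layout: a single data manifest, previously written by
        // the FlinkManifestSerializer path.
        return "data-manifest-only:" + new String(serialized);
      case 2:
        // New layout: a data manifest plus an optional delete manifest.
        return "delta-manifests:" + new String(serialized);
      default:
        throw new IllegalArgumentException("Unknown serializer version: " + version);
    }
  }
}
```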
…ts in the latest successful checkpoint.
    @Override
    public ManifestFile deserialize(int version, byte[] serialized) throws IOException {
      return ManifestFiles.decode(serialized);

    public Iterator<ManifestFile> iterator() {
Does this need to extend Iterable? It seems only needed for Iterables.addAll(manifests, deltaManifests); is it simpler to directly call the two getters?
OK, agreed, we don't have to introduce the Iterable complexity.
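The agreed-on simpler shape can be sketched like this. This is a hypothetical stand-in for DeltaManifests (string fields instead of real ManifestFile objects): the class exposes its two manifests through plain getters-style access instead of implementing Iterable, and callers collect the non-null ones directly.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for DeltaManifests without the Iterable interface.
public class DeltaManifestsSketch {
  private final String dataManifest;    // may be null when no data files were written
  private final String deleteManifest;  // may be null when no delete files were written

  public DeltaManifestsSketch(String dataManifest, String deleteManifest) {
    this.dataManifest = dataManifest;
    this.deleteManifest = deleteManifest;
  }

  // Callers gather the non-null manifests directly from the two fields,
  // rather than iterating over the object itself.
  public List<String> manifests() {
    List<String> all = new ArrayList<>(2);
    if (dataManifest != null) {
      all.add(dataManifest);
    }
    if (deleteManifest != null) {
      all.add(deleteManifest);
    }
    return all;
  }
}
```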
      deleteManifest = deleteManifestWriter.toManifestFile();
    }

    return new DeltaManifests(dataManifest, deleteManifest);
do we need to check if WriteResult is empty (no data and delete files)?
We had a similar discussion here. Even if the WriteResult is empty (NOT null — null means nobody emitted a result to the IcebergFilesCommitter, while an empty WriteResult means the IcebergStreamWriter did not write any new data but still emitted a WriteResult with zero data files and zero delete files to the downstream IcebergFilesCommitter), we'd better commit to the Iceberg txn, so that the Flink streaming job won't fail easily when expiring an old snapshot (since at that time we did not even write any new records).
About this question, I think we'd better keep the dummy DeltaManifests in state, even though it has no delete files or data files.
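The null-vs-empty distinction above can be illustrated with a small decision sketch. The method and return values here are entirely hypothetical; they just encode the rule from the discussion: null means nothing was emitted (skip), while an empty result still produces a commit so the table's snapshot history keeps advancing with each checkpoint.

```java
import java.util.List;

// Illustrative only: encodes the null-vs-empty commit rule discussed above.
public class EmptyResultSketch {
  public static String decide(List<String> dataFiles, List<String> deleteFiles) {
    if (dataFiles == null && deleteFiles == null) {
      return "skip";            // no writer emitted anything at all
    }
    if (dataFiles.isEmpty() && deleteFiles.isEmpty()) {
      return "commit-empty";    // still commit a snapshot with zero new files
    }
    return "commit";            // normal commit with data and/or delete files
  }
}
```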
    return new ManifestOutputFileFactory(ops, table.io(), table.properties(), flinkJobId, subTaskId, attemptNumber);
  }

  static DeltaManifests writeCompletedFiles(WriteResult result,
Just for my own education: referencedDataFiles from WriteResult doesn't seem to be used (except in unit tests). What is it for? Do we need to serialize it too?
We should serialize it and add it to the commit. This is the set of files that is referenced by any positional delete, which identifies deleted rows by file and row position. The commit will validate that all of the files still exist in the table.
This isn't strictly needed for this use case because we know that the position deletes only refer to files that are created in this commit. Since the files are being added in the commit, it isn't possible for some other process to delete some of them from metadata. But it is still good to configure the commit properly in case this gets reused later.
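The validation described above can be modeled in a simplified form. The class and method below are illustrative stand-ins (they conceptually mirror the existence check discussed for RowDelta, but are not Iceberg's API): before committing, every data file referenced by a positional delete is checked against the set of files still live in the table.

```java
import java.util.Set;

// Simplified model of validating referenced data files before a commit.
public class ReferencedFilesSketch {
  public static void validateDataFilesExist(Set<String> liveDataFiles, Set<String> referencedDataFiles) {
    for (String path : referencedDataFiles) {
      if (!liveDataFiles.contains(path)) {
        // A concurrent process removed a file that our position deletes point at,
        // so committing would leave deletes referencing a missing file.
        throw new IllegalStateException("Referenced data file no longer exists: " + path);
      }
    }
  }
}
```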
Thanks for the explanation, @rdblue . I think it's correct to validate the data files in RowDelta#commit. Will provide an extra unit test to address it.
This unit test addressed the data files validation issue
    commitOperation(appendFiles, numFiles, "append", newFlinkJobId, checkpointId);
    int numFiles = 0;
    for (WriteResult result : pendingResults.values()) {
We are using this API from the AppendFiles interface. When we had an extended outage and accumulated a few hundred transactions/manifests in Flink checkpoint state, this helped avoid rewriting those manifest files; otherwise, the commit can take very long. @rdblue can probably explain it better than I do.

    AppendFiles appendManifest(ManifestFile file);

Here we are merging data files, potentially from multiple checkpoint cycles/manifests, into a single manifest file. Maybe we can add a similar API in the DeleteFiles interface?

    DeleteFiles deleteManifest(ManifestFile file);
That sounds like a separate improvement, so I created an issue for it; let's discuss there: #1959.
Maybe we can add a similar API in DeleteFiles interface?
We don't currently do this because we need delete entries to exist when we delete files. That way we can track when something was deleted and clean it up incrementally in ExpireSnapshots. If we did have a method like this, it would always rewrite the manifest with deletes, or would need to ensure that the manifest that is added contains only deletes, and these requirements are not very obvious. I think it is better to pass the deleted files through the existing methods.
@stevenzwu Thanks for your review, I've addressed everything except the separate issue #1959. Any other concerns?
stevenzwu
left a comment
LGTM. thanks for opening the jira to track the append manifest file
+1 It would be great to have a review from @JingsongLi as well, but I'm going to go ahead and commit this since it looks good to @stevenzwu.
@stevenzwu @rdblue Thanks for the review and merge. @JingsongLi is currently busy with internal Flink/Blink development work, so he may not have time to double-check right now.