Flink: Fix flink manifest location collision when there are multiple committers for multiple sink tables #3986

coolderli · 2022-01-26T07:09:01Z

When more than one iceberg-flink-committer runs on the same task manager, Flink can generate the same manifest location for each of them, and the committers then conflict with each other. I think the fileCount should always increment.
This is the problem I ran into. It is rare, but we'd better fix it.
I think the correct approach is to union all sources, so that there is only one iceberg-flink-committer operator.
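
The collision described above can be reproduced with a minimal sketch. The class below is a simplified, hypothetical stand-in for ManifestOutputFileFactory (not the real Iceberg class); it only reuses the pre-fix name pattern shown in this PR's diff and assumes each committer holds its own per-instance counter. The ".avro" suffix is appended by hand in place of FileFormat.AVRO.addExtension.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for ManifestOutputFileFactory.
class CollidingManifestNames {
  private final String flinkJobId;
  private final int subTaskId;
  private final long attemptNumber;
  // Per-instance counter: every factory starts counting from zero.
  private final AtomicInteger fileCount = new AtomicInteger(0);

  CollidingManifestNames(String flinkJobId, int subTaskId, long attemptNumber) {
    this.flinkJobId = flinkJobId;
    this.subTaskId = subTaskId;
    this.attemptNumber = attemptNumber;
  }

  // Same name pattern as the pre-fix generatePath; the extension is added by hand here.
  String generatePath(long checkpointId) {
    return String.format("%s-%05d-%d-%d-%05d.avro",
        flinkJobId, subTaskId, attemptNumber, checkpointId, fileCount.incrementAndGet());
  }

  public static void main(String[] args) {
    // Two committer operators in the same job, both running as subtask 0.
    CollidingManifestNames committerA = new CollidingManifestNames("job-1", 0, 0L);
    CollidingManifestNames committerB = new CollidingManifestNames("job-1", 0, 0L);
    String a = committerA.generatePath(7L);
    String b = committerB.generatePath(7L);
    System.out.println(a);
    System.out.println(b);
    System.out.println("collision: " + a.equals(b));
  }
}
```

Nothing in the name distinguishes one committer operator from the other, so both write to the same manifest location for the same checkpoint.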

coolderli · 2022-01-26T07:15:01Z

I found that the CI failed: https://github.com/apache/iceberg/runs/4947986168?check_suite_focus=true#step:6:203
I don't think it was caused by my changes, but I can try to fix it.

stevenzwu · 2022-01-27T04:12:34Z

flink/v1.12/flink/src/main/java/org/apache/iceberg/flink/sink/ManifestOutputFileFactory.java

 import org.apache.iceberg.relocated.com.google.common.base.Strings;

 class ManifestOutputFileFactory {
+  private static final AtomicInteger fileCount = new AtomicInteger(0);


I am not sure making it static is the right fix. Instead, we probably should fix the generatePath method and include the full table name in the path.

private String generatePath(long checkpointId) {
  return FileFormat.AVRO.addExtension(String.format("%s/%s-%05d-%d-%d-%05d", fullTableName, flinkJobId, subTaskId,
      attemptNumber, checkpointId, fileCount.incrementAndGet()));
}

@stevenzwu I think that including the full table name cannot solve this problem, because the target table may be the same one.

Hmm, I didn't think about the case where the target table can be the same. However, I think the static variable may also not be enough: we can still get a name collision when those committer operators run on different TMs. We may want to use this unique ID from StreamingRuntimeContext:

/**
 * Returned value is guaranteed to be unique between operators within the same job and to be
 * stable and the same across job submissions.
 *
 * <p>This operation is currently only supported in Streaming (DataStream) contexts.
 *
 * @return String representation of the operator's unique id.
 */
public String getOperatorUniqueID() {
  return operatorUniqueID;
}

@stevenzwu Using the operator unique ID is fine. I will fix it.

@stevenzwu I have updated the pr. Could you please review it again? Thanks.
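
To illustrate the point made in this thread, here is a rough sketch of why a static counter does not help across task managers, while an operator ID in the name does. Everything in it is hypothetical: SimulatedJvm models one TM process whose "static" counter is only shared inside that process, and the operator IDs are made-up strings. In a real job the ID would come from the getOperatorUniqueID() method quoted above. The argument order after subTaskId is an assumption, since the diff line in this PR is cut off.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names exist in Iceberg or Flink.
class OperatorScopedNames {
  // Models one TaskManager JVM: a static counter is shared only within this process.
  static final class SimulatedJvm {
    final AtomicInteger processWideCount = new AtomicInteger(0);
  }

  // Name pattern without an operator ID, counting with the process-wide counter.
  static String withoutOperatorId(
      SimulatedJvm jvm, String jobId, int subTaskId, long attemptNumber, long checkpointId) {
    return String.format("%s-%05d-%d-%d-%05d",
        jobId, subTaskId, attemptNumber, checkpointId, jvm.processWideCount.incrementAndGet());
  }

  // Name pattern with the operator ID added (argument order after subTaskId is assumed).
  static String withOperatorId(
      SimulatedJvm jvm, String jobId, int subTaskId, String operatorUniqueId,
      long attemptNumber, long checkpointId) {
    return String.format("%s-%05d-%s-%d-%d-%05d",
        jobId, subTaskId, operatorUniqueId, attemptNumber, checkpointId,
        jvm.processWideCount.incrementAndGet());
  }

  public static void main(String[] args) {
    SimulatedJvm tm1 = new SimulatedJvm();
    SimulatedJvm tm2 = new SimulatedJvm();
    // Two committers on different TMs: each process-wide counter starts at zero.
    System.out.println(withoutOperatorId(tm1, "job-1", 0, 0L, 7L));
    System.out.println(withoutOperatorId(tm2, "job-1", 0, 0L, 7L));
    // With the operator ID in the name, the two committers no longer share a name.
    System.out.println(withOperatorId(tm1, "job-1", 0, "op-a", 0L, 7L));
    System.out.println(withOperatorId(tm2, "job-1", 0, "op-b", 0L, 7L));
  }
}
```

The counter still separates files written by one factory, but uniqueness between committers then rests on the operator ID rather than on where the operators happen to be scheduled.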

stevenzwu · 2022-01-27T04:16:32Z

@coolderli Can you create this PR for 1.14 only? We typically create a PR for the latest version, then create a backport PR separately. In the backport PR, we show the diffs for the relevant subfolders, e.g. #3870 (comment)

coolderli · 2022-01-27T06:10:49Z

@stevenzwu Thanks for the reminder. I have removed the changes to 1.12 and 1.13. Could you please take another look?

stevenzwu · 2022-01-27T16:33:09Z

flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/ManifestOutputFileFactory.java

  private String generatePath(long checkpointId) {
-    return FileFormat.AVRO.addExtension(String.format("%s-%05d-%d-%d-%05d", flinkJobId, subTaskId,
-        attemptNumber, checkpointId, fileCount.incrementAndGet()));
+    return FileFormat.AVRO.addExtension(String.format("%s-%05d-%s-%d-%d-%05d", flinkJobId, subTaskId,


nit: might be more consistent if we follow the order of jobId, operatorId, subtaskId

+1. Little things like that to be more consistent make developer experience a lot smoother in the aggregate.

Agreed, I'll fix it.
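
For reference, a minimal sketch of the ordering suggested here (jobId, operatorId, subtaskId, then the remaining fields). The class and method names are hypothetical, and the exact pattern is an assumption derived from the diff above rather than a copy of the merged code.

```java
// Hypothetical helper; shows only the suggested field order.
class OrderedManifestName {
  static String generateName(
      String jobId, String operatorId, int subTaskId, long attemptNumber,
      long checkpointId, int fileCount) {
    // jobId and operatorId come first, then the zero-padded subtask ID.
    return String.format("%s-%s-%05d-%d-%d-%05d",
        jobId, operatorId, subTaskId, attemptNumber, checkpointId, fileCount);
  }

  public static void main(String[] args) {
    System.out.println(generateName("job-1", "op-a", 3, 0L, 7L, 1));
  }
}
```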

stevenzwu · 2022-01-27T16:37:55Z

@coolderli This PR is almost ready. I left a nit comment.

Also, we should probably update the PR description. The essence of the collision problem is that we may have multiple committers for multiple sink tables, regardless of whether they run on the same TM or on different TMs.

kbendick · 2022-01-28T00:47:24Z

flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkManifestUtil.java

+    return new ManifestOutputFileFactory(ops, table.io(), table.properties(), flinkJobId, subTaskId,
+        operatorUniqueId, attemptNumber);


Nit: Can we fit all of the arguments on one line if they're placed on the next line?

return new ManifestOutputFileFactory(
    ops, table.io(), table.properties(), flinkJobId, subTaskId, operatorUniqueId, attemptNumber);

You could also make the new parameter into operatorUid (or operatorId) to shorten it.

Not a blocker by any means but would be nice if possible. =)

@kbendick Thanks, I'll fix it

stevenzwu · 2022-01-28T17:23:41Z

flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkManifestUtil.java

-  static ManifestOutputFileFactory createOutputFileFactory(Table table, String flinkJobId, int subTaskId,
-                                                           long attemptNumber) {
+  static ManifestOutputFileFactory createOutputFileFactory(Table table, String flinkJobId, String operatorUniqueId,
+      int subTaskId, long attemptNumber) {


nit: this indentation doesn't seem to match the current code style

Done. I noticed that the CI had passed anyway, but never mind, I have fixed it.

rdblue · 2022-02-04T17:38:10Z

Looks good. Thanks for fixing this, @coolderli!

(cherry picked from commit 0d9c63e)

coolderli added 2 commits January 26, 2022 14:37

Flink: Fix manifest location conflict when exist same subtask

171d161

update

ad42b66

github-actions bot added the flink label Jan 26, 2022

coolderli changed the title ~~Flink: Fix manifest location conflict when exist same subtask~~ Flink: Fix manifest location conflict when exist more than one committer on the same taskmanager Jan 26, 2022

stevenzwu reviewed Jan 27, 2022

coolderli added 2 commits January 27, 2022 14:04

remove duplicate code on flink1.12 and flink1.13

8ebb870

remove static

dc8cd20

coolderli added 2 commits January 27, 2022 15:24

use operatorId to generate manifest path

0b47dbf

remove unused import in TestFlinkManifest

6b885a3

stevenzwu reviewed Jan 27, 2022

kbendick reviewed Jan 28, 2022

update format order

a823c00

coolderli changed the title ~~Flink: Fix manifest location conflict when exist more than one committer on the same taskmanager~~ Flink: Fix flink manifest location collision when there are multiple committers for multiple sink tables Jan 28, 2022

update params order

35c66d9

stevenzwu reviewed Jan 28, 2022

fix checkstyle in FlinkManifestUtil

67ae8fd

stevenzwu approved these changes Jan 30, 2022

rdblue approved these changes Feb 4, 2022

rdblue merged commit afa9c60 into apache:master Feb 4, 2022

rdblue added this to the Iceberg 0.13.1 Release milestone Feb 4, 2022

amogh-jahagirdar pushed a commit to amogh-jahagirdar/iceberg that referenced this pull request Feb 10, 2022

Flink: Ensure temp manifest names are unique across tasks (apache#3986)

badb8b9

amogh-jahagirdar mentioned this pull request Feb 10, 2022

0.13.1 Cherry-Picks #4087

Merged

jackye1995 pushed a commit that referenced this pull request Feb 10, 2022

Flink: Ensure temp manifest names are unique across tasks (#3986)

0d9c63e

samarthjain pushed a commit to samarthjain/incubator-iceberg that referenced this pull request Apr 6, 2022

Flink: Ensure temp manifest names are unique across tasks (apache#3986)

ebdb4ea

(cherry picked from commit 0d9c63e)

vanliu-tx pushed a commit to BKBASE-Plugin/iceberg that referenced this pull request May 11, 2022

Flink: Ensure temp manifest names are unique across tasks (apache#3986)

be36c30

sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023

Flink: Ensure temp manifest names are unique across tasks (apache#3986)

d20fc90
