[HUDI-3559] Fix Flink bucket index NoSuchElementException on COW tables caused by out-of-order records in FlinkWriteHelper#deduplicateRecords
#5018
Conversation
if (hasInsert) {
  recordList.get(0).getCurrentLocation().setInstantTime("I");
}
return recordList;
In line 114 we already reset the location, so each record list under the same key should have the same instant time type after reduction as before. Why is this set needed?
I wrote a test locally and found that the order of the list changed after reduction: <id1, id2> became <id2, id1> somehow, so it is not tied to a single record.
Yes, Map::values does not guarantee the sequence. The state-index-based writer has no problem because it assigns the instant "I" and "U" based on the buckets of the last checkpoint and reuses those buckets within one checkpoint.
This fix is necessary to make it more robust.
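To illustrate the ordering issue, here is a minimal standalone sketch (not Hudi code): Collectors.groupingBy collects into a HashMap by default, so iterating over values() gives no guarantee about the encounter order of the keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingOrderSketch {
  public static void main(String[] args) {
    // Simulate records keyed by record key, arriving as id1 then id2.
    List<String> incoming = Arrays.asList("id1", "id2");

    // Collectors.groupingBy uses a HashMap by default; the iteration order
    // of values() is unspecified and may differ from the input order.
    Map<String, List<String>> keyed = incoming.stream()
        .collect(Collectors.groupingBy(k -> k));

    // Depending on hashing, the id2 group may come out before the id1 group,
    // which is why the dedup result cannot rely on positional ordering.
    keyed.values().forEach(System.out::println);
  }
}
```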
garyli1019 left a comment
@wxplovecc thanks for your contribution! I can reproduce this bug. Left some minor comments; we should merge this before the next release.
@Override
public List<HoodieRecord<T>> deduplicateRecords(
    List<HoodieRecord<T>> records, HoodieIndex<?, ?> index, int parallelism) {
  final boolean hasInsert = records.get(0).getCurrentLocation().getInstantTime().equals("I");
How about renaming this to isInsertBucket and adding a comment to explain why we need it?
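For instance, the quoted line could read something like the following after the rename (a sketch of the reviewer's suggestion, not the final code):

```java
// Records grouped under one key are expected to share the same bucket type,
// but the reduction below may reorder them, so remember up front whether this
// bucket was an INSERT bucket and restore the flag on the surviving record.
final boolean isInsertBucket = records.get(0).getCurrentLocation().getInstantTime().equals("I");
```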
The keyedRecords can be made more efficient:
Map<Object, List<HoodieRecord<T>>> keyedRecords = records.stream()
    .collect(Collectors.groupingBy(record -> record.getKey().getRecordKey()));

JobClient client = execEnv.executeAsync(execEnv.getStreamGraph());
if (client.getJobStatus().get() != JobStatus.FAILED) {
  try {
    TimeUnit.SECONDS.sleep(20); // wait long enough for the compaction to finish
is this sleep still needed if we test for COW?
@ParameterizedTest
@ValueSource(strings = {"BUCKET"})
public void testCopyOnWriteBucketIndex(String indexType) throws Exception {
Can we use this test for the COW table, and include the state index as well?
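One possible shape for covering both index types in the same test, assuming FLINK_STATE is the name of the state-backed index type; the method name and body here are only illustrative:

```java
@ParameterizedTest
@ValueSource(strings = {"FLINK_STATE", "BUCKET"})
public void testCopyOnWriteWithIndex(String indexType) throws Exception {
  // build the COW write pipeline with the given index type and verify
  // that deduplication and upsert results are correct for both indexes
}
```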
…eption Actually the method FlinkWriteHelper#deduplicateRecords does not guarantee the record sequence, but there is an implicit constraint: all the records in one bucket should have the same bucket type (instant time here). The BucketStreamWriteFunction breaks this rule and fails to comply with the constraint. close apache#5018
What is the purpose of the pull request
This pull request makes the deduplicateRecords method in FlinkWriteHelper robust against record reordering, fixing the NoSuchElementException with the Flink bucket index on COW tables.
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.