[HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer #4679
Conversation
cc: @yihua FYI

Wow~ Great

@hudi-bot run azure
public Boolean buffer(HoodieRecord<T> record, String instantTime, HoodieFlinkTable<T> table) {
  final HoodieRecordLocation loc = record.getCurrentLocation();
  final String fileID = loc.getFileId();
  final String partitionPath = record.getPartitionPath();
Can we find a way to move the buffering logic into the write function?
IMO we need to move the buffering logic into the hoodie common client, so other engines can reuse it (a streaming API later). The Kafka Connect sink is also looking for a streaming way to write. Is there any advantage of putting the buffering logic into the write function that I am not aware of?
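To illustrate what "moving the buffering into a common, engine-agnostic client" could mean, here is a minimal sketch of a streaming-style buffered writer interface. Everything in it is hypothetical and does not exist in Hudi today; it only shows the buffer/flush split that a shared streaming API might expose:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a buffering abstraction that could live in the common
// client so Flink, Kafka Connect, etc. can share it. All names are illustrative.
public interface StreamingBufferedWriter<R> {

  // Buffer a single record for the given instant; returns true if accepted.
  boolean buffer(R record, String instantTime);

  // Flush everything buffered so far, e.g. on a checkpoint barrier.
  List<WriteResult> flush(String instantTime);

  // Minimal result holder for the sketch.
  class WriteResult {
    public final String fileId;
    public final long numWritten;

    public WriteResult(String fileId, long numWritten) {
      this.fileId = fileId;
      this.numWritten = numWritten;
    }
  }
}
```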
import org.apache.hudi.common.engine.HoodieEngineContext;
import org.apache.hudi.common.model.HoodieRecordPayload;
import org.apache.hudi.common.model.WriteOperationType;
import org.apache.hudi.config.HoodieWriteConfig;
Guess this class can be avoided if we move the buffering logic into the write function?
.key(HoodieIndexConfig.BUCKET_INDEX_NUM_BUCKETS.key())
.intType()
.defaultValue(256) // default 256 buckets per partition
.withDescription("Hudi bucket number per partition. Only affected if using Hudi bucket index.");
Is there any reason the default value is 256 here? It seems to generate many small files for small data sets.
Sure, will change it to a smaller number.
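For anyone who wants a smaller bucket count right away, here is a minimal sketch of overriding the default from a Flink job. It assumes the option keys quoted above ("index.type" from FlinkOptions and "hoodie.bucket.index.num.buckets" from HoodieIndexConfig) and that "BUCKET" is the value that enables the bucket index; verify the exact keys and values against the code in this PR:

```java
import org.apache.flink.configuration.Configuration;

// Sketch only: override the 256-bucket default for a small data set.
public class BucketIndexConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setString("index.type", "BUCKET");                 // assumption: value that switches the writer to the bucket index
    conf.setInteger("hoodie.bucket.index.num.buckets", 16); // assumption: key from HoodieIndexConfig, 16 buckets per partition
    System.out.println(conf);
  }
}
```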
If people change the BUCKET_INDEX_NUM_BUCKETS or the write function parallelism, does the hash index still work?
pipeline = Pipelines.hoodieStreamWrite(conf, parallelism, hoodieRecordDataStream);
if (OptionsResolver.isBucketIndexTable(conf)) {
  if (!OptionsResolver.isMorTable(conf)) {
    throw new HoodieNotSupportedException("Bucket index only support MOR table type.");
We can move the whole if ... else ... code block into the Pipelines.hoodieStreamWrite function.
sure
cc @yihua could you also please review this from the angle of making the write client abstractions more friendly?

I'll review this.

@hudi-bot run azure

@hudi-bot run azure

As discussed with @danny0405, the changes to the write client will not be included in this PR, because they would make the write client look ugly. We will include those once we have a streaming API. So this PR will only include the bucket index for the Flink writer. cc: @yihua

@minihippo would you review this PR if you have time? Thanks~

@hudi-bot run azure

@hudi-bot run azure
danny0405 left a comment:
+1, thanks for the contribution, I have left some comments ~
hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java (resolved)
public static final ConfigOption<String> INDEX_KEY_FIELD = ConfigOptions
    .key(HoodieIndexConfig.BUCKET_INDEX_HASH_FIELD.key())
    .stringType()
Must this key be the same as the primary key? Because all the changes of a key must belong to one data bucket.
This could be a subset of the primary keys, e.g. if the primary key is "id1,id2", the index key could be "id1", "id2", or "id1,id2".
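A simplified sketch of the idea (not the actual BucketIdentifier code): only the configured index key fields are extracted from the record key and hashed, so every change of a given key lands in the same bucket regardless of the other key fields. The "field:value,field:value" record key layout is an assumption for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: bucket assignment driven by a subset of the primary key fields.
public class BucketIdSketch {

  // recordKey is assumed to look like "id1:v1,id2:v2"; indexKeyFields like "id1" or "id1,id2".
  static int bucketId(String recordKey, String indexKeyFields, int numBuckets) {
    List<String> wanted = Arrays.asList(indexKeyFields.split(","));
    String hashKey = Arrays.stream(recordKey.split(","))
        .filter(kv -> wanted.contains(kv.split(":")[0]))
        .collect(Collectors.joining(","));
    return (hashKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    // Primary key "id1,id2", index key "id1": both versions of the same id1 hash to the same bucket.
    System.out.println(bucketId("id1:42,id2:a", "id1", 16));
    System.out.println(bucketId("id1:42,id2:b", "id1", 16));
  }
}
```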
hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java (resolved)
hudi-flink/src/main/java/org/apache/hudi/sink/BucketStreamWriteFunction.java (resolved)
} else {
  LOG.info(String.format("Adding fileID %s to the partition bucket %s.", fileID, partitionBucketId));
  bucketToFileIDMap.put(partitionBucketId, fileID);
}
The bucketToFileIDMap seems to never be cleared; is there a possibility that this map should be put into the state?
Yes, this could be stored in the state, but we need to think about the consistency issue between the state and the actual file system view. Any diff could lead to incorrect data.
The checkpoint can keep the correctness, I think.
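A sketch of one way to sidestep the state-vs-storage consistency concern: rebuild the mapping from the table's file listing when the task opens, so the file system view stays the single source of truth. TableView, FileGroupInfo, and the fileId prefix layout below are assumptions for illustration, not the classes used in this PR:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: bootstrap the bucket -> fileId mapping from storage on task (re)start
// instead of checkpointing it.
public class BucketMappingBootstrap {

  // Hypothetical stand-ins for Hudi's file system view classes.
  interface TableView {
    Iterable<FileGroupInfo> latestFileGroups(String partition);
  }

  static class FileGroupInfo {
    final String fileId; // assumption: the bucket id is encoded as the fileId prefix, e.g. "00000003-..."
    FileGroupInfo(String fileId) {
      this.fileId = fileId;
    }
  }

  static Map<String, String> bootstrap(TableView view, Iterable<String> partitions) {
    Map<String, String> bucketToFileId = new HashMap<>();
    for (String partition : partitions) {
      for (FileGroupInfo fg : view.latestFileGroups(partition)) {
        int bucketId = Integer.parseInt(fg.fileId.substring(0, 8)); // assumption about the prefix layout
        bucketToFileId.put(partition + "_" + bucketId, fg.fileId);
      }
    }
    return bucketToFileId;
  }
}
```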
hudi-flink/src/main/java/org/apache/hudi/sink/common/AbstractStreamWriteFunction.java (resolved)
final int bucketNum = BucketIdentifier.getBucketId(hoodieKey, indexKeyFields, hiveBucketNum);
final String partitionBucketId = BucketIdentifier.partitionBucketIdStr(hoodieKey.getPartitionPath(), bucketNum);

if (bucketToFileIDMap.containsKey(partitionBucketId)) {
Partition changes are not supported?
you can change the job parallelism, but you can't change the bucket index number at this point.
I guess @loukey-lj wants to ask: when a record switches to a new partition, how do we send a delete record to the old partition?
@danny0405 If people change the write parallelism, it will still work because we load the parallelism-bucketID mapping at runtime, but the parallelism should be less than the bucket number to avoid empty tasks.
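A minimal sketch of what such a runtime mapping could look like, assuming a simple modulo assignment of buckets to subtasks (the real operator may differ): changing the write parallelism only reshuffles ownership, while a parallelism larger than the bucket number would leave some subtasks with no bucket at all:

```java
// Illustrative only: modulo-based bucket-to-subtask ownership.
public class BucketTaskAssignment {

  static boolean ownsBucket(int bucketId, int taskId, int parallelism) {
    return bucketId % parallelism == taskId;
  }

  public static void main(String[] args) {
    int numBuckets = 8;
    int parallelism = 3;
    for (int task = 0; task < parallelism; task++) {
      StringBuilder owned = new StringBuilder();
      for (int bucket = 0; bucket < numBuckets; bucket++) {
        if (ownsBucket(bucket, task, parallelism)) {
          owned.append(bucket).append(' ');
        }
      }
      System.out.println("subtask " + task + " owns buckets: " + owned);
    }
  }
}
```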
@hudi-bot run azure
hudi-flink/src/main/java/org/apache/hudi/sink/BucketStreamWriteFunction.java (resolved)
hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java (outdated, resolved)
hudi-flink/src/main/java/org/apache/hudi/sink/BucketStreamWriteFunction.java (outdated, resolved)
    table.getMetaClient().getBasePath()));

// Iterate through all existing partitions to load the existing fileIDs that belong to this task
List<String> partitions = table.getMetadata().getAllPartitionPaths();
It may have poor performance at application start when the partition count is huge. Loading it at runtime may be better, especially when most of the partitions in the table are frozen.
let's do the optimization part in a separate PR.
Thanks, can we file an issue to address this improvement?
fix it with #5093
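A sketch of the lazy-loading idea, independent of the actual fix in #5093: the fileId mapping for a partition is looked up only when the first record of that partition arrives. The loadPartition callback is a hypothetical stand-in for listing the partition's file groups:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative only: load each partition's bucket -> fileId entries on first use
// instead of scanning every partition at startup.
public class LazyBucketLoader {

  private final Set<String> loadedPartitions = new HashSet<>();
  private final Map<String, String> bucketToFileId = new HashMap<>();
  private final Function<String, Map<String, String>> loadPartition; // hypothetical listing callback

  public LazyBucketLoader(Function<String, Map<String, String>> loadPartition) {
    this.loadPartition = loadPartition;
  }

  public String fileIdFor(String partition, int bucketId) {
    if (loadedPartitions.add(partition)) { // first record of this partition triggers the load
      bucketToFileId.putAll(loadPartition.apply(partition));
    }
    return bucketToFileId.get(partition + "_" + bucketId);
  }
}
```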
@hudi-bot run azure

public static DataStream<Object> hoodieStreamWrite(Configuration conf, int defaultParallelism, DataStream<HoodieRecord> dataStream) {
  WriteOperatorFactory<HoodieRecord> operatorFactory = StreamWriteOperator.getFactory(conf);
  return dataStream
Can we have a separate method for the hash index and not modify these two methods? The methods are already too complex, I think.
Man, you forgot your previous comment lol. It used to be separated and you suggested putting them into hoodieStreamWrite. Let me know if I misunderstood.
Yeah, if we can make both the code in Pipelines and HoodieTableSink clean ~
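A sketch of what that could look like (not the final code in Pipelines): a single hoodieStreamWrite entry point that dispatches internally, so HoodieTableSink needs no index-specific branching. bucketIndexWrite and defaultWrite below are illustrative stubs only:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.configuration.OptionsResolver;

// Illustrative only: keep the bucket-index branching inside Pipelines.
public class PipelinesSketch {

  public static DataStream<Object> hoodieStreamWrite(Configuration conf, int parallelism, DataStream<HoodieRecord> dataStream) {
    return OptionsResolver.isBucketIndexTable(conf)
        ? bucketIndexWrite(conf, parallelism, dataStream)   // shuffle by bucket id, bucket write operator
        : defaultWrite(conf, parallelism, dataStream);      // existing bucket-assigner based path
  }

  private static DataStream<Object> bucketIndexWrite(Configuration conf, int parallelism, DataStream<HoodieRecord> dataStream) {
    throw new UnsupportedOperationException("illustrative stub");
  }

  private static DataStream<Object> defaultWrite(Configuration conf, int parallelism, DataStream<HoodieRecord> dataStream) {
    throw new UnsupportedOperationException("illustrative stub");
  }
}
```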
danny0405 left a comment:
+1, thanks for the contribution ~
🆒 I will try out this PR soon. Thanks to the big guys. 🙏
I have tried it and got an exception:

Caused by: java.util.NoSuchElementException: No value present in Option
  at org.apache.hudi.common.util.Option.get(Option.java:88) ~[hudi-flink-bundle_2.11-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
  at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:116) ~[hudi-flink-bundle_2.11-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
  at org.apache.hudi.io.FlinkMergeHandle.<init>(FlinkMergeHandle.java:70) ~[hudi-flink-bundle_2.11-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
  at org.apache.hudi.client.HoodieFlinkWriteClient.getOrCreateWriteHandle(HoodieFlinkWriteClient.java:485) ~[hudi-flink-bundle_2.11-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
  at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:142) ~[hudi-flink-bundle_2.11-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]

@garyli1019
@wxplovecc hi, this issue does not look related to this PR; would you submit an issue so we can take a look?

Seems to be a bug 🐛 of the PR
…e#4679) * Support bucket index in Flink writer * Use record key as default index key
What is the purpose of the pull request
This pull request implements RFC-35 (https://cwiki.apache.org/confluence/display/HUDI/RFC-35%3A+Make+Flink+MOR+table+writing+streaming+friendly)
Brief change log
Verify this pull request
This change added tests and can be verified as follows:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.