
Conversation

@prashantwason
Member

  • [HUDI-6098] Use bulk insert prepped for the initial write into MDT.

Change Logs

  1. Added a flag to HoodieTableMetadataWriter.commit to specify whether the commit is the initial commit.
  2. For the initial commit, the bulkInsertPrepped API is used.
  3. Added a partitioner for MDT bulk insert which partitions the records based on their file group. Since the records are already tagged before commit is called, this partitioner can retrieve the fileID and partition from the current location of each record (see the sketch after this list).
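
The sketch below illustrates item 3, assuming Spark's Java RDD API and the FileGroupPartitioner and keyComparator referenced in the diff excerpts further down; it is a minimal illustration, not the exact code in this PR. The Serializable cast is only there so Spark can ship the comparator to executors.

    // Sketch only: key each prepped record by <file group index, record key>;
    // the index drives the partitioning, the record key drives the in-partition sort.
    Comparator<Tuple2<Integer, String>> keyComparator =
        (Comparator<Tuple2<Integer, String>> & Serializable) (a, b) -> a._2().compareTo(b._2());

    JavaRDD<HoodieRecord> partitionedRDD = records
        .keyBy(r -> {
          // Records are pre-tagged, so the target file group can be read off the current location.
          int fileGroupIndex = HoodieTableMetadataUtil.getFileGroupIndexFromFileId(
              r.getCurrentLocation().getFileId());
          return new Tuple2<Integer, String>(fileGroupIndex, r.getRecordKey());
        })
        // One Spark partition per MDT file group, sorted by record key within each partition.
        .repartitionAndSortWithinPartitions(new FileGroupPartitioner(), keyComparator)
        .values();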

Impact

Massive increase in read performance after the initial creation of an index.
Reduces the large read/write IO requirement for the first compaction in the MDT.
Reduces duplicate storage from the initial log files, which keep redundant initial-commit data around until they are cleaned.
Faster initial commit, as bulkInsert is more performant for billions of records than upsert, which has a workload profiling stage.

Risk level (write none, low medium or high below)

None

Already covered by existing unit tests.

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

    public static int getFileGroupIndexFromFileId(String fileId) {
      // 0.10 version MDT code added -0 (0th fileIndex) to the fileID
      int endIndex = fileId.endsWith("-0") ? fileId.length() - 2 : fileId.length();
      int fromIndex = fileId.lastIndexOf("-", endIndex);
Contributor


Can we abstract this code into a separate method:

    // 0.10 version MDT code added -0 (0th fileIndex) to the fileID
    int endIndex = fileId.endsWith("-0") ? fileId.length() - 2 : fileId.length();
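
For illustration, the extracted helper could be as small as the sketch below (the method name here is hypothetical, not necessarily what the PR ended up using):

    // Hypothetical helper: drops the "-0" (0th fileIndex) suffix that 0.10 MDT code appended to fileIDs.
    private static int getFileIdLengthWithoutFileIndex(String fileId) {
      return fileId.endsWith("-0") ? fileId.length() - 2 : fileId.length();
    }

The caller then reduces to int endIndex = getFileIdLengthWithoutFileIndex(fileId);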

Member Author


Done

Contributor

@danny0405 danny0405 left a comment


+1

@danny0405 danny0405 self-assigned this Apr 19, 2023
@danny0405 danny0405 added metadata engine:spark Spark integration engine:flink Flink integration labels Apr 19, 2023
@danny0405
Contributor

Oops, code conflicts with your previous change.

@prashantwason
Member Author

Rebased and fixed conflict.

@danny0405
Contributor

@hudi-bot run azure

1 similar comment
@prashantwason
Member Author

@hudi-bot run azure

@hudi-bot
Collaborator

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405
Contributor

@prashantwason You need to rebase onto the latest master to get the tests to pass.

*
* This partitioner requires the records to be already tagged with location.
*/
public class SparkHoodieMetadataBulkInsertPartitioner implements BulkInsertPartitioner<JavaRDD<HoodieRecord>> {
Contributor


Do we have a UT for this?


    // Partition the records by their file group
    JavaRDD<HoodieRecord> partitionedRDD = records
        // Key by <file group index, record key>: the file group index is used to partition
        // and the record key is used to sort within the partition.
Contributor


From what I glean, the partitioning is based on fileGroupIndex, disregarding the MDT partition. So tell me: if we have 2 file groups in col stats and 2 file groups for RLI, does the 1st file group of both col stats and RLI land in the same partition in this repartition call?

Should the partitioning be based on the fileId itself, with sorting within each partition based on the record keys? Or am I missing something?

Contributor


Reason being, the argument JavaRDD<HoodieRecord> records to this method could contain records for N MDT partitions. Not sure if we are making any assumptions about that.

        fileIds.add(fileID);
      } else {
        // Empty partition
        fileIds.add("");
Contributor


Can you help me understand when we might hit this?

* @param index Index of the file group within the partition
* @return The fileID
*/
public static String getFileIDForFileGroup(MetadataPartitionType metadataPartition, int index) {
Contributor


Do we have UTs for these?
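
For reference, a round-trip style test could cover both helpers. This is only a sketch; it assumes both methods live in HoodieTableMetadataUtil (as the later excerpt suggests for the parser) and that the index encoded by getFileIDForFileGroup can be parsed back by getFileGroupIndexFromFileId. MetadataPartitionType.FILES is just an example partition type.

    @Test
    public void testFileGroupIndexRoundTrip() {
      for (int index = 0; index < 16; index++) {
        // Assumed round-trip property: the index encoded into the fileID is recoverable from it.
        String fileId = HoodieTableMetadataUtil.getFileIDForFileGroup(MetadataPartitionType.FILES, index);
        assertEquals(index, HoodieTableMetadataUtil.getFileGroupIndexFromFileId(fileId));
      }
    }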

          int fileGroupIndex = HoodieTableMetadataUtil.getFileGroupIndexFromFileId(r.getCurrentLocation().getFileId());
          return new Tuple2<Integer, String>(fileGroupIndex, r.getRecordKey());
        })
        .repartitionAndSortWithinPartitions(new FileGroupPartitioner(), keyComparator)
Contributor


We know the total number of partitions is going to equal the total number of file groups. Can we override numPartitions for FileGroupPartitioner?
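
For illustration, a partitioner with the partition count fixed up front could look like the sketch below; the class name and constructor are hypothetical, and the PR's FileGroupPartitioner may be wired differently.

    import org.apache.spark.Partitioner;
    import scala.Tuple2;

    // Hypothetical sketch: route each <file group index, record key> key to the Spark
    // partition matching its file group, with numPartitions set to the known file group count.
    public class FileGroupIndexPartitioner extends Partitioner {
      private final int numFileGroups;

      public FileGroupIndexPartitioner(int numFileGroups) {
        this.numFileGroups = numFileGroups;
      }

      @Override
      public int numPartitions() {
        return numFileGroups;
      }

      @Override
      @SuppressWarnings("unchecked")
      public int getPartition(Object key) {
        // The key is the <file group index, record key> tuple produced upstream.
        return ((Tuple2<Integer, String>) key)._1();
      }
    }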

@nsivabalan nsivabalan self-assigned this May 2, 2023
@nsivabalan nsivabalan added release-0.14.0 priority:critical Production degraded; pipelines stalled labels May 2, 2023
@nsivabalan
Contributor

Synced up directly. I am OK with the assumption that we will initialize one metadata partition at a time.
Let's see if we can address the other feedback.

@prashantwason
Member Author

Closing this as I have added the changes in another PR: #8684
