[HUDI-9622] Add implementation of MergeHandle backed by the HoodieFileGroupReader #13699
Conversation
    } else {
      mergeHandle.doMerge();
    if (mergeHandle instanceof FileGroupReaderBasedMergeHandle) {
      mergeHandle.close();
Open question: Is there any reason to avoid calling close on the other merge handles?
It seems we should just return mergeHandle.close() in line 132 instead of closing here, for all the merge handles.
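A minimal sketch of the suggested shape, assuming close() returns the write statuses the same way for every handle type:

```java
// Sketch of the reviewer's suggestion: merge, then close and return the resulting
// write statuses for every merge handle, instead of special-casing
// FileGroupReaderBasedMergeHandle.
mergeHandle.doMerge();
return mergeHandle.close();
```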
    }

    public void manuallyTrackSuccess() {
      this.manuallyTrackIndexUpdates = true;
We could just set trackSuccessRecords to false here.
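A minimal sketch of that suggestion; the trackSuccessRecords field name comes from the comment above and is an assumption about this class:

```java
// Hypothetical sketch: when index updates are tracked manually, also disable the
// automatic success-record tracking in the same call.
public void manuallyTrackSuccess() {
  this.manuallyTrackIndexUpdates = true;
  this.trackSuccessRecords = false; // field name assumed from the reviewer's comment
}
```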
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java (resolved)
     */
    public FileGroupReaderBasedMergeHandle(HoodieWriteConfig config, String instantTime, HoodieTable<T, I, K, O> hoodieTable,
                                           Iterator<HoodieRecord<T>> recordItr, String partitionPath, String fileId,
                                           TaskContextSupplier taskContextSupplier, Option<BaseKeyGenerator> keyGeneratorOpt, HoodieReaderContext<T> readerContext) {
Do we need to pass the readerContext around explicitly here? Can we use hoodieTable.getContext().getReaderContextFactoryForWrite() instead?
The issue is that the merge handles are created on the executors in Spark, so hoodieTable.getContext() will always return a local engine context instead of a Spark engine context when required.
> always return a local engine context instead of a Spark engine context when required

Can we fix that, with something like hoodieTable.getContextForWrite()?
is getContextForWrite returning an EngineContext here?
yeah, not sure if it is feasible.
    init(operation, this.partitionPath);
    this.props = TypedProperties.copy(config.getProps());
    this.isCompaction = true;
    initRecordIndexCallback();
do we even need to track RLI for compactions?
Removed this, it was causing some issues with existing tests so it was good to see we have some good coverage here :)
    initRecordTypeAndCdcLogger(enginRecordType);
    init(operation, this.partitionPath);
    this.props = TypedProperties.copy(config.getProps());
    this.isCompaction = true;
We already have the flag preserveMetadata to distinguish table-service and regular writes; can we continue to use that? Some functions like SI tracing already rely on the preserveMetadata flag. And it seems clustering also uses this constructor.
This boolean was removed in later commits
    }

    private void initRecordIndexCallback() {
      if (this.writeStatus.isTrackingSuccessfulWrites()) {
The isTrackingSuccessfulWrites flag in the write status comes from hoodieTable.shouldTrackSuccessRecords(), which is true when RLI or partitioned RLI is enabled. We should skip the location tracing for compaction, where it is redundant.
Yes, this is no longer called from the compaction path
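A minimal sketch of what skipping it for compaction could look like, reusing the isCompaction flag and initRecordIndexCallback() from the snippets above; the exact placement is an assumption:

```java
// Hypothetical sketch: skip the record-index callback on the compaction path, where
// record locations do not change and RLI location tracking would be redundant.
if (!isCompaction) {
  initRecordIndexCallback();
}
```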
...ent/hudi-client-common/src/main/java/org/apache/hudi/io/FileGroupReaderBasedMergeHandle.java (outdated, resolved)
...ent/hudi-client-common/src/main/java/org/apache/hudi/io/FileGroupReaderBasedMergeHandle.java (resolved)
     */
    public ReaderContextFactory<?> getReaderContextFactoryForWrite(HoodieTableMetaClient metaClient, HoodieRecord.HoodieRecordType recordType,
                                                                   TypedProperties properties) {
                                                                   TypedProperties properties, boolean outputsCustomPayloads) {
This flag is only meaningful for the Avro reader context; is there any way we can constrain it to just AvroReaderContextFactory?
I didn't find a good way right now. This flag really represents two different stages of the writer path: the dedupe/indexing stages and the final write. In the final write, we never want to use the payload-based records since we just want the final indexed representation of the record.
We have a plan to abandon the payload-based records in the writer path, right? So this should just be a temporary solution?
We'll still need it for ExpressionPayload and for any user-provided payload, so it is not temporary, but these restrictions may allow us to clean things up further.
    @Override
    public void onUpdate(String recordKey, BufferedRecord<T> previousRecord, BufferedRecord<T> mergedRecord) {
      writeStatus.addRecordDelegate(HoodieRecordDelegate.create(recordKey, partitionPath, fileRecordLocation, fileRecordLocation, mergedRecord.getHoodieOperation() == HoodieOperation.UPDATE_BEFORE));
Do we even need to add the delete when mergedRecord.getHoodieOperation() == HoodieOperation.UPDATE_BEFORE is true?
The write status will still be updated in the current code with this record delegate even though ignoreIndexUpdate is true. This is keeping parity with the old system but I am not sure of the context for this.
The flag is used when all the delegates are collected on the driver and used to calculate the RLI index items for the MDT; delegates with ignoreIndexUpdate set to true are just dropped directly, so there is no need to even generate and collect them.
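A minimal sketch of the suggested guard, reusing the names from the snippet above; whether any other consumer of the write status still needs the delegate is left open:

```java
// Hypothetical sketch: skip generating the delegate entirely when the index update
// would be ignored on the driver anyway, instead of creating it with the ignore flag.
boolean ignoreIndexUpdate = mergedRecord.getHoodieOperation() == HoodieOperation.UPDATE_BEFORE;
if (!ignoreIndexUpdate) {
  writeStatus.addRecordDelegate(
      HoodieRecordDelegate.create(recordKey, partitionPath, fileRecordLocation, fileRecordLocation, false));
}
```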
    @Override
    public void onInsert(String recordKey, BufferedRecord<T> newRecord) {
      writeStatus.addRecordDelegate(HoodieRecordDelegate.create(recordKey, partitionPath, null, fileRecordLocation, newRecord.getHoodieOperation() == HoodieOperation.UPDATE_BEFORE));
newRecord.getHoodieOperation() == HoodieOperation.UPDATE_BEFORE is always false.
It's always false today, but do we want to keep this in case there is some future scenario where it isn't?
> but do we want to keep this in case there is some future case

I don't think so; the hoodie operation is designed to be force-set there.
    public void onDelete(String recordKey, BufferedRecord<T> previousRecord, HoodieOperation hoodieOperation) {
      // The update before operation is used when a deletion is being sent to the old File Group in a different partition.
      // In this case, we do not want to delete the record metadata from the index.
      writeStatus.addRecordDelegate(HoodieRecordDelegate.create(recordKey, partitionPath, fileRecordLocation, null, hoodieOperation == HoodieOperation.UPDATE_BEFORE));
hoodieOperation == HoodieOperation.UPDATE_BEFORE is always false.
I simplified this in a recent commit
    } else {
      Schema readerSchema = readerContext.getSchemaHandler().getRequestedSchema();
      // If the record schema is different from the reader schema, rewrite the record using the payload methods to ensure consistency with legacy writer paths
      if (!readerSchema.equals(recordSchema)) {
This could be super costly. Can it be simplified by checking the number of fields?
Checking the number of fields will not be enough to guarantee safety. This case is currently limited to the payload-based mergers where there is an update in the incoming records and there is no record in the base file for that key, so it should not be very common.
    static <T> StreamingFileGroupRecordBufferLoader<T> getInstance() {
      return INSTANCE;
    StreamingFileGroupRecordBufferLoader(Schema recordSchema) {
      this.recordSchema = recordSchema;
There is no need to pass the schema around explicitly; it is actually the writeSchema, which equals schemaHandler.requestedSchema minus the metadata fields. We already have a utility method for it: HoodieAvroUtils.removeMetadataFields.
Nice, this will simplify the changeset
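A minimal sketch of what that simplification could look like; the schema-handler accessor is taken from the other snippets in this PR, and its availability at this call site is an assumption:

```java
// Hypothetical sketch: derive the write schema from the reader's requested schema
// instead of passing it through the loader's constructor.
Schema writeSchema = HoodieAvroUtils.removeMetadataFields(
    readerContext.getSchemaHandler().getRequestedSchema());
```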
     * @param writeStatus The Write status
     * @param secondaryIndexDefns Definitions for secondary index which need to be updated
     */
    static <T> void trackSecondaryIndexStats(HoodieKey hoodieKey, Option<BufferedRecord<T>> combinedRecordOpt, @Nullable BufferedRecord<T> oldRecord, boolean isDelete,
This method mirrors the one above it but operates directly on BufferedRecord instead of converting to HoodieRecord.
Cool. Have we added UTs for this?
If we have not added UTs for this, can you track it in a follow-up JIRA?
Force-pushed from 50022dc to 47f8303
    import static org.apache.hudi.common.table.log.block.HoodieLogBlock.HeaderMetadataType.INSTANT_TIME;

    abstract class FileGroupRecordBuffer<T> implements HoodieFileGroupRecordBuffer<T> {
      protected final Set<String> usedKeys = new HashSet<>();
There is a possibility of duplicate keys in a file, and there is an expectation that updates are applied to both rows. See "Test only insert for source table in dup key without preCombineField" for an example. We need to figure out if there is a better way to handle this.
I'm wondering if this is a valid case, because duplicates in the base file only occur for a pk-less table, while in the test case the table has a primary key but is intentionally set up to allow duplicates for the first commit, which seems incorrect. We should limit "allow duplicates" for incoming records to just the INSERT operation for pk-less tables, just like the doc says in HoodieConcatHandle.
Fired a discussion here: #6824 (comment)
The behavior is incorrect; filed a JIRA to track it: https://issues.apache.org/jira/browse/HUDI-9708. Let's abandon the support for duplicates in the base file and fix it in a separate PR.
For the test case that fails without this, should we just update it to be PK-less?
We can skip those tests first and fix the MIT in HUDI-9708; the original test is meant to test a pk table, I think.
@danny0405 there is one more case that fails without this change: testUpsertWithoutPrecombineFieldAndCombineBeforeUpsertDisabled - I am marking it as disabled for now
        writeConfig, instantTime, table, recordItr, partitionPath, fileId, taskContextSupplier, keyGeneratorOpt, readerContext);
    } catch (Exception e) {
      // Fallback to legacy constructor if the new one fails
      LOG.warn("Failed to instantiate HoodieMergeHandle with new constructor, falling back to legacy constructor: {}", e.getMessage());
The catch-exception fallback is hacky; can we match the class name (FileGroupReaderBasedMergeHandle.class.getName()) to invoke the different constructors?
Another option I was considering is checking whether the constructor exists first. Then users can still provide a custom merge handle with either the newer or older set of constructor args.
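A rough sketch of the constructor-probing alternative; the parameter list mirrors the FileGroupReaderBasedMergeHandle constructor shown earlier, and the helper name is hypothetical:

```java
// Hypothetical helper: resolve the merge-handle constructor by probing for the newer
// signature (with HoodieReaderContext) first and falling back to the legacy one, so a
// custom merge handle can still expose either set of constructor args.
private static Constructor<?> resolveMergeHandleConstructor(String mergeHandleClassName)
    throws ClassNotFoundException, NoSuchMethodException {
  Class<?> handleClass = Class.forName(mergeHandleClassName);
  try {
    return handleClass.getConstructor(HoodieWriteConfig.class, String.class, HoodieTable.class, Iterator.class,
        String.class, String.class, TaskContextSupplier.class, Option.class, HoodieReaderContext.class);
  } catch (NoSuchMethodException e) {
    // Legacy signature without the reader context.
    return handleClass.getConstructor(HoodieWriteConfig.class, String.class, HoodieTable.class, Iterator.class,
        String.class, String.class, TaskContextSupplier.class, Option.class);
  }
}
```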
    @Override
    protected BufferedRecord<T> handleNonDeletes(BufferedRecord<T> previousRecord, BufferedRecord<T> mergedRecord) {
      try {
        if (merger.shouldFlush(readerContext.getRecordContext().constructHoodieRecord(mergedRecord), readerContext.getRecordContext().getSchemaFromBufferRecord(mergedRecord), properties)) {
We have discussed dropping the shouldFlush functionality for now; it is not a good design, and the original fix does not cover MOR merging scenarios, as mentioned here: #9809 (comment). Let's just drop this support first and file a JIRA to track it instead.
JIRA created to track the follow-up fixes: https://issues.apache.org/jira/browse/HUDI-9709
What should we do about the failing tests? Override the write handle to use the old class?
Skip those tests first and fix it in HUDI-9709.
    private SerializableBiFunction<T, Schema, String> metadataKeyExtractor() {
      return (record, schema) -> getValue(record, schema, RECORD_KEY_METADATA_FIELD).toString();
      return (record, schema) -> typeConverter.castToString(getValue(record, schema, RECORD_KEY_METADATA_FIELD));
Wondering why this did not error out before the patch?
This may not be needed anymore. There was a bug where someone updated the BufferedRecord creation from HoodieRecord to read the values from the data instead of from the HoodieKey so I was getting some NPEs.
    private transient FileSystemViewManager viewManager;
    protected final transient HoodieEngineContext context;
    private final ReaderContextFactory<T> readerContextFactoryForWrite;
@danny0405 I don't think this is the right way to approach this. Now every usage of HoodieTable will incur the cost of generating the factory, which requires broadcasting on Spark. Why can't we just pass this in when it is required like I had before?
The hoodie table holds all the info required there: the engine context, the meta client, and the write config. It does not look right to pass around a "reader" context factory for the whole write path, even for write paths that are not COW table merging scenarios.
There are also discrepancies in the write path: only Spark needs this factory; Flink and Java can get the factory directly from the engine context in the hoodie table.
The broadcast was already there for all the write executors before my change. If we have some way to resolve the serialization issue of the engine context or reader context itself, that would be best. Or if we can limit the factory to being initialized only for COW table updates in the write handles, that would be great.
We can remove the reader context factory from the hoodie table and instantiate a reader context factory in SparkUpsertCommitActionExecutor; there, override BaseSparkCommitActionExecutor#getUpdateHandle to set the reader context explicitly. We may also need to add a setter method in FileGroupReaderBasedMergeHandle, since only this handle currently needs it, just like HoodieMergeHandle.setPartitionFields and HoodieMergeHandle.setPartitionValues, which are specific to the Spark COW table.
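A rough sketch of that suggestion; the setter, the injection point, and the factory's getContext() accessor are assumptions for illustration, not the final design:

```java
// Hypothetical setter on FileGroupReaderBasedMergeHandle; only this handle needs the
// reader context today, mirroring setPartitionFields/setPartitionValues.
public void setReaderContext(HoodieReaderContext<T> readerContext) {
  this.readerContext = readerContext;
}

// Hypothetical injection where the update handle is created in
// SparkUpsertCommitActionExecutor, with the factory built on the driver.
if (mergeHandle instanceof FileGroupReaderBasedMergeHandle) {
  ((FileGroupReaderBasedMergeHandle<T, ?, ?, ?>) mergeHandle).setReaderContext(readerContextFactory.getContext());
}
```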
Force-pushed from 0db72ca to a01ce74
      }
      return id;
    }).collect(Collectors.toList());
    }).filter(Objects::nonNull).collect(Collectors.toList());
> cannot prune col: %s which does not exist in hudi table

The original logic makes more sense; what is the use case for pruning the schema with a non-existing field?
If the internal schema has fewer fields than the requested schema. In the writer path, the writer schema can have new columns that are not in the file's existing schema.
It looks like we get around this in the current flows by using AvroSchemaEvolutionUtils.reconcileSchema before setting the InternalSchema. I will test this out with the schema evolution cases.
I assume this is the patch being worked on for the COW merge handle migration. Can we fix the PR title and PR description as well, and revert the draft state?
Force-pushed from 500b4f5 to 539cf8b
     * @return an Option containing the writer payload override class name if present, otherwise an empty Option
     */
    static Option<String> getWriterPayloadOverride(Properties properties) {
      if (properties.containsKey("hoodie.datasource.write.payload.class")) {
if (containsKey()..) {Option.of()} else {} -> Option.ofNullable().map() ?
Thanks! I'm realizing I can also just do Option.ofNullable on the result of the get
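A minimal sketch of the simplified version; the config key string is copied from the diff above:

```java
// Sketch: wrap the possibly-null property value directly instead of branching on containsKey.
static Option<String> getWriterPayloadOverride(Properties properties) {
  return Option.ofNullable(properties.getProperty("hoodie.datasource.write.payload.class"));
}
```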
danny0405 left a comment:

+1, overall looks great.
nsivabalan left a comment:

Very few minor comments/clarifications.
Let's align on duplicate handling before and after this patch.
Pending comments to resolve before I can give it a go: #13699 (comment), plus getting CI to green. Also, suggested tracking a few minor follow-ups (UTs) in a JIRA.
…GReaderBasedMergeHandle, fix test setup for event time
    private static String addStrsAsInt(String a, String b) {
      return String.valueOf(Integer.parseInt(a) + Integer.parseInt(b));
    private static String addStrsAsLong(String a, String b) {
      return String.valueOf(Long.parseLong(a) + Long.parseLong(b));
What's this fix about?
I triaged this. It looks like the event time metadata tracking could have been broken if not for this fix. I am not sure if anyone was ever using this feature, though.
Right, this is a test class so it may cause some test failures
nsivabalan left a comment:

Appreciate your perseverance navigating through all the test failures and getting this into landable shape.
    // the operation will be null. Records that are being updated or records being added to the file group for the first time will have an operation set and must generate new metadata.
    boolean shouldPreserveRecordMetadata = preserveMetadata || record.getOperation() == null;
    Schema recordSchema = shouldPreserveRecordMetadata ? writeSchemaWithMetaFields : writeSchema;
    writeToFile(record.getKey(), record, recordSchema, config.getPayloadConfig().getProps(), shouldPreserveRecordMetadata);
Currently we collect SI and RLI updates no matter whether the record has been written successfully; this could incur inconsistencies.
Created this JIRA ticket to track it and will take it up in the next day or so.
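A minimal sketch of the direction that follow-up could take, reusing the names from the snippet above; trackIndexUpdates is a hypothetical helper, and the markFailure call is an assumption about the surrounding error handling:

```java
// Hypothetical sketch: only record RLI/SI updates once the record has actually been
// written, so a failed write does not leave behind inconsistent index entries.
try {
  writeToFile(record.getKey(), record, recordSchema, config.getPayloadConfig().getProps(), shouldPreserveRecordMetadata);
  trackIndexUpdates(record); // hypothetical helper that collects RLI/SI updates
} catch (IOException e) {
  writeStatus.markFailure(record, e, Option.empty());
}
```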
…eGroupReader (apache#13699)

Co-authored-by: Sivabalan Narayanan <[email protected]>
Co-authored-by: Lokesh Jain <[email protected]>
Co-authored-by: Lokesh Jain <[email protected]>
Co-authored-by: danny0405 <[email protected]>

Change Logs

The goal of this PR is to ensure consistent behavior while reading and writing data across our Merge-on-Read and Copy-on-Write tables by leveraging the existing HoodieFileGroupReader to manage the merging of records. The FileGroupReaderBasedMergeHandle that is currently used for compaction is updated to allow merging with an incoming stream of records.

Summary of changes:
- FileGroupReaderBasedMergeHandle.java is updated to accept incoming records directly as an iterator instead of reading changes exclusively from log files. New callbacks are added to support creating the required outputs for updates to Record Level and Secondary indexes.
- The merge handle is also updated to preserve the metadata of records that are not updated while generating the metadata for updated records. This does not impact the compaction workflow, which preserves the metadata of the records.
- The FileGroupReaderBasedMergeHandle is set as the default merge handle.
- New test cases are added for RLI, including a test where records move between partitions and deletes are sent to partitions that do not contain the original record.
- The delete record ordering value is now converted to the engine-specific type so there are no issues when performing comparisons.

Differences between FileGroupReaderBasedMergeHandle and HoodieWriteMergeHandle:
- Currently the HoodieWriteMergeHandle can handle applying a single update to multiple records with the same key. This functionality does not exist in the FileGroupReaderBasedMergeHandle.
- The FileGroupReaderBasedMergeHandle does not support the shouldFlush functionality in the HoodieRecordMerger.

Impact
Provides a unified path for handling updates to records in Hudi.
Risk level (write none, low medium or high below)
High, this is touching our core writer flows
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make changes to the website.
Contributor's checklist