
[HUDI-9424] Extract and test merger logic for FileGroupReader #13242

Closed
the-other-tim-brown wants to merge 31 commits into apache:master from the-other-tim-brown:merger-and-refactor

Conversation

@the-other-tim-brown
Contributor

Change Logs

  • Adds a class EngineBasedMerger that handles the merging logic that was previously duplicated in the FileGroupReader code. This class performs commit time and event time ordering based merges without constructing HoodieRecords, which reduces overhead.
  • Standardizes how deletes are translated into HoodieRecords
  • Reduces duplicate code in FileGroupRecordBuffer implementations to ensure consistent behavior
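The two ordering modes described above can be illustrated with a rough sketch (hypothetical names and simplified types, not the actual EngineBasedMerger API): under commit time ordering the later-arriving record wins, while under event time ordering the record with the higher ordering value wins, without ever wrapping the payload in a HoodieRecord.

```java
import java.io.Serializable;

public class MergeSketch {
  public enum MergeMode { COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING }

  /** Simplified stand-in for Hudi's BufferedRecord; a null payload models a delete. */
  public static final class Rec implements Serializable {
    public final String key;
    public final long orderingValue;
    public final Object payload;

    public Rec(String key, long orderingValue, Object payload) {
      this.key = key;
      this.orderingValue = orderingValue;
      this.payload = payload;
    }

    public boolean isDelete() {
      return payload == null;
    }
  }

  /** Picks the surviving record; {@code newer} is the record that arrived later in the log. */
  public static Rec merge(MergeMode mode, Rec older, Rec newer) {
    if (mode == MergeMode.COMMIT_TIME_ORDERING) {
      return newer; // the later arrival always wins under commit time ordering
    }
    // Event time ordering: the higher ordering value wins; ties go to the newer record.
    return newer.orderingValue >= older.orderingValue ? newer : older;
  }
}
```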

Impact

  • Makes the merging logic easier to test and consistent whether merging two log records or a log record and a base file record
  • Reduces maintenance overhead by deduplicating the code

Risk level (write none, low, medium or high below)

Low, increases coverage and fixes some minor issues in the merge logic

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of a config is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Apr 30, 2025
Contributor

@jonvex jonvex left a comment

Seems pretty good, but need to look at this again at least once more

public class BufferedRecord<T> implements Serializable {
// the key of the record
private final String recordKey;
// the ordering value of the record to be used for even time based ordering
Contributor

*event

  } else if (isSkipMerge) {
    return new UnmergedFileGroupRecordBuffer<>(
-       readerContext, hoodieTableMetaClient, recordMergeMode, Option.empty(), Option.empty(), props, readStats);
+       readerContext, hoodieTableMetaClient, recordMergeMode, Option.empty(), Option.empty(), props, readStats, merger);
Contributor

do we still need to be passing in the merge mode here?

Contributor Author

The merge mode is used internally for determining whether the ordering field should be set.

Contributor

+1 to @jonvex, why can't we just initialize the merger inside each specific record buffer?


private static Stream<Arguments> commitTimeOrdering() {
return Stream.of(
// Validate commit time does not impact the ordering
Contributor

I think this is correct?:

// Validate event time does not impact the ordering

Contributor

@jonvex jonvex left a comment

lgtm

* @return the latest record as a delete record
*/
private BufferedRecord<T> getLatestAsDeleteRecord(BufferedRecord<T> newer, BufferedRecord<T> older) {
if (recordMerger.map(merger -> merger.getMergingStrategy().equals(HoodieRecordMerger.COMMIT_TIME_BASED_MERGE_STRATEGY_UUID)).orElse(false)) {
Contributor

So the record merge mode does not work perfectly here?

Contributor Author

This is to support the retraction message you had brought up. It avoids setting the value to null so you can send a retraction for the content. I never got a concrete example from you so I just went with my best understanding of the topic.

Contributor

It's tricky that you empty the msg payload for deletes first, before the merger runs, and then try to recover it here. Let's remove the special handling in constructHoodieRecord for deletes: just construct the record like the others, the merger will return the correct record based on the orderingVal, and the BufferedRecord can be recovered correctly with the right HoodieOperation and payload data (the row).

Contributor Author

The issue here is that the mergers will return an empty option when there is a delete: https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/HoodieSparkRecordMerger.java#L50

I think that this can cause some unexpected issues when using event time ordering since you will lose the ordering value that should be used when comparing to the next record.

For example, consider a delete at T2 followed by an insert at T1; that insert should be ignored, but how will we keep track of that when the output drops this context?
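A tiny sketch of that scenario (hypothetical, simplified types): if the delete keeps its ordering value, the out-of-order insert is correctly discarded; if the delete were collapsed to an empty result, that context would be lost.

```java
public class DeleteOrderingSketch {
  /** Simplified buffered record; a null payload models a delete. */
  public static final class Rec {
    public final long orderingValue;
    public final Object payload;

    public Rec(long orderingValue, Object payload) {
      this.orderingValue = orderingValue;
      this.payload = payload;
    }
  }

  /** Event time merge that keeps deletes as first-class records rather than dropping them. */
  public static Rec mergeEventTime(Rec older, Rec newer) {
    // The higher ordering value wins, even when the winner is a delete.
    return newer.orderingValue >= older.orderingValue ? newer : older;
  }
}
```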

Contributor Author

I've updated the code a bit: if either of the records is a delete, we handle it first. The case where the merger outputs an empty option should not happen anymore, but I am not sure of the safest way to handle this since a developer can provide any implementation of the merger logic.

* The class takes in {@link HoodieReaderContext<T>} for the engine specific operations such as fetching the value representing the event time when {@link RecordMergeMode#EVENT_TIME_ORDERING} is used.
* @param <T> The type of the engine's row
*/
public class EngineBasedMerger<T> {
Contributor

We already have RecordMerger abstraction, maybe we rename it as MergeEngine or something?

Contributor Author

I'm open to changing the name. The idea behind EngineBased was that we do not need to convert into HoodieRecords for the event and commit time ordering paths. The hope is that we can keep optimizing along this path toward a single implementation for all merging that relies on some basic functionality provided by each engine, such as selecting ordering fields or handling partial merging.

   * @return A new instance of {@link HoodieRecord}.
   */
- public abstract HoodieRecord<T> constructHoodieRecord(BufferedRecord<T> bufferedRecord);
+ protected abstract HoodieRecord<T> constructHoodieDataRecord(BufferedRecord<T> bufferedRecord);
Contributor

I don't think this makes sense, we still need to keep the delete payloads for streaming retraction scenarios.

Contributor Author

Can you link to these scenarios so I can get a better understanding of how they are used?

Contributor

For example, the schema is (id, name, val) and id is the record key. There is an insert and then an update to its value, so you have msg events like below, and the operator sends one msg at a time to the downstream:

[+I] [1, "a", 1]
[-U] [1, "a", 1]
[+U] [1, "a", 3]

The -U msg is a retraction msg to the downstream; when a downstream operator receives it, it subtracts the old value 1 from its current value 1, so it becomes 0.

The point here is to keep the data payload of the delete msg so the downstream can figure out the value to subtract.
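A minimal sketch of such a downstream aggregation (hypothetical, not an actual Hudi or Flink API): the running sum can only be maintained because the -U retraction message still carries the old value to subtract.

```java
import java.util.HashMap;
import java.util.Map;

public class RetractionSketch {
  public enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER } // +I, -U, +U

  private final Map<Integer, Long> sums = new HashMap<>();

  /** Applies one changelog message to the running sum for the given record key. */
  public void apply(RowKind kind, int id, long val) {
    long current = sums.getOrDefault(id, 0L);
    if (kind == RowKind.UPDATE_BEFORE) {
      sums.put(id, current - val); // retraction: subtract the old payload value
    } else {
      sums.put(id, current + val); // +I / +U: add the new payload value
    }
  }

  public long sumFor(int id) {
    return sums.getOrDefault(id, 0L);
  }
}
```

If the delete/retraction message were stripped of its payload, the UPDATE_BEFORE step would have nothing to subtract.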

Contributor Author

Ok, seems like this is related to the question here


@Override
public void processNextDeletedRecord(DeleteRecord deleteRecord, Serializable recordKey) {
BufferedRecord<T> existingRecord = records.get(recordKey);
Contributor

I don't think it makes sense to merge the handling of processNextDataRecord and processNextDeletedRecord

Contributor Author

There can be deletions inside of data records as well, so we need to make sure these are handled in a uniform way.

Contributor

There can be deletions inside of data records as well

It's true, but the payload data is there while for the DeleteRecord it is not. I know unifying the code is good, but it would mess up the BufferedRecord -> HoodieRecord conversion, because the latter can only be constructed as an empty hoodie record.

Contributor Author

I don't understand why this would mess up the conversion. How is it different from what is in place today, where we handle comparing BufferedRecords that are deletes to BufferedRecords that are not deletes?

Contributor

@yihua yihua left a comment

Blocked on my review

*/
public HoodieRecord<T> constructHoodieRecord(BufferedRecord<T> bufferedRecord) {
if (bufferedRecord.isDelete()) {
return new HoodieEmptyRecord<>(
Contributor

It's dangerous to do this because many mergers just return empty if the payload data is null; then the event time merging semantics would be lost, and we would also lose the payload data that was stored in the BufferedRecord.

Contributor Author

This is how the code is currently written; I am just doing some refactoring.
See these references: Spark and Flink.

Are you saying that this is already wrong and needs to be fixed? If so, is the solution to check whether the data is present instead of simply whether it is a delete, so we carry through as much context as possible?

Contributor Author

I've updated this to check for null instead of isDelete
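A sketch of that adjustment (hypothetical, simplified types): the empty-record path is taken only when there is no payload data, so a delete that still carries a row retains it.

```java
public class ConstructRecordSketch {
  /** Simplified buffered record; the delete flag is independent of payload presence. */
  public static final class BufferedRec<T> {
    public final T data;          // may be null for key-only deletes
    public final boolean delete;

    public BufferedRec(T data, boolean delete) {
      this.data = data;
      this.delete = delete;
    }
  }

  /** Describes which kind of record would be constructed from the buffered record. */
  public static <T> String describeConstruction(BufferedRec<T> rec) {
    if (rec.data == null) {       // was: if (rec.delete) — which dropped delete payloads
      return "empty";
    }
    return rec.delete ? "delete-with-payload" : "data";
  }
}
```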

Contributor

@danny0405 danny0405 left a comment

block on my final review

@the-other-tim-brown the-other-tim-brown changed the title [DRAFT] Extract and test merger logic for FileGroupReader [HUDI-9424] Extract and test merger logic for FileGroupReader May 20, 2025
@the-other-tim-brown the-other-tim-brown marked this pull request as ready for review May 20, 2025 01:52
@danny0405
Contributor

I see some negative changes that I don't really like:

unnecessary overhead:

  1. BufferedRecord#forRecordWithContext -> readerContext.convertValueToEngineType(orderingValue)

unnecessary complexity exposed:

  1. KeyBasedFileGroupRecordBuffer#processNextDataRecord -> !existing.equals(merged), enablePartialMerging
  2. FileGroupRecordBuffer#hasNextBaseRecord -> readerContext.projectRecord(

interface that does not make sense:

  1. BufferedRecord#asDeleteRecord
  2. EngineBasedMerger#getLatestAsDeleteRecord

Maybe we just put the merging related logic together and do not touch the logic/interface for now, so you can test each of them separately:

  1. FileGroupRecordBuffer#merge;
  2. FileGroupRecordBuffer#doProcessNextDataRecord;
  3. FileGroupRecordBuffer#doProcessNextDeletedRecord.

return readerSchema;
}
return readerContext.getSchemaFromBufferRecord(bufferedRecord);
private BufferedRecord<T> merge(BufferedRecord<T> baseRecord, BufferedRecord<T> logRecord) throws IOException {
Contributor Author

The signature changes because creating a Pair per record is unnecessary overhead. We already have the BufferedRecord, which gives us the context of whether or not it is a delete. The Pair also uses a Boolean, so you have the autoboxing overhead.
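A rough sketch of the allocation difference (hypothetical types, with SimpleEntry standing in for the Pair): the old style allocates a pair and a boxed Boolean per merged record, while carrying a primitive delete flag on the buffered record avoids both.

```java
public class SignatureSketch {
  /** Simplified buffered record carrying a primitive delete flag (no Boolean boxing). */
  public static final class BufferedRec<T> {
    public final T data;
    public final boolean delete;

    public BufferedRec(T data, boolean delete) {
      this.data = data;
      this.delete = delete;
    }
  }

  /** Old style: wrap the winner and a boxed delete flag in a pair per merged record. */
  public static <T> java.util.AbstractMap.SimpleEntry<Boolean, T> mergeOldStyle(T winner, boolean isDelete) {
    return new java.util.AbstractMap.SimpleEntry<>(isDelete, winner); // boxes isDelete, allocates a pair
  }

  /** New style: the buffered record already knows whether it is a delete. */
  public static <T> BufferedRec<T> mergeNewStyle(BufferedRec<T> winner) {
    return winner; // nothing extra is allocated
  }
}
```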

@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405
Contributor

I tried to push some high-level changes to the code:

  • move the enablePartialMerging inside the EngineRecordMerger and add an enable method for it so that there is no need to pass it every time for #merge;
  • add merge(Option, T) for log/log merging and merge(T, Option) for base/log merging; the assumptions around null are different for these two scenarios.
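The proposed overloads might look roughly like this (hypothetical signatures sketching the comment above, with a placeholder latest-wins policy): for log/log merging the existing record may be absent, while for base/log merging the base record is always present.

```java
import java.util.Optional;

public class MergerOverloadsSketch<T> {
  /** log/log merging: the existing buffered record may be absent (first record for the key). */
  public T merge(Optional<T> existing, T incoming) {
    return existing.isPresent() ? pickWinner(existing.get(), incoming) : incoming;
  }

  /** base/log merging: the base record is always present; the log record may be absent. */
  public T merge(T base, Optional<T> logRecord) {
    return logRecord.isPresent() ? pickWinner(base, logRecord.get()) : base;
  }

  // Placeholder merge policy: the later record (second argument) wins.
  private T pickWinner(T older, T newer) {
    return newer;
  }
}
```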

Then I tried to inspect the logic changes and found several issues that made me quit:

  1. for log/log merging with CUSTOM merge mode, before the change, there is no need to do the combined record -> buffered record conversion if the new record has a lower ordering value;
  2. for log/log merging with CUSTOM merge mode, before the change, the log records would be evolved uniformly in FileGroupRecordBuffer#getRecordsIterator; now they are evolved in the merging phase, which incurs an unnecessary performance regression;
  3. the getNewerRecordWithEventTimeOrdering semantics has been changed and is now wrong, because a newer record with a bigger ordering value would be ignored;
  4. in BufferedRecord#forDeleteRecord, the ordering value is converted to the engine type, which is wrong because we should use the Java type to stay engine agnostic.

It looks like you do not have good knowledge of these nuances; my suggestion is that we do not change the logic and API first and just move them together for easy testing.


5 participants