[HUDI-9316] Add support for creating iterator of HoodieRecord from FGReader by the-other-tim-brown · Pull Request #13314 · apache/hudi

the-other-tim-brown · 2025-05-17T03:44:03Z

Change Logs

Adds support for generating an iterator of HoodieRecord, rather than only the engine-specific record, in the FileGroupReader code. This will aid in migrating existing uses of LogScanner and other reader implementations that expect a HoodieRecord as output.
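As a rough illustration of the idea described above, the sketch below wraps an iterator of engine-specific rows in an iterator of keyed records. It is not the code from this PR: `Key`, `KeyedRecord`, `wrap`, and the comma-separated row format are all hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

/** Sketch only: every type here is a hypothetical stand-in, not a Hudi class. */
final class RecordIteratorSketch {

  /** Hypothetical stand-in for a record key plus partition path. */
  static final class Key {
    final String recordKey;
    final String partitionPath;

    Key(String recordKey, String partitionPath) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
    }
  }

  /** Hypothetical wrapper pairing a key with the engine-specific row. */
  static final class KeyedRecord<T> {
    final Key key;
    final T data;

    KeyedRecord(Key key, T data) {
      this.key = key;
      this.data = data;
    }
  }

  /** Wraps rows lazily: each row is converted only when next() is called. */
  static <T> Iterator<KeyedRecord<T>> wrap(
      Iterator<T> engineRecords, Function<T, String> keyExtractor, String partitionPath) {
    return new Iterator<KeyedRecord<T>>() {
      @Override
      public boolean hasNext() {
        return engineRecords.hasNext();
      }

      @Override
      public KeyedRecord<T> next() {
        T row = engineRecords.next();
        return new KeyedRecord<>(new Key(keyExtractor.apply(row), partitionPath), row);
      }
    };
  }

  public static void main(String[] args) {
    // Assumed row format for the demo: "<recordKey>,<value>".
    Iterator<String> rows = Arrays.asList("k1,a", "k2,b").iterator();
    Iterator<KeyedRecord<String>> records = wrap(rows, row -> row.split(",")[0], "2025/05/17");
    while (records.hasNext()) {
      KeyedRecord<String> record = records.next();
      System.out.println(record.key.partitionPath + " " + record.key.recordKey + " -> " + record.data);
    }
  }
}
```

Because the wrapper delegates to the underlying iterator, callers that expect keyed records can consume the same stream without materialising it first.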

Impact

Aids the migration to FileGroupReader in other code paths.

Risk level (write none, low, medium or high below)

Low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

The config description must be updated if new configs are added or the default values of configs are changed.
Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

danny0405 · 2025-05-17T05:30:29Z

...ient/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java

+    HoodieKey hoodieKey = new HoodieKey(bufferedRecord.getRecordKey(), partitionPath);
    if (bufferedRecord.isDelete()) {
      return new HoodieEmptyRecord<>(
-          new HoodieKey(bufferedRecord.getRecordKey(), null),


Is this a bug fix?

Previously, we had no need for the HoodieRecord outside of the merger fallback code, so this is not a bug. Now that the new iterator adds another use case, the partition field should be set.
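To make the change in the diff above concrete, here is a minimal sketch of building the key once so that delete markers carry the partition path as well. The classes `Key`, `BufferedEntry`, and `OutputRecord` are hypothetical stand-ins that only mirror the shape of the diff; they are not the Hudi types.

```java
/** Sketch only: hypothetical stand-ins that mirror the shape of the diff above. */
final class DeleteKeySketch {

  /** Hypothetical key holding a record key and a partition path. */
  static final class Key {
    final String recordKey;
    final String partitionPath;

    Key(String recordKey, String partitionPath) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
    }
  }

  /** Hypothetical buffered entry: a record key, a delete flag, and a payload. */
  static final class BufferedEntry {
    final String recordKey;
    final boolean delete;
    final String payload;

    BufferedEntry(String recordKey, boolean delete, String payload) {
      this.recordKey = recordKey;
      this.delete = delete;
      this.payload = payload;
    }
  }

  /** Hypothetical output record; a null payload marks a delete. */
  static final class OutputRecord {
    final Key key;
    final String payload;

    OutputRecord(Key key, String payload) {
      this.key = key;
      this.payload = payload;
    }

    boolean isDelete() {
      return payload == null;
    }
  }

  static OutputRecord construct(BufferedEntry entry, String partitionPath) {
    // Build the key once so deletes and live records both carry the partition path.
    Key key = new Key(entry.recordKey, partitionPath);
    if (entry.delete) {
      return new OutputRecord(key, null);
    }
    return new OutputRecord(key, entry.payload);
  }

  public static void main(String[] args) {
    OutputRecord deleted = construct(new BufferedEntry("k1", true, null), "2025/05/17");
    System.out.println(deleted.isDelete() + " " + deleted.key.partitionPath);
  }
}
```

A consumer that groups or routes records by partition can then treat deletes the same way as live records.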

danny0405 · 2025-05-17T05:34:41Z

hudi-common/src/main/java/org/apache/hudi/common/table/read/FileGroupRecordBuffer.java

-          readerContext.constructHoodieRecord(olderRecord), readerContext.getSchemaFromBufferRecord(olderRecord),
-          readerContext.constructHoodieRecord(newerRecord), readerContext.getSchemaFromBufferRecord(newerRecord),
+          readerContext.constructHoodieRecord(olderRecord, partitionPath), readerContext.getSchemaFromBufferRecord(olderRecord),
+          readerContext.constructHoodieRecord(newerRecord, partitionPath), readerContext.getSchemaFromBufferRecord(newerRecord),


Is it possible to put the partitionPath into the readerContext so there is no need to change these APIs?

Yes, I was wondering something similar. If we say that the reader context is built per file group, we can do this; if we want to share the reader context between file groups, we'll need to keep it in its current form. I don't have a strong opinion either way, so I'm open to ideas about how the class should be used.

Actually, looking at the current class, I see there is hasLogFiles, which implies the class currently only works per file slice, so I will move the partition path into the reader context to keep the API the same.
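The trade-off discussed in this thread can be sketched as follows, assuming a context object that is created per file group. `Context`, `setPartitionPath`, and `constructRecordId` are hypothetical names used only for illustration and are not the HoodieReaderContext API.

```java
/** Sketch only: a hypothetical per-file-group context, not Hudi's HoodieReaderContext. */
final class ReaderContextSketch {

  static final class Context {
    private String partitionPath;

    /** Set once when the context is created for a given file group. */
    void setPartitionPath(String partitionPath) {
      this.partitionPath = partitionPath;
    }

    /** The signature takes only the record key; the partition path comes from context state. */
    String constructRecordId(String recordKey) {
      if (partitionPath == null) {
        throw new IllegalStateException("partition path not set for this file group");
      }
      return partitionPath + "/" + recordKey;
    }
  }

  public static void main(String[] args) {
    Context context = new Context();
    context.setPartitionPath("2025/05/17");
    System.out.println(context.constructRecordId("k1"));
  }
}
```

Holding the partition path as state keeps call sites unchanged, at the cost that one context instance must not be shared across file groups in different partitions.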


the-other-tim-brown · 2025-05-18T21:40:49Z

@hudi-bot run azure

danny0405 · 2025-05-19T00:09:44Z

hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java

    assertEquals(expectedRecords.size(), actualRecordList.size());
    assertEquals(new HashSet<>(expectedRecords), new HashSet<>(actualRecordList));
+    // validate records can be read from file group as HoodieRecords
+    actualRecordList = convertHoodieRecords(


Should we decouple the test then? The HoodieRecord iterator is a legacy API and should already be deprecated.

It is not a legacy API yet, unfortunately. Looking through all the code still to migrate, it seems we have a while before we can retire HoodieRecord.


hudi-bot · 2025-05-20T16:04:50Z

CI report:

3652291 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build


github-actions bot added the size:M PR with lines of changes in (100, 300] label May 17, 2025

danny0405 reviewed May 17, 2025

danny0405 reviewed May 18, 2025

nsivabalan reviewed May 18, 2025

danny0405 reviewed May 19, 2025

nsivabalan reviewed May 19, 2025

nsivabalan approved these changes May 19, 2025

Timothy Brown added 7 commits May 20, 2025 09:34

b5f22fc Add support for creating iterator of HoodieRecord from FGReader
cc6282c cleanup iterator
02b2107 move partition path to reader context
6116143 add test for new iterator
a3a6197 style
f396cf7 set copy to false for spark record
3652291 fix conflicts, reduce code duplication in test

the-other-tim-brown force-pushed the HUDI-9316-fgreader branch from 12ebf1b to 3652291 Compare May 20, 2025 14:38

nsivabalan merged commit 39d6645 into apache:master May 20, 2025
58 checks passed

the-other-tim-brown deleted the HUDI-9316-fgreader branch May 20, 2025 16:32

danny0405 mentioned this pull request May 21, 2025

[HUDI-9235] Read MDT through FG reader #13300

Merged

4 tasks

alexr17 pushed a commit to alexr17/hudi that referenced this pull request Aug 25, 2025

[HUDI-9316] Add support for creating iterator of HoodieRecord from FG…

a2225eb

…Reader (apache#13314)


4 participants
