[HUDI-9316] Add support for creating iterator of HoodieRecord from FGReader #13314
Conversation
HoodieKey hoodieKey = new HoodieKey(bufferedRecord.getRecordKey(), partitionPath);
if (bufferedRecord.isDelete()) {
  return new HoodieEmptyRecord<>(
      new HoodieKey(bufferedRecord.getRecordKey(), null),
Previously we did not need the HoodieRecord outside of the merger fallback code, so this was not a bug. Now that the new iterator introduces another use case, this partition field should be set.
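A rough sketch of the change being suggested here, using hypothetical stand-in classes (the real `HoodieKey` and `HoodieEmptyRecord` live in hudi-common; the names and shapes below are illustrative only):

```java
// Illustrative stand-ins, not the real Hudi classes.
class HoodieKeySketch {
  final String recordKey;
  final String partitionPath;
  HoodieKeySketch(String recordKey, String partitionPath) {
    this.recordKey = recordKey;
    this.partitionPath = partitionPath;
  }
}

class EmptyRecordSketch {
  final HoodieKeySketch key;
  EmptyRecordSketch(HoodieKeySketch key) { this.key = key; }
}

public class DeleteRecordDemo {
  // Before: the delete path built new HoodieKeySketch(recordKey, null),
  // losing the partition. After: carry the partition path so consumers of
  // the new HoodieRecord iterator can see which partition a delete belongs to.
  static EmptyRecordSketch toDeleteRecord(String recordKey, String partitionPath) {
    return new EmptyRecordSketch(new HoodieKeySketch(recordKey, partitionPath));
  }

  public static void main(String[] args) {
    EmptyRecordSketch r = toDeleteRecord("id1", "2024/01/01");
    System.out.println(r.key.partitionPath); // 2024/01/01
  }
}
```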
Before:
readerContext.constructHoodieRecord(olderRecord), readerContext.getSchemaFromBufferRecord(olderRecord),
readerContext.constructHoodieRecord(newerRecord), readerContext.getSchemaFromBufferRecord(newerRecord),
After:
readerContext.constructHoodieRecord(olderRecord, partitionPath), readerContext.getSchemaFromBufferRecord(olderRecord),
readerContext.constructHoodieRecord(newerRecord, partitionPath), readerContext.getSchemaFromBufferRecord(newerRecord),
Is it possible to put the partitionPath into the readerContext so there is no need to change these APIs?
Yes, I was wondering something similar. If we want to say that the reader context is built per file group, we can do this; but if we want to share the reader context between file groups, we'll need to keep it in its current form. I don't have a strong opinion either way, so I'm open to ideas about how the class should be used.
Actually, looking into the current class, I see that there is hasLogFiles, which implies the class currently only works per file slice, so I will move the partition path into the context to keep the API the same.
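The approach settled on above can be sketched as follows. This is a simplified, hypothetical stand-in for the real `HoodieReaderContext` (the method and field names below are illustrative, not the actual Hudi API): because the context is scoped to one file slice, the partition path can be set once, and per-record methods keep their existing signatures.

```java
// Hypothetical sketch of the per-file-slice context approach.
class ReaderContextSketch {
  private String partitionPath; // set once per file slice, like hasLogFiles

  void initForFileSlice(String partitionPath) {
    this.partitionPath = partitionPath;
  }

  // Signature unchanged: callers don't pass partitionPath on every call,
  // the context already knows which partition it is reading.
  String describeRecord(String recordKey) {
    return partitionPath + "/" + recordKey;
  }
}

public class ContextScopeDemo {
  public static void main(String[] args) {
    ReaderContextSketch ctx = new ReaderContextSketch();
    ctx.initForFileSlice("2024/01/01");
    System.out.println(ctx.describeRecord("id1")); // 2024/01/01/id1
  }
}
```

The trade-off discussed in the thread: this keeps the per-record API stable, but it means one context instance cannot be shared across file groups in different partitions.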
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java
...ient/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java
...ient/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestFileGroupRecordBuffer.java
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java
@hudi-bot run azure
assertEquals(expectedRecords.size(), actualRecordList.size());
assertEquals(new HashSet<>(expectedRecords), new HashSet<>(actualRecordList));
// validate records can be read from file group as HoodieRecords
actualRecordList = convertHoodieRecords(
Should we decouple the test then? The HoodieRecord iterator is a legacy API and should be deprecated already.
It is not a legacy API yet, unfortunately. Looking through all the code that would need to migrate, it seems we have a while before we can retire HoodieRecord.
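The assertion pattern in the test snippet above (compare sizes first, then compare as sets) verifies that the same records come back regardless of read order. A self-contained illustration of why both checks are needed:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class SetCompareDemo {
  public static void main(String[] args) {
    List<String> expected = Arrays.asList("a", "b", "c");
    List<String> actual = Arrays.asList("c", "a", "b"); // same records, different order
    // The size check catches duplicates that a set comparison alone would hide,
    // e.g. ["a", "a", "b"] vs ["a", "b", "c"] differ in contents but
    // ["a", "a", "b"] vs ["a", "b"] would pass a pure set comparison.
    boolean sameSize = expected.size() == actual.size();
    boolean sameContents = new HashSet<>(expected).equals(new HashSet<>(actual));
    System.out.println(sameSize && sameContents); // true
  }
}
```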
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
Force-pushed from 12ebf1b to 3652291
Change Logs
Impact
Risk level (write none, low, medium or high below)
Low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make changes to the website.
Contributor's checklist