[HUDI-9424] Extract and test merger logic for FileGroupReader #13242
the-other-tim-brown wants to merge 31 commits into apache:master
Conversation
jonvex left a comment
Seems pretty good, but need to look at this again at least once more
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java
```java
public class BufferedRecord<T> implements Serializable {
  // the key of the record
  private final String recordKey;
  // the ordering value of the record to be used for event time based ordering
```
hudi-common/src/main/java/org/apache/hudi/common/table/read/EngineBasedMerger.java
hudi-common/src/main/java/org/apache/hudi/common/table/read/EngineBasedMerger.java
```diff
 } else if (isSkipMerge) {
   return new UnmergedFileGroupRecordBuffer<>(
-      readerContext, hoodieTableMetaClient, recordMergeMode, Option.empty(), Option.empty(), props, readStats);
+      readerContext, hoodieTableMetaClient, recordMergeMode, Option.empty(), Option.empty(), props, readStats, merger);
```
do we still need to be passing in the merge mode here?
The merge mode is used internally for determining whether the ordering field should be set.
+1 to @jonvex, why can't we just initialize the merger inside each specific record buffer?
```java
private static Stream<Arguments> commitTimeOrdering() {
  return Stream.of(
      // Validate commit time does not impact the ordering
```
I think this is correct?:
// Validate event time does not impact the ordering
```java
 * @return the latest record as a delete record
 */
private BufferedRecord<T> getLatestAsDeleteRecord(BufferedRecord<T> newer, BufferedRecord<T> older) {
  if (recordMerger.map(merger -> merger.getMergingStrategy().equals(HoodieRecordMerger.COMMIT_TIME_BASED_MERGE_STRATEGY_UUID)).orElse(false)) {
```
So the record merge mode does not work perfectly here?
This is to support the retraction message you had brought up. It avoids setting the value to null so you can send a retraction for the content. I never got a concrete example from you so I just went with my best understanding of the topic.
It's tricky: you empty out the msg payload first for deletes before the merger merges, then try to recover it here. Let's remove the special handling in constructHoodieRecord for deletes; just construct the record like the others and the merger would return the correct record based on the orderingVal, and the BufferedRecord can be recovered correctly with the right HoodieOperation and payload data (the row).
There was a problem hiding this comment.
The issue here is that the mergers will return an empty option when there is a delete: https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/HoodieSparkRecordMerger.java#L50
I think that this can cause some unexpected issues when using event time ordering since you will lose the ordering value that should be used when comparing to the next record.
For example, consider a delete at T2 and then it is followed by an insert with T1, that insert should be ignored but how will we keep track of that when the output drops this context?
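The delete-at-T2 / insert-at-T1 scenario can be sketched as a small buffer that keeps the delete's ordering value. This is a hypothetical illustration (the class and method names are not the actual Hudi API): with event-time ordering, a later-arriving record only wins if its ordering value is at least as large as what is already buffered, which is exactly the context that is lost if the merger returns an empty option for the delete.

```java
// Hypothetical sketch, not the actual Hudi API: event-time ordering must keep
// the delete's ordering value, or a late-arriving insert would incorrectly win.
import java.util.HashMap;
import java.util.Map;

class EventTimeBuffer {
  // recordKey -> [orderingValue, isDelete ? 1 : 0]
  private final Map<String, long[]> state = new HashMap<>();

  void process(String key, long orderingValue, boolean isDelete) {
    long[] existing = state.get(key);
    // keep the incoming record only if it is at least as new as the buffered one
    if (existing == null || orderingValue >= existing[0]) {
      state.put(key, new long[] {orderingValue, isDelete ? 1L : 0L});
    }
  }

  boolean isDeleted(String key) {
    long[] existing = state.get(key);
    return existing != null && existing[1] == 1L;
  }
}
```

With this state, the delete at T2 survives the out-of-order insert at T1; if the delete's ordering value had been dropped, the insert would have nothing to compare against.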
I've updated the code a bit: if either of the records is a delete, we handle it first. The case where the merger outputs an empty option should not happen anymore, but I am not sure of the safest way to handle this since a developer can provide any implementation of this merger logic.
```java
 * The class takes in {@link HoodieReaderContext<T>} for the engine specific operations such as fetching the value representing the event time when {@link RecordMergeMode#EVENT_TIME_ORDERING} is used.
 * @param <T> The type of the engine's row
 */
public class EngineBasedMerger<T> {
```
We already have RecordMerger abstraction, maybe we rename it as MergeEngine or something?
I'm open to changing the name. The idea behind "EngineBased" was that we do not need to convert into HoodieRecords for the event and commit time ordering paths. The hope is that we can keep optimizing along this path to have a single implementation for all merging that relies on some basic functionality provided by each engine for selecting ordering fields or handling partial merging.
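The idea being described can be sketched as follows. This is an illustrative sketch with hypothetical names (not the real `EngineBasedMerger`): the merger works directly on engine rows through a small accessor the engine supplies, so neither ordering path needs to materialize HoodieRecords.

```java
// Illustrative sketch (hypothetical names): merge engine rows directly using an
// engine-provided ordering-value accessor, without constructing HoodieRecords.
final class EngineMergerSketch<T> {
  interface OrderingExtractor<T> {
    long orderingValue(T row); // engine-specific, e.g. read the event-time column
  }

  private final OrderingExtractor<T> extractor;

  EngineMergerSketch(OrderingExtractor<T> extractor) {
    this.extractor = extractor;
  }

  // Event-time ordering: the newer record wins only if its ordering value
  // is at least as large as the older record's.
  T mergeEventTime(T older, T newer) {
    return extractor.orderingValue(newer) >= extractor.orderingValue(older) ? newer : older;
  }

  // Commit-time ordering: the later write always wins.
  T mergeCommitTime(T older, T newer) {
    return newer;
  }
}
```

Each engine only needs to supply the accessor (here a lambda over its row type); the ordering logic itself is shared.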
```diff
  * @return A new instance of {@link HoodieRecord}.
  */
-public abstract HoodieRecord<T> constructHoodieRecord(BufferedRecord<T> bufferedRecord);
+protected abstract HoodieRecord<T> constructHoodieDataRecord(BufferedRecord<T> bufferedRecord);
```
I don't think this makes sense, we still need to keep the delete payloads for streaming retraction scenarios.
Can you link to these scenarios so I can get a better understanding of how they are used?
For example, the schema is (id, name, val) and the id is the record key, there is an insert then update to the value of it, you have msg events like below and the operator sends one msg at a time to the downstream:
[+I] [1, "a", 1]
[-U] [1, "a", 1]
[+U] [1, "a", 3]
The -U msg is a retraction msg to the downstream; when the downstream operators receive it, they subtract 1 from the current value 1 so it becomes 0.
The point here is to keep the data payload of the delete msg so the downstream can figure out the value to subtract.
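The retraction example above can be sketched as a downstream consumer. This is an illustrative sketch (class and method names are hypothetical): a running SUM over `val` can only process the `-U` retraction if that message still carries the value to subtract, which is why the delete payload must be preserved.

```java
// Sketch of a downstream aggregator consuming changelog messages: the -U/-D
// branches need the payload value; an emptied delete record could not be applied.
final class SumAggregator {
  private long sum = 0;

  void apply(String rowKind, long val) {
    switch (rowKind) {
      case "+I":
      case "+U":
        sum += val;
        break;
      case "-U":
      case "-D":
        sum -= val; // subtraction requires the retracted payload value
        break;
    }
  }

  long sum() { return sum; }
}
```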
Ok, seems like this is related to the question here
```java
@Override
public void processNextDeletedRecord(DeleteRecord deleteRecord, Serializable recordKey) {
  BufferedRecord<T> existingRecord = records.get(recordKey);
```
I don't think it makes sense to merge the handling of processNextDataRecord and processNextDeletedRecord
There can be deletions inside of data records as well so we need to make sure these are handled in a uniform way
There can be deletions inside of data records as well
it's true, but the payload data is there while the DeleteRecord's is not. I know unifying the code is good, but it would mess up the BufferedRecord -> HoodieRecord conversion because the latter can only be constructed as an empty hoodie record.
I don't understand why this would mess up the conversion. How is it different than what is in place today where we handle comparing BufferedRecords that are deletes to BufferedRecords that are not deletes?
```java
 */
public HoodieRecord<T> constructHoodieRecord(BufferedRecord<T> bufferedRecord) {
  if (bufferedRecord.isDelete()) {
    return new HoodieEmptyRecord<>(
```
It's dangerous to do this because many mergers just return empty if the payload data is null; then the event time merging semantics would be lost, and we also lose the payload data that has been stored in the BufferedRecord.
This is how the code is currently written; I am just doing some refactoring.
See these references: Spark and Flink.
Are you saying that this is already wrong and needs to be fixed? If so, is the solution to check whether the data is present instead of simply whether it is a delete, so we carry through as much context as possible?
I've updated this to check for null instead of isDelete
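The null-check change being described can be sketched as below. This is a minimal illustration with a hypothetical `BufferedRecordSketch` (not the real `BufferedRecord`): the decision is based on whether the row is actually null rather than on the delete flag, so a delete that still carries its payload keeps that data.

```java
// Minimal sketch of the change: check the row for null instead of isDelete,
// so data-carrying deletes are not collapsed into empty records.
final class BufferedRecordSketch<T> {
  final String recordKey;
  final T row;          // may be null for tombstone-only deletes
  final boolean delete; // delete flag, independent of whether `row` is present

  BufferedRecordSketch(String recordKey, T row, boolean delete) {
    this.recordKey = recordKey;
    this.row = row;
    this.delete = delete;
  }

  // Before: `if (delete)` produced an empty record even when `row` was present,
  // dropping the payload needed for retractions. After: check the row itself.
  String constructKind() {
    return row == null ? "empty" : "data";
  }
}
```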
danny0405 left a comment
block on my final review
I see some negative changes that I don't really like:
unnecessary overhead:
unnecessary complexity exposed:
interface that does not make sense:
Maybe we just put the merging related logic together and do not touch the logic/interface changes for now, so you can test each of them separately.
```diff
     return readerSchema;
   }
   return readerContext.getSchemaFromBufferRecord(bufferedRecord);
+private BufferedRecord<T> merge(BufferedRecord<T> baseRecord, BufferedRecord<T> logRecord) throws IOException {
```
This signature changed since creating a Pair per record is unnecessary overhead. We already have the BufferedRecord, which gives us the context of whether or not it is a delete. The Pair also uses a Boolean, so you have autoboxing overhead.
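The signature point can be illustrated as follows. This is a sketch with hypothetical names (`BufferedRec`, `MergeSketch` stand in for the real classes): the old shape allocated a `Pair` plus a boxed `Boolean` per merged record, while passing the record wrapper directly reuses an object the reader already holds, with the delete flag as a primitive field.

```java
// Sketch of the signature change: the delete flag travels inside the record
// itself (a primitive boolean), so no Pair<Boolean, T> allocation per call.
final class BufferedRec<T> {
  final T row;
  final boolean delete; // primitive flag: no Boolean autoboxing

  BufferedRec(T row, boolean delete) {
    this.row = row;
    this.delete = delete;
  }
}

final class MergeSketch<T> {
  // Old shape (two Pair allocations + Boolean boxing per call):
  //   BufferedRec<T> merge(Pair<Boolean, T> base, Pair<Boolean, T> log)
  // New shape: take the records the reader already buffers.
  BufferedRec<T> merge(BufferedRec<T> base, BufferedRec<T> log) {
    // commit-time ordering sketch: the log (newer) record wins
    return log;
  }
}
```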
I tried to push some high-level changes to the code:
Then I tried to inspect the logic changes and found several issues that made me quit:
It looks like you do not have good knowledge of these nuances, and my suggestion is that we do not change the logic and API first and just move them together for easy testing.
Change Logs
EngineBasedMerger that handles the merging logic that was duplicated in the FileGroupReader code. This class serves as an optimal way to perform the commit and event time ordering based merge without constructing HoodieRecords, which reduces overhead.

Impact
Risk level (write none, low medium or high below)
Low, increases coverage and fixes some minor issues in the merge logic
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make changes to the website.
Contributor's checklist