Commit a53bc7b

[HUDI-4292] Update the RFC-46 doc because the Record Merge API is changed from CombineEngine to HoodieMerge
1 parent bf4ef73 commit a53bc7b

1 file changed: +40 -20 lines changed

rfc/rfc-46/rfc-46.md

Lines changed: 40 additions & 20 deletions
@@ -74,49 +74,69 @@ Following (high-level) steps are proposed:
    2. Split into interface and engine-specific implementations (holding internal engine-specific representation of the payload)
    3. Implementing new standardized record-level APIs (like `getPartitionKey` , `getRecordKey`, etc)
    4. Staying **internal** component, that will **NOT** contain any user-defined semantic (like merging)
-2. Extract Record Combining (Merge) API from `HoodieRecordPayload` into a standalone, stateless component (engine). Such component will be
+2. Extract Record Merge API from `HoodieRecordPayload` into a standalone, stateless component. Such component will be
    1. Abstracted as stateless object providing API to combine records (according to predefined semantics) for engines (Spark, Flink) of interest
    2. Plug-in point for user-defined combination semantics
 3. Gradually deprecate, phase-out and eventually remove `HoodieRecordPayload` abstraction

 Phasing out usage of `HoodieRecordPayload` will also bring the benefit of avoiding to use Java reflection in the hot-path, which
 is known to have poor performance (compared to non-reflection based instantiation).

-#### Combine API Engine
+#### Record Merge API

 Stateless component interface providing for API Combining Records will look like following:

 ```java
-interface HoodieRecordCombiningEngine {
-
-  default HoodieRecord precombine(HoodieRecord older, HoodieRecord newer) {
-    if (spark) {
-      precombineSpark((SparkHoodieRecord) older, (SparkHoodieRecord) newer);
-    } else if (flink) {
-      // precombine for Flink
-    }
-  }
+interface HoodieMerge {
+  HoodieRecord preCombine(HoodieRecord older, HoodieRecord newer);
+
+  Option<HoodieRecord> combineAndGetUpdateValue(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
+}

   /**
    * Spark-specific implementation
    */
-  SparkHoodieRecord precombineSpark(SparkHoodieRecord older, SparkHoodieRecord newer);
-
-  // ...
-}
+  class HoodieSparkRecordMerge implements HoodieMerge {
+
+    @Override
+    public HoodieRecord preCombine(HoodieRecord older, HoodieRecord newer) {
+      // HoodieSparkRecords preCombine
+    }
+
+    @Override
+    public Option<HoodieRecord> combineAndGetUpdateValue(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) {
+      // HoodieSparkRecord combineAndGetUpdateValue
+    }
+  }
+
+  /**
+   * Flink-specific implementation
+   */
+  class HoodieFlinkRecordMerge implements HoodieMerge {
+
+    @Override
+    public HoodieRecord preCombine(HoodieRecord older, HoodieRecord newer) {
+      // HoodieFlinkRecord preCombine
+    }
+
+    @Override
+    public Option<HoodieRecord> combineAndGetUpdateValue(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) {
+      // HoodieFlinkRecord combineAndGetUpdateValue
+    }
+  }
 ```
 Where user can provide their own subclass implementing such interface for the engines of interest.

-#### Migration from `HoodieRecordPayload` to `HoodieRecordCombiningEngine`
+#### Migration from `HoodieRecordPayload` to `HoodieMerge`

 To warrant backward-compatibility (BWC) on the code-level with already created subclasses of `HoodieRecordPayload` currently
-already used in production by Hudi users, we will provide a BWC-bridge in the form of instance of `HoodieRecordCombiningEngine`, that will
+already used in production by Hudi users, we will provide a BWC-bridge in the form of instance of `HoodieMerge`, that will
 be using user-defined subclass of `HoodieRecordPayload` to combine the records.

 Leveraging such bridge will make provide for seamless BWC migration to the 0.11 release, however will be removing the performance
 benefit of this refactoring, since it would unavoidably have to perform conversion to intermediate representation (Avro). To realize
 full-suite of benefits of this refactoring, users will have to migrate their merging logic out of `HoodieRecordPayload` subclass and into
-new `HoodieRecordCombiningEngine` implementation.
+new `HoodieMerge` implementation.

 ### Refactoring Flows Directly Interacting w/ Records:

@@ -128,7 +148,7 @@ Following major components will be refactored:

 1. `HoodieWriteHandle`s will be
    1. Accepting `HoodieRecord` instead of raw Avro payload (avoiding Avro conversion)
-   2. Using Combining API engine to merge records (when necessary)
+   2. Using Record Merge API to merge records (when necessary)
    3. Passes `HoodieRecord` as is to `FileWriter`
 2. `HoodieFileWriter`s will be
    1. Accepting `HoodieRecord`
@@ -142,7 +162,7 @@ Following major components will be refactored:
 - What impact (if any) will there be on existing users?
   - Users of the Hudi will observe considerably better performance for most of the routine operations: writing, reading, compaction, clustering, etc due to avoiding the superfluous intermediate de-/serialization penalty
   - By default, modified hierarchy would still leverage
-  - Users will need to rebase their logic of combining records by creating a subclass of `HoodieRecordPayload`, and instead subclass newly created interface `HoodieRecordCombiningEngine` to get full-suite of performance benefits
+  - Users will need to rebase their logic of combining records by creating a subclass of `HoodieRecordPayload`, and instead subclass newly created interface `HoodieMerge` to get full-suite of performance benefits
 - If we are changing behavior how will we phase out the older behavior?
   - Older behavior leveraging `HoodieRecordPayload` for merging will be marked as deprecated in 0.11, and subsequently removed in 0.1x
 - If we need special migration tools, describe them here.
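
Illustrative note on the BWC bridge: the migration section in the diff describes a `HoodieMerge` instance that delegates to a user-defined `HoodieRecordPayload` subclass. The following is a minimal sketch of that delegation pattern and is not the Hudi implementation. Every type here (`Rec`, `Merge`, `LegacyPayload`, `LatestWinsPayload`, `PayloadBackedMerge`) is a hypothetical, simplified stand-in; the `Schema` and `Properties` parameters are omitted, and the record-to-payload conversion function stands in for the Avro conversion that the RFC says costs the bridge its performance benefit.

```java
import java.util.Optional;
import java.util.function.Function;

// Sketch only: all types below are hypothetical stand-ins, not Hudi classes.
final class MergeBridgeSketch {

    /** Stand-in for an engine-native HoodieRecord. */
    static final class Rec {
        final String key;
        final long orderingValue;
        final String data;

        Rec(String key, long orderingValue, String data) {
            this.key = key;
            this.orderingValue = orderingValue;
            this.data = data;
        }
    }

    /** Simplified shape of the HoodieMerge interface (Schema/Properties omitted). */
    interface Merge {
        Rec preCombine(Rec older, Rec newer);
        Optional<Rec> combineAndGetUpdateValue(Rec older, Rec newer);
    }

    /** Simplified shape of a user-defined HoodieRecordPayload subclass. */
    interface LegacyPayload {
        LegacyPayload preCombine(LegacyPayload other);
        Optional<LegacyPayload> combineAndGetUpdateValue(LegacyPayload current);
        Rec toRecord();
    }

    /** Example legacy semantics: the record with the larger ordering value wins. */
    static final class LatestWinsPayload implements LegacyPayload {
        private final Rec rec;

        LatestWinsPayload(Rec rec) {
            this.rec = rec;
        }

        @Override
        public LegacyPayload preCombine(LegacyPayload other) {
            return other.toRecord().orderingValue > rec.orderingValue ? other : this;
        }

        @Override
        public Optional<LegacyPayload> combineAndGetUpdateValue(LegacyPayload current) {
            return Optional.of(preCombine(current));
        }

        @Override
        public Rec toRecord() {
            return rec;
        }
    }

    /** The bridge: a Merge that converts records and delegates to the legacy payload. */
    static final class PayloadBackedMerge implements Merge {
        private final Function<Rec, LegacyPayload> toPayload;

        PayloadBackedMerge(Function<Rec, LegacyPayload> toPayload) {
            this.toPayload = toPayload;
        }

        @Override
        public Rec preCombine(Rec older, Rec newer) {
            // Both records are converted on every call; this models the conversion overhead.
            return toPayload.apply(newer).preCombine(toPayload.apply(older)).toRecord();
        }

        @Override
        public Optional<Rec> combineAndGetUpdateValue(Rec older, Rec newer) {
            return toPayload.apply(newer)
                .combineAndGetUpdateValue(toPayload.apply(older))
                .map(LegacyPayload::toRecord);
        }
    }

    public static void main(String[] args) {
        Merge merge = new PayloadBackedMerge(LatestWinsPayload::new);
        Rec older = new Rec("k1", 1L, "v1");
        Rec newer = new Rec("k1", 2L, "v2");
        System.out.println(merge.preCombine(older, newer).data); // prints "v2"
    }
}
```

In this sketch, moving the "latest wins" logic directly into a `Merge` implementation would avoid the per-call conversion, which is the kind of migration the RFC asks users to make to get the full performance benefit.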
