Commit dedd4e0
The goal of this PR is to ensure consistent behavior while reading and writing data across our Merge-on-Read and Copy-on-Write tables by leveraging the existing HoodieFileGroupReader to manage the merging of records. The FileGroupReaderBasedMergeHandle that is currently used for compaction is updated to allow merging with an incoming stream of records.
Summary of changes:
- FileGroupReaderBasedMergeHandle.java is updated to allow incoming records in the form of an iterator of records directly instead of reading changes exclusively from log files. New callbacks are added to support creating the required outputs for updates to Record Level and Secondary indexes.
- The merge handle now preserves the metadata of records that are not updated and generates metadata for records that are updated. This does not impact the compaction workflow, which continues to preserve the metadata of the records.
- The FileGroupReaderBasedMergeHandle is set as the default merge handle.
- New test cases are added for RLI, including a test where records move between partitions and deletes are sent to partitions that do not contain the original record.
- The delete record ordering value is now converted to the engine-specific type so there are no issues when performing comparisons.
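The first two items above can be illustrated with a minimal, self-contained sketch. It is not the Hudi implementation: `IteratorMergeSketch`, `Rec`, and `merge` are hypothetical names, the single `commitTime` field stands in for record metadata, and the rule "the higher ordering value wins" is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IteratorMergeSketch {
  /** Hypothetical record: key, payload, ordering value, and one metadata field (commit time). */
  static final class Rec {
    final String key;
    final String value;
    final long ordering;
    final String commitTime;

    Rec(String key, String value, long ordering, String commitTime) {
      this.key = key;
      this.value = value;
      this.ordering = ordering;
      this.commitTime = commitTime;
    }
  }

  /** Merges base-file records with an incoming iterator of records (assumed semantics). */
  static List<Rec> merge(List<Rec> baseRecords, Iterator<Rec> incoming, String newCommitTime) {
    // Buffer incoming records by key, keeping the one with the highest ordering value.
    Map<String, Rec> buffered = new LinkedHashMap<>();
    while (incoming.hasNext()) {
      Rec r = incoming.next();
      buffered.merge(r.key, r, (old, cur) -> cur.ordering >= old.ordering ? cur : old);
    }

    List<Rec> merged = new ArrayList<>();
    for (Rec base : baseRecords) {
      Rec update = buffered.remove(base.key);
      if (update != null && update.ordering >= base.ordering) {
        // Updated record: metadata is generated for the new commit.
        merged.add(new Rec(update.key, update.value, update.ordering, newCommitTime));
      } else {
        // No update, or a stale one: the existing record and its metadata are kept as-is.
        merged.add(base);
      }
    }
    // Incoming records without a matching base record are treated as inserts.
    for (Rec insert : buffered.values()) {
      merged.add(new Rec(insert.key, insert.value, insert.ordering, newCommitTime));
    }
    return merged;
  }
}
```

In this sketch the ordering values are plain `long`s, so comparisons are always between the same type; the last item above addresses the analogous concern for delete records, whose ordering values are converted to the engine-specific type before comparison.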
Differences between FileGroupReaderBasedMergeHandle and HoodieWriteMergeHandle
- Currently, the HoodieWriteMergeHandle can apply a single update to multiple records with the same key. This functionality does not exist in the FileGroupReaderBasedMergeHandle.
- The FileGroupReaderBasedMergeHandle does not support the shouldFlush functionality in the HoodieRecordMerger.
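To make the first difference concrete, the sketch below shows what "applying a single update to multiple records with the same key" means, plus a small check for whether a set of base records contains duplicate keys at all. It is an illustration under stated assumptions, not code from either merge handle: `DuplicateKeySketch`, `Row`, `applyToAllMatches`, and `duplicateKeys` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DuplicateKeySketch {
  /** Hypothetical base-file row with a record key and a payload. */
  static final class Row {
    final String key;
    final String value;

    Row(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  /**
   * Applies each update to every base row that shares its key. The update is only
   * looked up, never consumed, so duplicate keys all receive the same new value.
   */
  static List<Row> applyToAllMatches(List<Row> base, Map<String, String> updatesByKey) {
    List<Row> out = new ArrayList<>();
    for (Row row : base) {
      String updated = updatesByKey.get(row.key);
      out.add(updated == null ? row : new Row(row.key, updated));
    }
    return out;
  }

  /** Returns the keys that occur more than once in the base rows. */
  static Set<String> duplicateKeys(List<Row> base) {
    Set<String> seen = new HashSet<>();
    Set<String> duplicates = new LinkedHashSet<>();
    for (Row row : base) {
      if (!seen.add(row.key)) {
        duplicates.add(row.key);
      }
    }
    return duplicates;
  }
}
```

A check like `duplicateKeys` is one way to find out whether existing data depends on the multi-record behavior before switching merge handles.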
---------
Co-authored-by: Sivabalan Narayanan <[email protected]>
Co-authored-by: Lokesh Jain <[email protected]>
Co-authored-by: danny0405 <[email protected]>
1 parent fc41c22 commit dedd4e0
File tree
70 files changed, +1770 -386 lines changed
- hudi-client
  - hudi-client-common/src
    - main/java/org/apache/hudi
      - client
      - config
      - index
      - io
      - table
        - action/commit
    - test/java/org/apache/hudi
      - io
      - testutils
      - utils
  - hudi-flink-client/src/main/java/org/apache/hudi
    - client/common
    - io
  - hudi-java-client/src
    - main/java/org/apache/hudi/client/common
    - test/java/org/apache/hudi/table/action/commit
  - hudi-spark-client/src
    - main/java/org/apache/hudi
      - client/common
      - table/action/commit
    - test/java/org/apache/hudi
      - io
      - table/action/commit
      - testutils
- hudi-common/src
  - main/java/org/apache/hudi
    - common
      - engine
      - model
      - table/read
        - buffer
      - util
    - metadata
  - test/java/org/apache/hudi/common
    - serialization
    - table/read
      - buffer
    - testutils
- hudi-hadoop-common/src/test/java/org/apache/hudi/common/table/read
- hudi-spark-datasource
  - hudi-spark-common/src/main/scala/org/apache
    - hudi/cdc
    - spark/sql/execution/datasources/parquet
  - hudi-spark/src/test
    - java/org/apache/hudi
      - io
      - table
      - testutils
    - scala/org/apache
      - hudi
        - functional
          - cdc
      - spark/sql/hudi/dml/others
- hudi-utilities/src/test/java/org/apache/hudi/utilities
  - sources
  - testutils/sources