[HUDI-9332] Pluggable Table Format Support with native Integration #13216
danny0405 merged 11 commits into apache:master
Pull Request Overview
This PR implements pluggable table format support with native integration, enabling the system to use a configurable table format for various timeline operations instead of the fixed timeline layout. Key changes include:
- Replacing direct usage of timeline layout with the pluggable table format interface across multiple modules.
- Introducing new methods and fields (e.g. in HoodieTableMetaClient and HoodieTableConfig) to support native table format integration.
- Updating tests and client code to exercise the new table format behavior.
Reviewed Changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| hudi-common/src/main/java/org/apache/hudi/common/table/timeline/BaseHoodieTimeline.java | Added helper method to retrieve instants via the file system using the metaClient. |
| hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java | Integrated pluggable table format support and introduced a native active timeline accessor. |
| hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java | Added configuration properties and a method to instantiate a pluggable table format. |
| hudi-common/src/main/java/org/apache/hudi/common/PluggableTableFormat.java & HudiPluggableTableFormat.java | Defined and implemented the pluggable table format interface. |
| Various client modules and tests | Replaced timeline layout factory calls with table format factory calls to support the new integration. |
Comments suppressed due to low confidence (2)
hudi-common/src/main/java/org/apache/hudi/common/HoodieTableConfig.java:739
- The ReflectionUtils.loadClass call currently does not capture or use the loaded class instance, as the method always returns a new HudiPluggableTableFormat. Consider instantiating and returning the loaded class instance if a supplementary table format class is provided.
ReflectionUtils.loadClass(getSupplementaryTableFormatClassName(), new Class[] {TimelineLayoutVersion.class}, new Object[] {layoutVersion});
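The concern in this suppressed comment can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `TableFormat`, `NativeFormat`, `ExternalFormat`, and `loadFormat` are stand-ins invented for illustration, and plain `java.lang.reflect` is used in place of Hudi's `ReflectionUtils`. The only point is that the reflectively created instance is returned rather than discarded.

```java
import java.lang.reflect.Constructor;

class TableFormatLoader {
    /** Hypothetical stand-in for the pluggable table format interface. */
    public interface TableFormat {
        String name();
    }

    /** Hypothetical default used when no supplementary class is configured. */
    public static class NativeFormat implements TableFormat {
        public NativeFormat(Integer layoutVersion) { }

        @Override
        public String name() {
            return "native";
        }
    }

    /** Hypothetical supplementary format that is loaded by class name. */
    public static class ExternalFormat implements TableFormat {
        private final Integer layoutVersion;

        public ExternalFormat(Integer layoutVersion) {
            this.layoutVersion = layoutVersion;
        }

        @Override
        public String name() {
            return "external-v" + layoutVersion;
        }
    }

    /** Returns the reflectively loaded instance instead of dropping it. */
    static TableFormat loadFormat(String className, Integer layoutVersion) {
        if (className == null || className.isEmpty()) {
            return new NativeFormat(layoutVersion);
        }
        try {
            Constructor<?> ctor = Class.forName(className).getConstructor(Integer.class);
            return (TableFormat) ctor.newInstance(layoutVersion);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate table format " + className, e);
        }
    }
}
```

With this shape, a configured class name yields that class's instance, and an absent name falls back to the default.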
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v2/TimelineArchiverV2.java:121
- [nitpick] Consider simplifying the lambda expression by using flatMap directly on the streams from each action to collect archived instants. This will enhance code readability and reduce unnecessary intermediate collection steps.
List<HoodieInstant> archivedInstants = instantsToArchive.stream()
.map(action -> Stream.concat(action.getCompletedInstants().stream(), action.getPendingInstants().stream()).collect(Collectors.toList()))
.flatMap(Collection::stream).collect(Collectors.toList());
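The nitpick above could look roughly like the following sketch. `ActionGroup` is a hypothetical stand-in for the element type of `instantsToArchive`, and instants are plain strings here; only the stream shape is the point. Calling `flatMap` directly on the concatenated stream removes the intermediate `collect` per action.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArchiveFlatMapSketch {
    /** Hypothetical stand-in for an action that holds completed and pending instants. */
    static class ActionGroup {
        private final List<String> completed;
        private final List<String> pending;

        ActionGroup(List<String> completed, List<String> pending) {
            this.completed = completed;
            this.pending = pending;
        }

        List<String> getCompletedInstants() {
            return completed;
        }

        List<String> getPendingInstants() {
            return pending;
        }
    }

    /** Flattens completed and pending instants of every action in one pass. */
    static List<String> collectArchivedInstants(List<ActionGroup> instantsToArchive) {
        return instantsToArchive.stream()
            .flatMap(action -> Stream.concat(
                action.getCompletedInstants().stream(),
                action.getPendingInstants().stream()))
            .collect(Collectors.toList());
    }
}
```

The element order is unchanged: for each action, completed instants come first, then pending ones.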
vinothchandar left a comment
Made a pass. I have a directional comment that I feel may be cross-cutting across the PR.
I don't understand what performance cost we would pay for this change. A lot of work has already been done, and there is still a lot to do down the road. From my point of view, performance is a critical parameter for users when we are talking about Big Data processing. Are there any benchmark results for this change?
Another question: what happens if we write the Hudi metadata successfully but then fail while writing the Iceberg or Delta metadata? How will we handle the case where the Hudi table is fine but interoperability support is broken at some commit?
e402ce7 to 9a2d3d6
vinothchandar left a comment
Left some comments: some are nts (notes to self), and some are legitimate items to check.
If you can rebase and address as many of these as possible, I can take another full pass.
Please feel free to resolve code review comments that have been addressed.
a7b6ae7 to a3477dc
vinothchandar left a comment
Abstractions look good. Left a bunch of comments for me to check, shepherd, and land. I can make those changes myself.
    table.getMetaClient().reloadActiveTimeline();
}

// If instant is inflight but marked as completed in native format, delete the completed instant from storage.
Is this a bug fix? How could this happen?
When committing, we commit to the native timeline first (A), then to the plugged-in table format (B).
instantToRollback.isInflight() can be true if A happened but we failed before B. This fixes that case.
I'll limit this to cases where something else is plugged in.
Yeah, let's avoid listing the timeline multiple times in the regular workflow when no external table format is present.
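The failure window described in this thread can be sketched with an in-memory model. This is an illustration under stated assumptions, not the PR's code: `Timeline`, `commit`, and `cleanUpPartialCommit` are hypothetical names, and timelines are modeled as sets of instant strings. Step A completes the instant natively, step B completes it in the plugged-in format, and the cleanup only runs (and only lists timelines) when an external format is configured.

```java
import java.util.HashSet;
import java.util.Set;

class TwoStepCommitSketch {
    /** Hypothetical in-memory stand-in for a timeline; counts how often it is listed. */
    static class Timeline {
        final Set<String> completed = new HashSet<>();
        int listings = 0;

        boolean isCompleted(String instant) {
            listings++;
            return completed.contains(instant);
        }
    }

    final Timeline nativeTimeline = new Timeline();
    final Timeline externalTimeline; // null when no external table format is plugged in

    TwoStepCommitSketch(boolean externalPluggedIn) {
        this.externalTimeline = externalPluggedIn ? new Timeline() : null;
    }

    /** Step A writes the native timeline; step B writes the external format and may fail. */
    void commit(String instant, boolean failBeforeExternal) {
        nativeTimeline.completed.add(instant); // A
        if (externalTimeline == null) {
            return;
        }
        if (failBeforeExternal) {
            throw new IllegalStateException("failed before B");
        }
        externalTimeline.completed.add(instant); // B
    }

    /** Removes a native completion that the external format never saw; returns true if it did. */
    boolean cleanUpPartialCommit(String instant) {
        if (externalTimeline == null) {
            return false; // regular workflow: no extra timeline listing
        }
        boolean inflightExternally = !externalTimeline.isCompleted(instant);
        if (inflightExternally && nativeTimeline.isCompleted(instant)) {
            nativeTimeline.completed.remove(instant);
            return true;
        }
        return false;
    }
}
```

In this model the native-only path never lists a timeline during cleanup, which is the behavior the reviewers ask for.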
 */
HoodieInstant transitionClusterInflightToComplete(boolean shouldLock, HoodieInstant inflightInstant, HoodieReplaceCommitMetadata metadata);

HoodieInstant transitionClusterInflightToComplete(boolean shouldLock, HoodieInstant inflightInstant, HoodieReplaceCommitMetadata metadata, TableFormatCompletionAction tableFormatCompletionAction);
Should we try to avoid the overload?
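One possible way to avoid the overload is a single method that always takes a completion action, with a shared no-op for callers on the native path. The sketch below is hypothetical: `CompletionAction`, `NO_OP`, and the string-based instants are invented for illustration and are not the PR's `TableFormatCompletionAction` signature.

```java
import java.util.ArrayList;
import java.util.List;

class CompletionActionSketch {
    /** Hypothetical functional interface standing in for a table format completion hook. */
    @FunctionalInterface
    interface CompletionAction {
        /** Shared no-op for callers that have nothing plugged in. */
        CompletionAction NO_OP = instant -> { };

        void onComplete(String completedInstant);
    }

    final List<String> timeline = new ArrayList<>();

    /** Single entry point: native-only callers pass CompletionAction.NO_OP. */
    String transitionInflightToComplete(String inflightInstant, CompletionAction action) {
        String completed = inflightInstant + ".completed";
        timeline.add(completed);
        action.onComplete(completed);
        return completed;
    }
}
```

The trade-off is that every call site must name the action explicitly, which keeps the interface smaller at the cost of slightly noisier native-path calls.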
a3477dc to 62e911a
    metaClient.getStorage().exists(getInstantFileNamePath(fromInstantFileName)),
    "File " + getInstantFileNamePath(fromInstantFileName) + " does not exist!");
createCompleteFileInMetaPath(shouldLock, toInstant, metadata);
String completionTime = HoodieInstantTimeGenerator.formatDateBasedOnTimeZone(new Date(createCompleteFileInMetaPath(shouldLock, toInstant, metadata)));
nts: need to ensure this is consistent with our approach to time generation.
I think this is fine. Keeping this open and tracking it.
Will get @the-other-tim-brown's eyes on this next week.
The logic is correct but redundant. I will fix this.
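For readers following the time-generation concern, here is a minimal sketch of formatting an epoch-millisecond completion time without depending on the JVM's default time zone. It is not Hudi's `HoodieInstantTimeGenerator`; the class name, the `yyyyMMddHHmmssSSS` pattern, and the fixed UTC zone are assumptions chosen for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class CompletionTimeSketch {
    // Assumed pattern for illustration: a millisecond-granularity timestamp string.
    // The zone is pinned explicitly so the result never depends on the local default.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    /** Formats epoch milliseconds as a fixed-width timestamp string in UTC. */
    static String formatCompletionTime(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

Because the formatter carries its own zone, the same epoch value produces the same string on every host.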
danny0405 left a comment
Blocking on my final review.
1b4151b to c146e82
c146e82 to 3fc8cbf
@hudi-bot run azure
@geserdugarov sorry, I missed your comments there. Does the RFC help address some of those?
private void initMetaClient() {
  if (this.metaClient == null) {
    this.metaClient = StreamerUtil.createMetaClient(conf);
@danny0405 general question for you: is StreamerUtil the right place for such reusable helper methods? I would have expected something like FlinkTableUtils or something generic.
Yeah, we put most of the utility methods here. We also have other utilities; you can check the other utility classes under the same package.
- Address review comments
- Avoid local timezone conversion for completion time
- Handle rollback and savepoint.
- Add TableFormatCompletionAction functional interface.
- Fix bug in saveAsComplete.
- Use Consumer instead
- Add functional tests for TableFormat
- Fix check-style
- Fix renames
- Address comments from Vinoth
- Fix tests and improve test coverage
- Refactor FsUtils to take metaClient and remove overloaded methods
84c72fd to 59cf106
vinothchandar left a comment
Pushed one last change. If tests pass, I am good with the PR.
@danny0405 to make a pass and shepherd/land this.
@vinothchandar, thanks for your reply. Unfortunately, the RFC doesn't address the questions I mentioned. My main concern is performance; I really hope we won't lose it after these changes. Since there are no performance tests in the project, we'll have to verify it manually.
@geserdugarov what performance concern might this PR introduce? We should definitely fix it, IMO. Can you point me to the lines in this patch? In general, the changes in this patch should not introduce performance issues; a few hook or empty invocations should be fine, and their cost should be negligible. I agree we should limit code complexity. That is how we evolve the project: we always welcome new features if they are valuable, then try to keep the code in as good a shape as we can. But yeah, sometimes it is hard to give an elegant solution given the amount of work it could take. Appreciate your points.
@danny0405, from a quick check of this patch, you're right: it looks like there should be no performance issues. I only see that we started to pass
@geserdugarov I also noticed this change. In most of the changes, the context already contains a meta client, and we just need its lightweight members such as the "table format"; the rest of the logic is almost the same as before the patch.
+1. This is a great idea in general. For this PR, we had a bunch of conversations around this, and @bvaradar explicitly assured me, and ensured, that we will not incur overheads on the native path. @danny0405 I plugged the one issue I saw around extra timeline calls. CI is green. I'll land this in a few hours unless I hear otherwise.
…pache#13216)
* introduces a new table option 'hoodie.table.format';
* hoodie's native format just triggers nothing on core actions like #commit, #archive, #rollback, #clean, etc.
---------
Co-authored-by: vinoth chandar <vinoth@apache.org>
Co-authored-by: danny0405 <yuzhao.cyz@gmail.com>
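The two points in the commit message can be pictured with a small sketch. Only the option key `hoodie.table.format` comes from the commit message; the `TableFormat` interface, its hook names, the `"native"` default value, and `RecordingFormat` are hypothetical and chosen for illustration. The native format inherits empty hooks, so core actions trigger nothing extra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class TableFormatHooksSketch {
    /** Hypothetical hook interface; every hook defaults to doing nothing. */
    interface TableFormat {
        default void commit(String instant) { }

        default void archive(String instant) { }

        default void rollback(String instant) { }

        default void clean(String instant) { }
    }

    /** Native format: inherits the empty hooks, so core actions trigger nothing. */
    static class NativeFormat implements TableFormat { }

    /** Hypothetical external format that records the hooks it receives. */
    static class RecordingFormat implements TableFormat {
        final List<String> calls = new ArrayList<>();

        @Override
        public void commit(String instant) {
            calls.add("commit:" + instant);
        }

        @Override
        public void rollback(String instant) {
            calls.add("rollback:" + instant);
        }
    }

    /** Picks a format from the table option; the "native" default is an assumption. */
    static TableFormat fromProperties(Properties props) {
        String format = props.getProperty("hoodie.table.format", "native");
        return "native".equals(format) ? new NativeFormat() : new RecordingFormat();
    }
}
```

Call sites can then invoke the hooks unconditionally, and the cost on the native path is an empty method call.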
Change Logs
[HUDI-9332] Pluggable Table Format Support with native Integration
Impact
Initial implementation of RFC-93
Risk level (write none, low medium or high below)
none
Documentation Update
none
Contributor's checklist