[HUDI-8470] Remove auto commit support in WriteClient#13229
[HUDI-8470] Remove auto commit support in WriteClient#13229danny0405 merged 18 commits intoapache:masterfrom
Conversation
nsivabalan
left a comment
There was a problem hiding this comment.
Good job on chasing all the usages.
left few minor comments.
can you TAL.
Also, can you update PR description
...nt/hudi-client-common/src/test/java/org/apache/hudi/utils/HoodieWriterClientTestHarness.java
Outdated
Show resolved
Hide resolved
...rc/test/java/org/apache/hudi/client/functional/TestHoodieJavaClientOnCopyOnWriteStorage.java
Outdated
Show resolved
Hide resolved
...nt/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnCopyOnWriteStorage.java
Outdated
Show resolved
Hide resolved
...i-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
Outdated
Show resolved
Hide resolved
...tasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSourceStorage.scala
Outdated
Show resolved
Hide resolved
...tasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSourceStorage.scala
Outdated
Show resolved
Hide resolved
...tasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSourceStorage.scala
Outdated
Show resolved
Hide resolved
.../hudi-spark/src/test/scala/org/apache/spark/sql/hudi/feature/index/TestExpressionIndex.scala
Outdated
Show resolved
Hide resolved
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/TimelineService.java
Show resolved
Hide resolved
|
and lets chase for test failures and fix them. |
|
Also, can you chase all tests and remove usages of "withAutoCommit" and "shouldAutoCommit". |
...nt/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java
Outdated
Show resolved
Hide resolved
...lient-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java
Outdated
Show resolved
Hide resolved
...nt/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java
Outdated
Show resolved
Hide resolved
...nt/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
Show resolved
Hide resolved
...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
Outdated
Show resolved
Hide resolved
...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
Outdated
Show resolved
Hide resolved
...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
Outdated
Show resolved
Hide resolved
...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/data/HoodieListPairData.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/data/HoodiePairData.java
Outdated
Show resolved
Hide resolved
...lient/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDTableServiceClient.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
Show resolved
Hide resolved
|
here is the link to bootstrap Here we are directly executing CommitActionExecutor and expecting the commit to complete. Since we don't have a writeClient instance, I could not call commit() explicitly. And hence had to keep the auto commit flow by means of "internal auto commit". |
5750952 to
fb5a259
Compare
|
@yihua : all comments are addressed. |
Co-authored-by: sivabalan <n.siva.b@gmail.com>
yihua
left a comment
There was a problem hiding this comment.
LGTM overall. Having nits on test code changes which can be addressed separately if needed.
...rce/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
Outdated
Show resolved
Hide resolved
| HoodieWriteConfig writeConfig = getConfigBuilder(HoodieFailedWritesCleaningPolicy.EAGER) | ||
| .withCompactionConfig(HoodieCompactionConfig.newBuilder().withMaxNumDeltaCommitsBeforeCompaction(2) | ||
| .withInlineCompaction(true) | ||
| .withCompactionConfig(HoodieCompactionConfig.newBuilder().withMaxNumDeltaCommitsBeforeCompaction(3) |
There was a problem hiding this comment.
Follow up to check this test behavior change
There was a problem hiding this comment.
...lient-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java
Outdated
Show resolved
Hide resolved
| private List<WriteStatus> writeBatchHelper(HoodieJavaWriteClient client, String newCommitTime, String prevCommitTime, | ||
| Option<List<String>> commitTimesBetweenPrevAndNew, String initCommitTime, | ||
| int numRecordsInThisCommit, List<HoodieRecord> records, | ||
| Function3<List<WriteStatus>, HoodieJavaWriteClient, List<HoodieRecord>, String> writeFn, | ||
| boolean assertForCommit, int expRecordsInThisCommit, int expTotalRecords, | ||
| int expTotalCommits, boolean filterForCommitTimeWithAssert, InstantGenerator instantGenerator) throws IOException { | ||
| return writeBatchHelper(client, newCommitTime, prevCommitTime, commitTimesBetweenPrevAndNew, initCommitTime, | ||
| numRecordsInThisCommit, records, writeFn, assertForCommit, expRecordsInThisCommit, expTotalRecords, expTotalCommits, | ||
| filterForCommitTimeWithAssert, instantGenerator, false); |
There was a problem hiding this comment.
nit: we have too many such private utils that confuse developers. We should clean them up and only keep one or two in a follow-up.
There was a problem hiding this comment.
...lient/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java
Show resolved
Hide resolved
...di-spark/src/test/java/org/apache/hudi/functional/TestGlobalIndexEnableUpdatePartitions.java
Outdated
Show resolved
Hide resolved
...di-spark/src/test/java/org/apache/hudi/functional/TestGlobalIndexEnableUpdatePartitions.java
Show resolved
Hide resolved
...park-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
Outdated
Show resolved
Hide resolved
...tasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSourceStorage.scala
Outdated
Show resolved
Hide resolved
| .withSchema(TRIP_EXAMPLE_SCHEMA) | ||
| .withParallelism(2, 2) | ||
| .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(false).build()) | ||
| .withAutoCommit(false) |
There was a problem hiding this comment.
nit: Check if auto commit disabled (compared to previous enabled as default) has any implication in tests.
There was a problem hiding this comment.
|
https://issues.apache.org/jira/browse/HUDI-9440 |
CI report:
Bot commands@hudi-bot supports the following commands:
|
* for #compact and #log_compact, we return a HoodieWriteMetadata instead of list of write status; * now user needs to execute the action first then an explicit #commit --------- Co-authored-by: sivabalan <n.siva.b@gmail.com>
Change Logs
We have two different flows to write commits using writeClient. either w/ auto commit enabled or as auto commit disabled.
All the user facing writers (spark batch writers, spark streaming writers) are using auto commit disabled flow. The PR deprecate the other flow and only allow auto commit disabled flow.
We have introduced Spark/FlinkAutoCommitActionExecutor to support auto commits. So, only very few callers might call into this if they are looking for auto commit flows (bootstrap, PartitionTTL).
Note on Table Services:
We have made minor refactoring wrt table services. Prior to this patch, HoodieCommitMetadata is prepared within internal layers (RunCompactionActionExecutor) and HoodieCommitMetadata is preapared and attached to HoodieWriteMetadata.
And there was a bug where in, w/ auto commit disabled flow, we should not be triggering the dag in RunCompactionActionExecutor.
But w/ this auto commit deprecation, the dag is not expected to be triggered anyways.
So, in this patch, we have refactored this a bit. Partial HoodieCommitMetadata will be prepared in inner layers. But the HoodieWriteStat and PartitionToReplaceFileIds will be stitched within commitCompaction or commitClustering method in BaseHoodieTableServiceClient. So, the dag is triggered at the beginning of commitCompaction, following which HoodieCommitMetadata will stitched w/ the right HoodieWriteStats. This is done for all 3 table services (compaction, log compaction and clustering)
In other words, none of the inner classes like RunCompactionActionExecutor will trigger the dag nor need to set List.
Impact
Deprecates write config
hoodie.auto.commit. The config would no longer be available for use.Risk level (write none, low medium or high below)
low
Documentation Update
PR deprecates write config
hoodie.auto.commit. The config would no longer be available for use. If user is using write client operations like upsert, insert or table service operations directly using hoodie write client, then the corresponding commit operation needs to be performed.Contributor's checklist