[HUDI-8470] Remove auto commit support in WriteClient#13229

Merged
danny0405 merged 18 commits intoapache:masterfrom
lokeshj1703:HUDI-8470
May 23, 2025

@lokeshj1703
Collaborator

@lokeshj1703 lokeshj1703 commented Apr 28, 2025

Change Logs

We have two different flows for writing commits using the write client: with auto commit enabled, or with auto commit disabled.

All the user-facing writers (Spark batch writers, Spark streaming writers) use the auto-commit-disabled flow. This PR deprecates the other flow and allows only the auto-commit-disabled flow.
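To make the remaining flow concrete, here is a minimal, self-contained Java sketch of the auto-commit-disabled contract: a write call only produces write statuses, and the caller must complete the instant with an explicit commit. ToyWriteClient, WriteStatus, and their methods are hypothetical stand-ins loosely named after the write client's start-commit / upsert / commit steps. They are not Hudi's real classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the auto-commit-disabled flow; not Hudi's real API.
public class ExplicitCommitSketch {

  /** Stand-in for the per-file result of a write (hypothetical). */
  record WriteStatus(String fileId, long recordsWritten) {}

  static class ToyWriteClient {
    private final List<String> inflightInstants = new ArrayList<>();
    private final List<String> completedInstants = new ArrayList<>();
    private long totalRecordsCommitted = 0;
    private int counter = 0;

    /** Opens a new instant; it stays inflight until commit() is called. */
    String startCommit() {
      String instant = String.format("%03d", ++counter);
      inflightInstants.add(instant);
      return instant;
    }

    /** Writes records under the instant but does NOT complete it (no auto commit). */
    List<WriteStatus> upsert(List<String> records, String instant) {
      if (!inflightInstants.contains(instant)) {
        throw new IllegalStateException("Instant is not inflight: " + instant);
      }
      return List.of(new WriteStatus("file-" + instant, records.size()));
    }

    /** Completes the instant; the caller is responsible for invoking this. */
    boolean commit(String instant, List<WriteStatus> statuses) {
      if (!inflightInstants.remove(instant)) {
        return false; // unknown or already committed
      }
      for (WriteStatus status : statuses) {
        totalRecordsCommitted += status.recordsWritten();
      }
      completedInstants.add(instant);
      return true;
    }

    List<String> completedInstants() { return completedInstants; }

    long totalRecordsCommitted() { return totalRecordsCommitted; }
  }

  public static void main(String[] args) {
    ToyWriteClient client = new ToyWriteClient();
    String instant = client.startCommit();
    List<WriteStatus> statuses = client.upsert(List.of("r1", "r2"), instant);
    // Without this explicit call the instant would remain inflight.
    boolean committed = client.commit(instant, statuses);
    System.out.println(committed + " " + client.completedInstants());
  }
}
```

Running main prints `true [001]`: the instant is only reported as completed after commit is called, which is the behavior the deprecation makes mandatory for direct write client users.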

We have introduced Spark/FlinkAutoCommitActionExecutor to support auto commits, so only the very few callers that still rely on the auto commit flow (bootstrap, PartitionTTL) are expected to call into it.
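For callers that have no write client instance to call commit() on, the idea behind an auto commit action executor can be sketched as a decorator that runs the wrapped action and then commits its result itself. This is a hypothetical model under that assumption: AutoCommitExecutor, ToyTimeline, and WriteResult are illustrative names, not the actual Spark/FlinkAutoCommitActionExecutor code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical decorator that adds a commit step to an action that only writes.
public class AutoCommitWrapperSketch {

  /** Stand-in for an action's output, e.g. write metadata (hypothetical). */
  record WriteResult(String instant, long recordsWritten) {}

  /** Minimal timeline that only records which instants were completed. */
  static class ToyTimeline {
    final List<String> completed = new ArrayList<>();

    void complete(WriteResult result) {
      completed.add(result.instant());
    }
  }

  /** Runs the wrapped action, then commits on its behalf. */
  static class AutoCommitExecutor {
    private final Supplier<WriteResult> action;
    private final ToyTimeline timeline;

    AutoCommitExecutor(Supplier<WriteResult> action, ToyTimeline timeline) {
      this.action = action;
      this.timeline = timeline;
    }

    WriteResult execute() {
      WriteResult result = action.get(); // the wrapped action writes but does not commit
      timeline.complete(result);         // the wrapper performs the commit
      return result;
    }
  }

  public static void main(String[] args) {
    ToyTimeline timeline = new ToyTimeline();
    AutoCommitExecutor executor =
        new AutoCommitExecutor(() -> new WriteResult("001", 10L), timeline);
    executor.execute();
    System.out.println(timeline.completed);
  }
}
```

Keeping the commit step in the wrapper means the inner action behaves the same whether a write client or this wrapper drives it; only the caller decides who commits.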

Note on Table Services:
We have done some minor refactoring of the table services. Prior to this patch, HoodieCommitMetadata was prepared within internal layers (e.g., RunCompactionActionExecutor) and attached to HoodieWriteMetadata.

There was also a bug: in the auto-commit-disabled flow, RunCompactionActionExecutor triggered the DAG even though it should not have.

With auto commit removed, the DAG is no longer expected to be triggered there at all.
So, in this patch, we have refactored this a bit. Partial HoodieCommitMetadata is prepared in the inner layers, but the HoodieWriteStats and PartitionToReplaceFileIds are stitched together within the commitCompaction or commitClustering method in BaseHoodieTableServiceClient. The DAG is therefore triggered at the beginning of commitCompaction, after which HoodieCommitMetadata is stitched together with the right HoodieWriteStats. This is done for all three table services (compaction, log compaction, and clustering).
In other words, none of the inner classes, such as RunCompactionActionExecutor, triggers the DAG or needs to set List.
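The refactored table-service path can be modeled as follows: the inner executor returns partial commit metadata together with a lazy, not yet executed computation of write stats, and the commit method triggers that computation first and then stitches the stats into the metadata. This is a hypothetical sketch: a java.util.function.Supplier stands in for the Spark DAG, and runCompaction, commitCompaction, CommitMetadata, and WriteStat are simplified illustrative names rather than Hudi's actual signatures. Clustering would additionally stitch in the replaced file IDs, which is omitted here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model: inner layer builds partial metadata, commit layer runs the DAG and stitches stats.
public class CommitStitchingSketch {

  record WriteStat(String partition, String fileId, long numWrites) {}

  /** Partial metadata built by the inner layer: no write stats yet. */
  static class CommitMetadata {
    final String operationType;
    final Map<String, List<WriteStat>> partitionToWriteStats = new HashMap<>();

    CommitMetadata(String operationType) {
      this.operationType = operationType;
    }

    void addWriteStat(WriteStat stat) {
      partitionToWriteStats.computeIfAbsent(stat.partition(), p -> new ArrayList<>()).add(stat);
    }
  }

  /** What the inner executor hands back: partial metadata plus a lazy "DAG". */
  record WriteMetadata(CommitMetadata partialMetadata, Supplier<List<WriteStat>> lazyWriteStats) {}

  /** Counts DAG executions so we can check that it runs only at commit time. */
  static final AtomicInteger DAG_RUNS = new AtomicInteger();

  /** Inner layer: prepares partial metadata but does NOT trigger the DAG. */
  static WriteMetadata runCompaction(String instant) {
    Supplier<List<WriteStat>> dag = () -> {
      DAG_RUNS.incrementAndGet();
      return List.of(new WriteStat("2025/05/01", "f1-" + instant, 100),
                     new WriteStat("2025/05/02", "f2-" + instant, 50));
    };
    return new WriteMetadata(new CommitMetadata("compact"), dag);
  }

  /** Service-client layer: trigger the DAG first, then stitch stats into the metadata. */
  static CommitMetadata commitCompaction(WriteMetadata writeMetadata) {
    List<WriteStat> stats = writeMetadata.lazyWriteStats().get(); // DAG is triggered here
    CommitMetadata metadata = writeMetadata.partialMetadata();
    stats.forEach(metadata::addWriteStat);
    return metadata; // a real client would now complete the instant on the timeline
  }

  public static void main(String[] args) {
    WriteMetadata wm = runCompaction("003");
    System.out.println("DAG runs before commit: " + DAG_RUNS.get());
    CommitMetadata committed = commitCompaction(wm);
    System.out.println("DAG runs after commit: " + DAG_RUNS.get());
    System.out.println("partitions: " + committed.partitionToWriteStats.size());
  }
}
```

Running main reports 0 DAG runs before the commit and 1 after, mirroring the point that only the commit step executes the DAG while the inner layer stays side-effect free.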

Impact

Deprecates the write config hoodie.auto.commit. The config is no longer available for use.

Risk level (write none, low, medium or high below)

low

Documentation Update

This PR deprecates the write config hoodie.auto.commit, which is no longer available for use. Users who call write client operations such as upsert or insert, or table service operations, directly on the Hudi write client must now perform the corresponding commit operation explicitly.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Apr 28, 2025
Contributor

@nsivabalan nsivabalan left a comment

Good job chasing down all the usages.
Left a few minor comments.
Can you take a look?

Also, can you update the PR description?

@nsivabalan nsivabalan changed the title [HUDI-8470][DNM] Unify/Fix auto commit enabled and disabled flows [HUDI-8470] Remove auto commit support WriteClient Apr 28, 2025
@nsivabalan
Contributor

And let's chase down the test failures and fix them.

@nsivabalan
Contributor

Also, can you go through all the tests and remove usages of "withAutoCommit" and "shouldAutoCommit"?

@nsivabalan nsivabalan changed the title [HUDI-8470] Remove auto commit support WriteClient [HUDI-8470] Remove auto commit support in WriteClient Apr 29, 2025
@nsivabalan nsivabalan self-assigned this Apr 29, 2025
@nsivabalan nsivabalan marked this pull request as ready for review May 2, 2025 10:10
Contributor

@danny0405 danny0405 left a comment

block on my review

@nsivabalan
Contributor

@danny0405 :

Here is the link to the bootstrap code:

Option<HoodieWriteMetadata<HoodieData<WriteStatus>>> writeMetadataOption =

Here we execute CommitActionExecutor directly and expect the commit to complete. Since we don't have a writeClient instance, I could not call commit() explicitly, and hence had to keep the auto commit flow by means of "internal auto commit".

@nsivabalan nsivabalan force-pushed the HUDI-8470 branch 2 times, most recently from 5750952 to fb5a259 May 5, 2025 01:49
@nsivabalan
Contributor

@yihua : all comments are addressed.

Contributor

@yihua yihua left a comment

LGTM overall. I have some nits on the test code changes, which can be addressed separately if needed.

HoodieWriteConfig writeConfig = getConfigBuilder(HoodieFailedWritesCleaningPolicy.EAGER)
.withCompactionConfig(HoodieCompactionConfig.newBuilder().withMaxNumDeltaCommitsBeforeCompaction(2)
.withInlineCompaction(true)
.withCompactionConfig(HoodieCompactionConfig.newBuilder().withMaxNumDeltaCommitsBeforeCompaction(3)
Contributor

Follow up: check this change in test behavior.

Comment on lines +722 to +730
private List<WriteStatus> writeBatchHelper(HoodieJavaWriteClient client, String newCommitTime, String prevCommitTime,
Option<List<String>> commitTimesBetweenPrevAndNew, String initCommitTime,
int numRecordsInThisCommit, List<HoodieRecord> records,
Function3<List<WriteStatus>, HoodieJavaWriteClient, List<HoodieRecord>, String> writeFn,
boolean assertForCommit, int expRecordsInThisCommit, int expTotalRecords,
int expTotalCommits, boolean filterForCommitTimeWithAssert, InstantGenerator instantGenerator) throws IOException {
return writeBatchHelper(client, newCommitTime, prevCommitTime, commitTimesBetweenPrevAndNew, initCommitTime,
numRecordsInThisCommit, records, writeFn, assertForCommit, expRecordsInThisCommit, expTotalRecords, expTotalCommits,
filterForCommitTimeWithAssert, instantGenerator, false);
Contributor

nit: we have too many such private utils that confuse developers. We should clean them up and only keep one or two in a follow-up.

Contributor

ack.

.withSchema(TRIP_EXAMPLE_SCHEMA)
.withParallelism(2, 2)
.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(false).build())
.withAutoCommit(false)
Contributor

nit: Check whether disabling auto commit (previously enabled by default) has any implications for the tests.

@nsivabalan
Contributor

https://issues.apache.org/jira/browse/HUDI-9440
Added a tracking JIRA for all the minor test follow-ups. @lokeshj1703 will take it up.

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit 6b60eac into apache:master May 23, 2025
58 checks passed
alexr17 pushed a commit to alexr17/hudi that referenced this pull request Aug 25, 2025
* for #compact and #log_compact, we return a HoodieWriteMetadata instead of a list of write statuses;
* users now need to execute the action first and then call #commit explicitly

---------

Co-authored-by: sivabalan <n.siva.b@gmail.com>

Labels

size:XL PR with lines of changes > 1000

5 participants