[HUDI-4432] Checkpoint management for multi-writer scenario #7383
Change Logs
- `extraPreCommitFunc` for `BaseHoodieWriteClient.commitStats`, so that it can execute customized functions (here, checking the checkpoint info and updating it in `HoodieStreamingSink`).
- `hoodie.datasource.write.streaming.checkpoint.identifier` to identify each writer's checkpoint info. If it is not set, the sink holds an in-memory latestBatchId to avoid the issue in [HUDI-4389] Make HoodieStreamingSink idempotent #6098.
Impact
Existing jobs that already write `_hudi_streaming_sink_checkpoint` might lose old checkpoint info, because we now use a user-provided identifier to look up the checkpoint info, not `${sqlContext.sparkContext.applicationId}-$queryId`.
Risk level (write none, low medium or high below)
low
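To illustrate the checkpoint behavior described under Change Logs and Impact, here is a minimal, self-contained sketch. It is not the code in this PR and does not use any Hudi API. The class `CheckpointSketch`, its methods, and the identifiers `writer-a`, `writer-b`, and `app-123-query-1` are all hypothetical. It assumes the checkpoint info can be modeled as a map from writer identifier to the last committed batch ID, and that the pre-commit function rejects a batch that is not newer than the recorded one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Simplified, hypothetical model; none of these names are Hudi APIs.
class CheckpointSketch {
    // Stand-in for the checkpoint info kept with the commits:
    // writer identifier -> last committed batch id.
    private final Map<String, Long> checkpoints = new HashMap<>();

    // Pre-commit check: accept a batch only if it is newer than the
    // checkpoint recorded for this identifier.
    private final BiPredicate<String, Long> preCommitCheck =
        (identifier, batchId) -> {
            Long last = checkpoints.get(identifier);
            return last == null || batchId > last;
        };

    // Runs the pre-commit check, then records the new checkpoint.
    // Returns false when the batch is skipped as already committed.
    boolean commit(String identifier, long batchId) {
        if (!preCommitCheck.test(identifier, batchId)) {
            return false;
        }
        checkpoints.put(identifier, batchId);
        return true;
    }

    // Returns the last committed batch id, or null if none is recorded.
    Long lastBatchId(String identifier) {
        return checkpoints.get(identifier);
    }

    public static void main(String[] args) {
        CheckpointSketch table = new CheckpointSketch();
        System.out.println(table.commit("writer-a", 1L)); // first batch for writer-a
        System.out.println(table.commit("writer-a", 1L)); // replay of the same batch
        System.out.println(table.commit("writer-b", 1L)); // a second writer, tracked separately
        // A checkpoint stored under a different key is not found via the new key.
        System.out.println(table.lastBatchId("app-123-query-1"));
    }
}
```

In this model each identifier has its own checkpoint, so two writers do not overwrite each other's progress, and a replayed batch is skipped. The last lookup mirrors the impact noted above: a checkpoint recorded under one key scheme cannot be found through another.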