[HUDI-6132] Fixing checkpoint management for multiple streaming writers #8558
codope left a comment:
Overall looks good. Minor refactoring comment.
    object HoodieStreamingSink {

      // This constant serves as the checkpoint key for streaming sink so that each microbatch is processed exactly-once.
      val SINK_CHECKPOINT_KEY = "_hudi_streaming_sink_checkpoint"
Does it make sense to remove this object now? It's just holding a constant.
@codope: I guess we can't define static variables within a Scala class, so the constant has to go into the object. If you were suggesting moving it to some other class, I would prefer to keep it in HoodieStreamingSink, since it is applicable only to streaming writes.
Sounds good. I wanted to move it to the write config and unify the constants for deltastreamer and Spark streaming. But now I think it makes sense to keep it separate.
…rs (apache#8558) Each writer updates the checkpoint in commit metadata with its own batchId info only. When checking to skip the current batch, we walk back in the timeline and find the current writer's last committed batchId. Also fixed bulk insert row writer path for checkpoint management with streaming writes.
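The skip check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Commit`, `writerId`, `lastCommittedBatchId`, and `shouldSkip` are hypothetical names, the timeline is modeled as a plain `Seq`, and the checkpoint stored under the sink checkpoint key is assumed to be a map from a writer identifier to that writer's batchId. The `>=` comparison is also an assumption about how "already committed" is decided.

```scala
object CheckpointSketch {
  // Hypothetical stand-in for a completed commit on the timeline (not a Hudi class).
  // `checkpoint` models the per-writer batchId info kept in commit metadata.
  final case class Commit(instantTime: String, checkpoint: Map[String, Long])

  // Walk back in the timeline (newest instant first) and return the batchId of the
  // most recent commit that carries an entry for this writer. Commits written by
  // other writers are passed over.
  def lastCommittedBatchId(timeline: Seq[Commit], writerId: String): Option[Long] =
    timeline
      .sortBy(_.instantTime)(Ordering[String].reverse)
      .collectFirst { case c if c.checkpoint.contains(writerId) => c.checkpoint(writerId) }

  // Assumption: a micro-batch is skipped when this writer's last committed batchId
  // is at or beyond the incoming batchId.
  def shouldSkip(timeline: Seq[Commit], writerId: String, batchId: Long): Boolean =
    lastCommittedBatchId(timeline, writerId).exists(_ >= batchId)
}
```

With two writers interleaving commits, each writer only ever compares against its own entries, so a newer commit from another writer does not hide the current writer's checkpoint.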
Change Logs
Impact
Risk level (write none, low, medium or high below)
low.
Documentation Update
N/A
Contributor's checklist