[HUDI-9248] Unify code paths for all write operations about bulk_insert#13360
danny0405 merged 3 commits into apache:master
1. Unify all the code paths of bulk insert operations Signed-off-by: TheR1sing3un <chaoyang@apache.org>
1. fix ut Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…it metadata 1. pass the extra metadata about spark-streaming checkpoint to commit metadata Signed-off-by: TheR1sing3un <chaoyang@apache.org>
6f93787 to 791ef28
@hudi-bot run azure
All checks passed
@danny0405 @zhangyue19921010 Hi Danny, Yue, I closed the previous PR (#13066) and reopened it as this PR. The changes here are based on the suggestions from your last review. Could you start the review again?
}
}
val (writeSuccessful, compactionInstant, clusteringInstant) = commitAndPerformPostOperations(
Is this change because overwriteOperationType is never null?
+ " To use row writer please switch to spark 3");
}

records.write().format(targetFormat)
Can you elaborate on why the logic was previously customized for this executor?
> Can you elaborate on why the logic was previously customized for this executor?
I think the timeline went like this.
First, there was only the normal bulk insert logic, which at that time used the data source v2 interface directly to perform writes.
Later, boneanxs proposed using bulk insert to perform other operations such as overwrite, but the code paths were not unified at that time. Instead, this separate logic was retained.
For reference, see #8076.
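To make the idea of a unified code path concrete, here is a minimal Java sketch. It is not Hudi code: `UnifiedBulkInsertSketch`, `Operation`, `WriteResult`, and `write` are hypothetical names invented for illustration, and partitions are modelled as plain strings. It assumes the usual semantics that insert overwrite replaces only the partitions touched by the incoming data, while insert overwrite table replaces every existing partition. The point is only that the write step is shared and the operations differ in which existing partitions they mark as replaced.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: none of these names are real Hudi classes or methods.
class UnifiedBulkInsertSketch {

    /** Operations that all write their data through the same bulk-insert path. */
    enum Operation { BULK_INSERT, INSERT_OVERWRITE, INSERT_OVERWRITE_TABLE }

    /** Partitions that received new data, plus partitions whose old data is replaced. */
    static final class WriteResult {
        final List<String> writtenPartitions;
        final List<String> replacedPartitions;

        WriteResult(List<String> writtenPartitions, List<String> replacedPartitions) {
            this.writtenPartitions = writtenPartitions;
            this.replacedPartitions = replacedPartitions;
        }
    }

    /** Single entry point: the write step is shared, only the replace step varies. */
    static WriteResult write(Operation op, List<String> incoming, List<String> existing) {
        // Shared step: de-duplicate the incoming partitions, keeping first-seen order.
        List<String> written = new ArrayList<>(new LinkedHashSet<>(incoming));

        // Operation-specific step: decide which existing partitions get replaced.
        List<String> replaced;
        switch (op) {
            case INSERT_OVERWRITE:
                // Only partitions that both exist and receive new data.
                replaced = new ArrayList<>(written);
                replaced.retainAll(existing);
                break;
            case INSERT_OVERWRITE_TABLE:
                // Every existing partition, whether or not it receives new data.
                replaced = new ArrayList<>(existing);
                break;
            default:
                // Plain bulk insert replaces nothing.
                replaced = Collections.emptyList();
        }
        return new WriteResult(written, replaced);
    }
}
```

With this shape, adding another bulk-insert-based operation would mean adding one enum constant and one `switch` branch, rather than a separate executor with its own write logic.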

…rt (apache#13360) * refactor: Unify all the code paths of bulk insert operations --------- Signed-off-by: TheR1sing3un <chaoyang@apache.org>

refactor: Unify the code paths for all bulk_insert operations to improve code readability and maintainability
Change Logs
Impact
Improve code maintainability
Risk level (write none, low, medium or high below)
low
Documentation Update
none