[HUDI-5317] Fix insert overwrite table for partitioned table#7365
leesf merged 3 commits into apache:master
Conversation
@leesf Could you please help to review this PR?
@stream2000 would you please check the CI failure?
@hudi-bot run azure
Seems like some UTs failed. Will fix it.
We have two operations relating to insert_overwrite. Spark-ds writes support both operations. I guess in spark-sql we supported only insert_overwrite. Not sure if we can revert the behavior. Maybe we should consider adding a new write operation in spark-sql for this.
@nsivabalan hi, here are my two cents:
Change Logs
For SQL like insert overwrite table $table select xxx, we expect all data in the table to be dropped first and the selected data to be inserted afterwards. However, we found that the 'insert overwrite table' semantics work only for non-partitioned tables. For partitioned tables, the current implementation drops only the partitions involved in the select sub-query; the other partitions, which should also be dropped, are left in place.
This PR fixes the problem so that insert overwrite table drops all partitions first and then inserts the new data.
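To make the difference concrete, here is a minimal toy model in plain Python. It is not Hudi code: the table is modelled as a dict from partition value to rows, and the function names `overwrite_touched_partitions` and `overwrite_whole_table` are hypothetical, chosen only to contrast the behavior before and after this change.

```python
from collections import defaultdict


def group_by_partition(rows, partition_key):
    """Group incoming rows by the value of their partition column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[partition_key]].append(row)
    return dict(grouped)


def overwrite_touched_partitions(table, new_rows, partition_key):
    """Previous behavior: only partitions present in new_rows are replaced."""
    result = {part: list(rows) for part, rows in table.items()}
    result.update(group_by_partition(new_rows, partition_key))
    return result


def overwrite_whole_table(table, new_rows, partition_key):
    """Expected behavior: every existing partition is dropped first."""
    return group_by_partition(new_rows, partition_key)


# Hypothetical table partitioned by "dt" with two existing partitions.
table = {
    "2021-01-01": [{"id": 1, "dt": "2021-01-01"}],
    "2021-01-02": [{"id": 2, "dt": "2021-01-02"}],
}
# The select sub-query only produces rows for one partition.
new_rows = [{"id": 3, "dt": "2021-01-02"}]

before_fix = overwrite_touched_partitions(table, new_rows, "dt")
after_fix = overwrite_whole_table(table, new_rows, "dt")
```

In this sketch `before_fix` still contains the untouched 2021-01-01 partition, while `after_fix` contains only the partition written by the select, which is the result expected from insert overwrite table.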
Impact
Insert overwrite table will drop all partitions first and then insert the new data.
Risk level (write none, low medium or high below)
None
Documentation Update
None