[HUDI-5317] Fix insert overwrite table for partitioned table by stream2000 · Pull Request #7365 · apache/hudi

stream2000 · 2022-12-02T09:27:39Z

Change Logs

For SQL like insert overwrite table $table select xxx, we expect all data in the table to be dropped first and the selected data to be inserted afterwards. However, we found that the 'insert overwrite table' semantics work only for non-partitioned tables. For a partitioned table, the current implementation drops only the partitions involved in the select sub-query; the other partitions, which should also be dropped, are left untouched.

This PR fixes the problem so that insert overwrite table drops all partitions first and then inserts the new data.
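
To make the difference concrete, here is a toy model of the two behaviours. It is only a sketch: the table is a plain dictionary from partition path to rows, and the function names and sample partitions are made up for illustration. None of this is Hudi code.

```python
from typing import Dict, List

# Toy stand-in for a partitioned table: partition path -> rows.
Table = Dict[str, List[str]]


def overwrite_matching_partitions(table: Table, incoming: Table) -> Table:
    """Behaviour before the fix: only partitions present in the incoming data are replaced."""
    result = {part: list(rows) for part, rows in table.items()}
    for part, rows in incoming.items():
        result[part] = list(rows)
    return result


def overwrite_whole_table(table: Table, incoming: Table) -> Table:
    """Behaviour after the fix: every existing partition is dropped, then the new data is inserted."""
    return {part: list(rows) for part, rows in incoming.items()}


# Hypothetical sample data: two existing partitions, the select touches only one.
existing = {"dt=2022-12-01": ["a"], "dt=2022-12-02": ["b"]}
selected = {"dt=2022-12-02": ["c"]}

print(overwrite_matching_partitions(existing, selected))
# {'dt=2022-12-01': ['a'], 'dt=2022-12-02': ['c']}
print(overwrite_whole_table(existing, selected))
# {'dt=2022-12-02': ['c']}
```

In the first result the untouched partition dt=2022-12-01 survives, which is the unexpected outcome described above; in the second it is gone.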

Impact

Insert overwrite table will drop all partitions first and then insert the new data.

Risk level (write none, low medium or high below)

None

Documentation Update

None

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

stream2000 · 2022-12-02T09:28:12Z

@leesf Could you please help to review this PR?

leesf · 2022-12-03T10:18:51Z

@stream2000 would you please check CI failure?

leesf · 2022-12-03T10:19:03Z

@hudi-bot run azure

stream2000 · 2022-12-03T14:16:39Z

It seems some unit tests failed. I will fix them.

nsivabalan · 2022-12-05T23:44:29Z

We have two operations related to insert_overwrite:
1: insert_overwrite_table
2: insert_overwrite

spark-ds (Spark datasource) writes support both operations.
insert_overwrite_table overwrites the entire table, while insert_overwrite overwrites only the matching partitions.

I guess in spark-sql we supported only insert_overwrite. I am not sure we can revert the behavior. Maybe we should consider adding a new write operation in spark-sql for this.
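
For reference, a minimal sketch of how the two operations could be selected on the datasource path. The helper name hudi_write_options and the table name are hypothetical; the option keys hoodie.table.name and hoodie.datasource.write.operation and the two operation values are the ones discussed here. The sketch only builds the options and does not start Spark.

```python
def hudi_write_options(table_name: str, overwrite_whole_table: bool) -> dict:
    """Build datasource write options for one of the two overwrite operations.

    hudi_write_options is a hypothetical helper, not part of Hudi.
    """
    # insert_overwrite_table replaces the entire table;
    # insert_overwrite replaces only the partitions present in the incoming data.
    operation = "insert_overwrite_table" if overwrite_whole_table else "insert_overwrite"
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
    }


# Hypothetical table name, used only for illustration.
whole_table_opts = hudi_write_options("my_table", overwrite_whole_table=True)
partition_opts = hudi_write_options("my_table", overwrite_whole_table=False)
print(whole_table_opts["hoodie.datasource.write.operation"])
print(partition_opts["hoodie.datasource.write.operation"])
```

A real job would pass such a dictionary to a Spark DataFrame writer, for example via df.write.format("hudi").options(**whole_table_opts), together with the other options a Hudi write requires.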

leesf · 2022-12-09T01:15:12Z

@nsivabalan hi, here are my two cents: insert overwrite xxx values(xx,xxx) has very clear semantics: it overwrites the entire table. insert overwrite xx partition(xx) values(xx,xxx) overwrites only the specified partitions. Hudi currently performs a partition overwrite for the overwrite-table statement, which is a definite bug, and I do not think we need to introduce a new operation for it.

stream2000 · 2022-12-14T06:47:34Z

@hudi-bot run azure

leesf

LGTM

stream2000 · 2023-01-11T06:23:18Z

@hudi-bot run azure

leesf · 2023-01-11T07:29:21Z

@hudi-bot run azure

hudi-bot · 2023-01-12T08:00:49Z

CI report:

c963634 UNKNOWN
9e1f64f Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

leesf self-assigned this Dec 2, 2022

leesf closed this Dec 3, 2022

leesf reopened this Dec 3, 2022

nsivabalan added the labels priority:blocker (Production down; release blocker) and release-0.12.2 (Patches targeted for 0.12.2) on Dec 5, 2022

codope added the labels priority:critical (Production degraded; pipelines stalled) and area:sql (SQL interfaces), and removed the labels priority:blocker (Production down; release blocker) and release-0.12.2 (Patches targeted for 0.12.2), on Dec 7, 2022

stream2000 force-pushed the fix_insert_overwrite_table branch from 424e8f0 to e58d4db on December 14, 2022 06:10

stream2000 force-pushed the fix_insert_overwrite_table branch 5 times, most recently from f10a71c to d3ab1e3 on December 29, 2022 02:31

stream2000 force-pushed the fix_insert_overwrite_table branch from d3ab1e3 to b879347 on January 9, 2023 07:56

leesf approved these changes Jan 10, 2023

stream2000 closed this Jan 11, 2023

stream2000 reopened this Jan 11, 2023

stream2000 force-pushed the fix_insert_overwrite_table branch from b4a9d1a to c963634 on January 12, 2023 02:34

[HUDI-5317] Fix insert overwrite table for partitioned table

bdfe71b

fix ut not pass for spark3.1

c1079b9

stream2000 force-pushed the fix_insert_overwrite_table branch from c963634 to 9f01927 on January 12, 2023 02:41

fix ut not pass for spark3.3

9e1f64f

stream2000 force-pushed the fix_insert_overwrite_table branch from 9f01927 to 9e1f64f on January 12, 2023 02:43

leesf merged commit d656686 into apache:master Jan 12, 2023

Zouxxyy mentioned this pull request Jan 30, 2023

[HUDI-5317] Fix insert overwrite table for partitioned table #7793

Merged

4 tasks

fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Jan 31, 2023

[HUDI-5317] Fix insert overwrite table for partitioned table (apache#…

c7d4686

…7365)

stream2000 mentioned this pull request Feb 22, 2023

[HUDI-5317] Fix insert overwrite table for partitioned table #8015

Closed

4 tasks

KnightChess mentioned this pull request Mar 24, 2023

[SUPPORT] In version 0.13.0, when using dynamic partition to insert overwrite data, the table will be cleared first, and then the corresponding partition data will be written. Is it not as expected? Why clean the table first? #8283

Closed

fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023

[HUDI-5317] Fix insert overwrite table for partitioned table (apache#…

3f3789a

…7365)

flashJd mentioned this pull request Jul 3, 2023

[HUDI-6466] Fix spark insert overwrite partitioned table with dynamic partition #9113

Merged

4 tasks

leesf merged 3 commits into apache:master from stream2000:fix_insert_overwrite_table
