Skip to content

[HUDI-5317] Fix insert overwrite table for partitioned table#7793

Merged
leesf merged 4 commits intoapache:masterfrom
Zouxxyy:xinyu/fix-insert-overwrite
Feb 1, 2023
Merged

[HUDI-5317] Fix insert overwrite table for partitioned table#7793
leesf merged 4 commits intoapache:masterfrom
Zouxxyy:xinyu/fix-insert-overwrite

Conversation

@Zouxxyy
Copy link
Copy Markdown
Contributor

@Zouxxyy Zouxxyy commented Jan 30, 2023

Change Logs

fix #7365

Test Insert Overwrite Table for V2 Table is unnecessary, it is completely consistent with Test Insert Overwrite Table, and their results must be the same in spark2 and spark3.

So I just removed it, and fix spark2 by repalce if (overwrite && catalogTable.partitionFields.isEmpty) with if (overwrite && partitionSpec.isEmpty)

Impact

low

Risk level (write none, low medium or high below)

low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@Zouxxyy
Copy link
Copy Markdown
Contributor Author

Zouxxyy commented Jan 30, 2023

@stream2000 @leesf Could you please help to review

@leesf leesf self-assigned this Jan 31, 2023
}
})
}
spark.sql(s"insert overwrite table $tblNonPartition select 3, 'a3', 10, 1000")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there test case for insert overwrite partitioned table?

Copy link
Copy Markdown
Contributor Author

@Zouxxyy Zouxxyy Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leesf exist, such as lines 383 and 393

Copy link
Copy Markdown
Contributor Author

@Zouxxyy Zouxxyy Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leesf Test Insert Overwrite Table for V2 Table is unnecessary, it is completely consistent with Test Insert Overwrite, and their results must be the same in spark2 and spark3.
So I just removed it, and fix spark2 by repalce if (overwrite && catalogTable.partitionFields.isEmpty) with if (overwrite && partitionSpec.isEmpty)

@danny0405 danny0405 added area:sql SQL interfaces priority:high Significant impact; potential bugs labels Jan 31, 2023
@danny0405
Copy link
Copy Markdown
Contributor

cc @jonvex and @lokeshj1703 to take a look :)

@hudi-bot
Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

}

test("Test Insert Overwrite Table for V2 Table") {
withSQLConf("hoodie.schema.on.read.enable" -> "true") {
Copy link
Copy Markdown
Contributor

@stream2000 stream2000 Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V2 Table is only enabled when hoodie.schema.on.read.enable is true, otherwise v2Table.v1TableWrapper will be used(see org.apache.spark.sql.hudi.catalog.HoodieCatalog#loadTable). In V2 table, we can distinguish between insert overwrite partition and insert overwrite table while we can't do this in v1 table, so I add a v2 table test here to test the different behaviors between v1 and v2 table.

Copy link
Copy Markdown
Contributor Author

@Zouxxyy Zouxxyy Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Year, I noticed that you added this config to force the use of the V2 table, but I think in the future, hudi spark3 may use v2 by default instead of being controlled by this config.
Beside, v1 table can also distinguish insert overwrite partition and insert overwrite table by checking partitionSpec is empty or not, so I think the test should be uniform.

@stream2000
Copy link
Copy Markdown
Contributor

LGTM

@leesf leesf merged commit 0a9a6d2 into apache:master Feb 1, 2023
@Zouxxyy Zouxxyy deleted the xinyu/fix-insert-overwrite branch February 1, 2023 14:50
@yihua yihua added priority:blocker Production down; release blocker and removed priority:high Significant impact; potential bugs labels Feb 5, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:sql SQL interfaces priority:blocker Production down; release blocker

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants