[HUDI-3826] Make truncate partition use delete_partition operation #5272
xushiyan merged 12 commits into apache:master
Conversation
val partitionsToTruncate = normalizedSpec.map { spec =>
  hoodieCatalogTable.partitionFields.map { partitionColumn =>
    if (enableEncodeUrl) {
      partitionColumn + "=" + "\"" + spec(partitionColumn) + "\""
this encode case did not call PartitionPathEncodeUtils.escapePathName?
Yeah, not sure if he is confusing it with hive-style partitioning. Don't we need to consider both, i.e. URL encoding and hive-style partitioning?

Snippet from KeyGenUtils:

if (encodePartitionPath) {
  partitionPath = PartitionPathEncodeUtils.escapePathName(partitionPath);
}
if (hiveStylePartitioning) {
  partitionPath = partitionPathField + "=" + partitionPath;
}
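The KeyGenUtils snippet above applies the two transformations in a fixed order: escape the value first, then apply the hive-style `field=value` prefix. A minimal self-contained sketch of that ordering — note that `escapePathName` here is a simplified stand-in for `PartitionPathEncodeUtils.escapePathName`, not the real implementation:

```java
public class PartitionPathSketch {

    // Simplified stand-in: percent-encode a few path-unsafe characters.
    // The real PartitionPathEncodeUtils.escapePathName covers more cases.
    static String escapePathName(String path) {
        StringBuilder sb = new StringBuilder();
        for (char c : path.toCharArray()) {
            if (c == '/' || c == ':' || c == '=') {
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Mirrors the ordering in the KeyGenUtils snippet: escape, then prefix.
    static String buildPartitionPath(String field, String value,
                                     boolean encode, boolean hiveStyle) {
        String partitionPath = value;
        if (encode) {
            partitionPath = escapePathName(partitionPath);
        }
        if (hiveStyle) {
            partitionPath = field + "=" + partitionPath;
        }
        return partitionPath;
    }

    public static void main(String[] args) {
        // "2021/05/01" is escaped first, then gets the "dt=" prefix:
        // dt=2021%2F05%2F01
        System.out.println(buildPartitionPath("dt", "2021/05/01", true, true));
    }
}
```

The ordering matters: escaping after prefixing would also escape the `=` separator that hive-style partitioning relies on.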
this encode case did not call PartitionPathEncodeUtils.escapePathName?
URL-encoded characters can appear in the partition value, and such values cannot be matched when wrapped in single quotation marks; double quotation marks are used after URL decoding.
hive style partitioning

This change contains the handling of hive-style partitioning, which is mainly used to construct the WHERE condition of the DELETE SQL.
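To make the "WHERE condition of the DELETE SQL" concrete, here is a hypothetical sketch of turning a partition spec into such a predicate; the helper name and quoting are illustrative, not the actual Hudi implementation (which, per the discussion above, switches to double quotes for URL-decoded values):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DeleteWhereSketch {

    // Join each partition column/value pair into an AND-ed predicate,
    // e.g. {year=2021, month=05} -> "year = '2021' AND month = '05'".
    static String buildWhere(Map<String, String> spec) {
        return spec.entrySet().stream()
                .map(e -> e.getKey() + " = '" + e.getValue() + "'")
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("year", "2021");
        spec.put("month", "05");
        // prints: DELETE FROM t WHERE year = '2021' AND month = '05'
        System.out.println("DELETE FROM t WHERE " + buildWhere(spec));
    }
}
```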
I see that with the latest commit, all changes in HoodieSqlCommonUtils are reverted. So where exactly do we process URL decoding?
I guess getPartitionPathToDrop() does that.
df.write.format("hudi")
  .option(HoodieWriteConfig.TBL_NAME.key, tableName)
  .option(TABLE_TYPE.key, MOR_TABLE_TYPE_OPT_VAL)
  .option(TABLE_TYPE.key, COW_TABLE_TYPE_OPT_VAL)
I wonder how the tests are succeeding, because with latest master, delete partitions are lazy: the deleted partition may not disappear from getAllPartitions() until the cleaner gets a chance to clean it up. Can you check the assertions in the tests? Or are we asserting that the record count in the deleted partitions is 0?
Why this test case change?

This case hit #5282, which was not covered by unit tests before.
@XuQianJin-Stars in

Yes, let's try to fix AlterHoodieTableDropPartitionCommand in the same patch as well.
vinothchandar left a comment
For DROP TABLE and TRUNCATE TABLE (especially the latter), we probably want to do a simple fs.delete of the entire thing.
For partition-level drop/truncate, we can use a DELETE_PARTITION write operation.
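For reference, the DELETE_PARTITION operation suggested above is typically driven through datasource write options. A hedged sketch of the relevant config (key names assumed from Hudi's DataSourceWriteOptions and worth double-checking against the version in use; the partition value is illustrative):

```properties
# Use the delete-partition write operation instead of an upsert
hoodie.datasource.write.operation=delete_partition
# Comma-separated list of partition paths to delete (illustrative value)
hoodie.datasource.write.partitions.to.delete=dt=2021-05-01
```

Note that, as discussed above, this delete is lazy: the partition's files are only physically removed once the cleaner runs.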
@XuQianJin-Stars please update PR description to follow the format
Tips
What is the purpose of the pull request
(For example: This pull request adds quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.