Skip to content

[HUDI-3826] Make truncate partition use delete_partition operation#5272

Merged
xushiyan merged 12 commits intoapache:masterfrom
XuQianJin-Stars:HUDI-3826
Apr 14, 2022
Merged

[HUDI-3826] Make truncate partition use delete_partition operation#5272
xushiyan merged 12 commits intoapache:masterfrom
XuQianJin-Stars:HUDI-3826

Conversation

@XuQianJin-Stars
Copy link
Contributor

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@XuQianJin-Stars XuQianJin-Stars changed the title [WIP][HUDI-3826] Commands deleting partitions do so incorrectly [HUDI-3826] Commands deleting partitions do so incorrectly Apr 10, 2022
@xushiyan xushiyan self-assigned this Apr 12, 2022
@xushiyan xushiyan added the priority:blocker Production down; release blocker label Apr 12, 2022
val partitionsToTruncate = normalizedSpec.map { spec =>
hoodieCatalogTable.partitionFields.map { partitionColumn =>
if (enableEncodeUrl) {
partitionColumn + "=" + "\"" + spec(partitionColumn) + "\""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this encode case did not call PartitionPathEncodeUtils.escapePathName?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. not sure if he is confusing w/ hive style partitioning. Don't we need to consider both? i.e. url encode and hive style partitioning ?

snippet from KeyGenUtils

    if (encodePartitionPath) {
      partitionPath = PartitionPathEncodeUtils.escapePathName(partitionPath);
    }
    if (hiveStylePartitioning) {
      partitionPath = partitionPathField + "=" + partitionPath;
    }

Copy link
Contributor Author

@XuQianJin-Stars XuQianJin-Stars Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this encode case did not call PartitionPathEncodeUtils.escapePathName?

The urlcode character appears in the partition, which cannot be deleted with single quotation marks. Double quotation marks are used after url decoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hive style partitioning

Contains the processing of hive style partitioning, which is mainly to construct the where condition of delete sql.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see w/ latest commit, all changes in HoodieSqlCommonutils is reverted. So, where exactly we process url decoding ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess getPartitionPathToDrop() does that.

df.write.format("hudi")
.option(HoodieWriteConfig.TBL_NAME.key, tableName)
.option(TABLE_TYPE.key, MOR_TABLE_TYPE_OPT_VAL)
.option(TABLE_TYPE.key, COW_TABLE_TYPE_OPT_VAL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this test case change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how the tests are succeeding? bcoz, w/ latest master, delete partitions are lazy. only after cleaner gets a chance to clean up, the deleted partition may not show up when we make getAllPartitions(). Can you check the assertions in tests. Or are we asserting for records in the deleted partitions = 0 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this test case change?

This case encountered this #5282 that was not covered by ut before.

@xushiyan
Copy link
Member

xushiyan commented Apr 12, 2022

@XuQianJin-Stars in org.apache.spark.sql.hudi.command.AlterHoodieTableDropPartitionCommand we should also fix the if (purge) case to use delete API too. can you include the change in the PR? also please make the PR title more descriptive. thanks.
cc @alexeykudinkin @nsivabalan

@nsivabalan
Copy link
Contributor

yes. lets try to fix AlterHoodieTableDropPartitionCommand in the same patch as well.

Copy link
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DROP TABLE and TRUNCATE TABLE (esp the latter), we want to probably do a simple fs.delete of the entire thing.

for partition level, drop/truncate, we can use a DELETE_PARTITION write operation.

@alexeykudinkin
Copy link
Contributor

@XuQianJin-Stars please update PR description to follow the format

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@xushiyan xushiyan changed the title [HUDI-3826] Commands deleting partitions do so incorrectly [HUDI-3826] Make truncate partition use delete_partition operation Apr 14, 2022
@xushiyan xushiyan merged commit 44b3630 into apache:master Apr 14, 2022
xushiyan pushed a commit that referenced this pull request Apr 14, 2022
…5272)

Make truncate partition and drop partition behave as drop partition with purge, which delete all records via Hudi DELETE_PARTITION; partition removed from metastore
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:blocker Production down; release blocker

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants