
[SUPPORT] DELETE_PARTITION causes AWS Athena Query failure #6024

@Gatsby-Lee

Describe the problem you faced

A clear and concise description of the problem.

To Reproduce

Steps to reproduce the behavior:

  1. Run DELETE_PARTITION for a non-existent partition (e.g. org_id=55555).
  • Since this raises an exception, you have to wrap the Spark write to catch it.
  • This operation creates org_id=55555_$folder$ in the Hudi table path. (BTW, why is it even created?)
  2. UPSERT to another partition (e.g. org_id=24).
  • Check the current status.
  • You will see that the org_id=55555 partition is registered in the Glue Catalog.
  3. Go to Athena and run a query.
  • The query fails because the path org_id=55555 is missing in S3.
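
For readers trying to reproduce step 1, here is a minimal PySpark-style sketch of a wrapped DELETE_PARTITION write. The helper names, the table name, and the base path are hypothetical, and the option keys are the Hudi datasource write options as I understand them, so please verify them against the docs for your Hudi version (0.10.1 here). The `df` argument is assumed to be a Spark DataFrame, or anything exposing the same `write` chain.

```python
from typing import Dict


def delete_partition_options(table_name: str, partition: str) -> Dict[str, str]:
    """Build write options for a Hudi DELETE_PARTITION operation.

    Assumption: these keys match the Hudi datasource write options;
    check them against the documentation for your Hudi version.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "delete_partition",
        "hoodie.datasource.write.partitions.to.delete": partition,
    }


def try_delete_partition(df, base_path: str, options: Dict[str, str]) -> bool:
    """Run the write and swallow the exception raised for a missing partition.

    Returns True if the write succeeded, False if it raised.
    """
    try:
        df.write.format("hudi").options(**options).mode("append").save(base_path)
        return True
    except Exception as exc:  # broad on purpose: this is only a repro helper
        print(f"DELETE_PARTITION failed: {exc}")
        return False
```

Usage would look like `try_delete_partition(df, "s3://my-bucket/my_table", delete_partition_options("my_table", "org_id=55555"))`, where the bucket and table name are placeholders.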

Expected behavior

org_id=55555 must NOT be registered in the Glue Catalog.
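
To check for this condition, one possible diagnostic is to compare the partitions registered in the catalog against the object keys under the table path, and flag any partition that has no real objects (only a `_$folder$` marker, or nothing at all). The sketch below is pure Python and works on plain lists; the function name and inputs are hypothetical, and fetching the partition list from Glue and the key list from S3 is left out.

```python
from typing import Iterable, List

# Marker suffix seen in the table path in this issue (org_id=55555_$folder$).
FOLDER_MARKER = "_$folder$"


def partitions_without_data(
    catalog_partitions: Iterable[str],
    s3_keys: Iterable[str],
    table_prefix: str,
) -> List[str]:
    """Return catalog partitions that have no objects under the table prefix.

    A key counts as data only if it lives under "<table_prefix>/<partition>/"
    and is not itself a folder marker.
    """
    keys = list(s3_keys)  # materialize so we can scan it once per partition
    root = table_prefix.rstrip("/")
    missing = []
    for partition in catalog_partitions:
        data_prefix = f"{root}/{partition}/"
        has_data = any(
            key.startswith(data_prefix) and not key.endswith(FOLDER_MARKER)
            for key in keys
        )
        if not has_data:
            missing.append(partition)
    return missing
```

In the scenario described here, org_id=55555 would be reported because only the marker object exists for it, while org_id=24 would not.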

Environment Description

  • Hudi version : 0.10.1
  • Spark version : 3.1.1-amzn-0
  • Hive version : 2.3.7-amzn-4
  • Hadoop version : 3.2.1-amzn-3
  • Storage (HDFS/S3/GCS..) : S3
  • Running on Docker? (yes/no) : NO

Labels

  • area:aws (AWS ecosystem support)
  • area:query-engine (Query engine integrations)
  • priority:critical (Production degraded; pipelines stalled)
