
df operation in Append mode for a MANAGED table results in "location already exists" error #13931

@mansipp


Bug Description

What happened:
Running df.saveAsTable in Append mode for a MANAGED table fails with "Can not create the managed table('$catalogTableName'). The associated location('$tableLocation') already exists.". The error is raised from this check in HoodieCatalogTable:
https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala#L280-L281
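For reference, the guard at those lines has roughly the following shape. This is an illustrative sketch only, not the actual Hudi source; the method name and parameters are made up here, and Hudi surfaces the failure as org.apache.spark.sql.AnalysisException:

import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative sketch of the managed-table location check (not Hudi's code).
// A managed table whose resolved location already exists on storage is rejected
// before the table is (re)created in the catalog.
def assertManagedTableLocationIsNew(
    catalogTableName: String,  // e.g. "spark_catalog.<db_name>.<table_name>"
    tableLocation: String,     // resolved warehouse path for the managed table
    fs: FileSystem): Unit = {
  if (fs.exists(new Path(tableLocation))) {
    // Hudi raises this as an AnalysisException with the same message text
    throw new IllegalStateException(
      s"Can not create the managed table('$catalogTableName')." +
        s" The associated location('$tableLocation') already exists.")
  }
}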

What you expected:
Append mode should work for managed tables.

Steps to reproduce:

import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode

// Sample events to write
val df1 = Seq(
  ("100", "2015-01-01", "event_name_900", "2015-01-01T13:51:39.340396Z", "type1"),
  ("101", "2015-01-01", "event_name_546", "2015-01-01T12:14:58.597216Z", "type2")
).toDF("event_id", "event_date", "event_name", "event_ts", "event_type")

val tableName = "<table_name>"
val databaseName = "<db_name>"

// Append write via saveAsTable against a managed catalog table -- this is the call that fails
df1.write.format("hudi")
  .option("hoodie.metadata.enable", "true")
  .option("hoodie.table.name", tableName)
  .option("hoodie.database.name", databaseName)
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
  .option("hoodie.datasource.write.recordkey.field", "event_id")
  .option("hoodie.datasource.write.precombine.field", "event_ts")
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.meta.sync.enable", "true")
  .option("hoodie.datasource.hive_sync.mode", "hms")
  .option("hoodie.datasource.hive_sync.database", databaseName)
  .option("hoodie.datasource.hive_sync.table", tableName)
  .mode(SaveMode.Append)
  .saveAsTable(s"$databaseName.$tableName")
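For comparison, the Hudi Spark quickstart documents the path-based write, which does not go through saveAsTable or the catalog's managed-table creation path. A minimal sketch of that variant using the same DataFrame is below; <base_path> is a placeholder, and whether it avoids the error for this dataset is not verified in this report:

// Path-based Append write for comparison (quickstart-style, no saveAsTable).
// <base_path> is a placeholder for an HDFS/S3/local table base path.
df1.write.format("hudi")
  .option("hoodie.table.name", tableName)
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
  .option("hoodie.datasource.write.recordkey.field", "event_id")
  .option("hoodie.datasource.write.precombine.field", "event_ts")
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .mode(SaveMode.Append)
  .save("<base_path>")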

Environment

Hudi version: 1.0.2
Query engine: Spark
Relevant configs:

Logs and Stack Trace

org.apache.spark.sql.AnalysisException: Can not create the managed table('spark_catalog.<db_name>.<table_name>'). The associated location('<PATH>/<table_name>') already exists.
