@fhan688 fhan688 commented Nov 12, 2024

Change Logs

The current implementation omits the database name when building the Hudi config, so the wrong table can be resolved even though the intended table exists. For example, when Spark writes data:

val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)

val databaseName = hoodieConfig.getStringOrDefault(HoodieTableConfig.DATABASE_NAME, "")
val tblName = hoodieConfig.getStringOrThrow(HoodieWriteConfig.TBL_NAME,
s"'${HoodieWriteConfig.TBL_NAME.key}' must be set.").trim
val tableIdentifier = TableIdentifier(tblName, if (databaseName.isEmpty) None else Some(databaseName))

Because the value of DATABASE_NAME.key is lost in buildHoodieInsertConfig, databaseName is empty and tableIdentifier is built without a database.
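
To make the effect concrete, here is a minimal, self-contained sketch of the same derivation using plain Scala collections. `TableId`, `IdentifierSketch`, `DbKey` and `TblKey` are hypothetical stand-ins for illustration, not Hudi or Spark classes or real config keys; only the branching logic mirrors the snippet above.

```scala
// Hypothetical stand-in for an identifier with an optional database part.
final case class TableId(table: String, database: Option[String])

object IdentifierSketch {
  // Assumed key names, for illustration only (not the real Hudi keys).
  val DbKey = "example.database.name"
  val TblKey = "example.table.name"

  // Mirrors the logic above: a missing or empty database key yields None.
  def toTableId(options: Map[String, String]): TableId = {
    val databaseName = options.getOrElse(DbKey, "")
    val tblName = options.getOrElse(
      TblKey,
      throw new IllegalArgumentException(s"'$TblKey' must be set.")
    ).trim
    TableId(tblName, if (databaseName.isEmpty) None else Some(databaseName))
  }

  // Database key dropped while building the options: identifier is unqualified.
  val withoutDb: TableId = toTableId(Map(TblKey -> "orders"))

  // Database key carried through: identifier is fully qualified.
  val withDb: TableId = toTableId(Map(DbKey -> "sales", TblKey -> "orders"))
}
```

In this sketch the only difference between the two results is whether the database key survived into the options map, which is the situation described above.
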
getCatalogTable(spark, tableId).map { catalogTable =>
val (structName, namespace) = getAvroRecordNameAndNamespace(tableId.table)
convertStructTypeToAvroSchema(catalogTable.schema, structName, namespace)
}

private def getCatalogTable(spark: SparkSession, tableId: TableIdentifier): Option[CatalogTable] = {
if (spark.sessionState.catalog.tableExists(tableId)) {
Some(spark.sessionState.catalog.getTableMetadata(tableId))
} else {
None
}
}

As a result, either the table in the intended database is not found, or a table with the same name in the default database is returned instead. Neither matches the expected behavior.
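
Both failure modes can be reproduced with a toy catalog. The sketch below is an illustration under stated assumptions, not Spark's catalog implementation: `LookupSketch`, its `tables` map and `currentDatabase` are hypothetical, and it assumes that an identifier without a database is resolved against the session's current database.

```scala
object LookupSketch {
  // Hypothetical toy catalog: (database, table) -> schema description.
  val tables: Map[(String, String), String] = Map(
    ("default", "orders") -> "id INT",
    ("sales", "orders") -> "id INT, amount DOUBLE",
    ("sales", "customers") -> "id INT, name STRING"
  )

  // Assumption: unqualified identifiers fall back to this database.
  val currentDatabase = "default"

  // Returns the schema if the (possibly unqualified) table exists.
  def lookup(table: String, database: Option[String]): Option[String] =
    tables.get((database.getOrElse(currentDatabase), table))

  // Wrong table: the same name exists in the fallback database.
  val wrongTable: Option[String] = lookup("orders", None)

  // No table: the name exists only in the intended database.
  val notFound: Option[String] = lookup("customers", None)

  // Intended table: the database was carried through.
  val intended: Option[String] = lookup("orders", Some("sales"))
}
```

Here `wrongTable` silently picks up a different schema, which is the more dangerous of the two outcomes because no lookup failure is reported.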

This PR fixes this bug.

Impact

hudi-spark-common

Risk level (write none, low medium or high below)

Low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:XS PR with lines of changes in <= 10 label Nov 12, 2024
@danny0405
Contributor

@fhan688 Thanks for the contribution. Can you check the test failures?

@github-actions github-actions bot added size:S PR with lines of changes in (10, 100] and removed size:XS PR with lines of changes in <= 10 labels Nov 13, 2024
@hudi-bot
Collaborator

CI report:


@yihua yihua changed the title [HUDI-8504] Fix missing database when building hudi config [HUDI-8504] Fix missing database config when building Hudi configs in Spark Nov 15, 2024
@yihua yihua merged commit 4b2a633 into apache:master Nov 15, 2024
vinishjail97 added a commit to vinishjail97/hudi that referenced this pull request Apr 28, 2025