[HUDI-3136] Fix merge/insert/show partitions error on Spark3.2 #4490
@@ -25,6 +25,8 @@ import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
import org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable
import org.apache.spark.sql.execution.command.TruncateTableCommand

import scala.util.control.NonFatal

/**
 * Command for truncate hudi table.
 */

@@ -36,10 +38,16 @@ class TruncateHoodieTableCommand(
  override def run(sparkSession: SparkSession): Seq[Row] = {
    val hoodieCatalogTable = HoodieCatalogTable(sparkSession, tableIdentifier)
    val properties = hoodieCatalogTable.tableConfig.getProps
    val tablePath = hoodieCatalogTable.tableLocation

    // Delete all data in the table directory
    super.run(sparkSession)
    try {
      // Delete all data in the table directory
      super.run(sparkSession)
|
Contributor: Will it throw an exception and cause a failure?

Contributor (Author): According to the process, call the … The related error is as follows:

Member: @YannByron looks like it deserves a log statement or comment in the catch block to help explain the try/catch scenario?

Contributor (Author): done~
    } catch {
      // TruncateTableCommand will delete the related directories first, and then refresh the table.
      // It will fail when refreshing the table, because the hudi meta directory (.hoodie) has been deleted in the first step.
      // So here we ignore this failure, and refresh the table later.
      case NonFatal(_) =>
    }
|
|
    // If we have not specified the partition, truncate will delete all the data in the table path
    // include the hoodie.properties. In this case we should reInit the table.

@@ -50,6 +58,10 @@ class TruncateHoodieTableCommand(
        .fromProperties(properties)
        .initTable(hadoopConf, hoodieCatalogTable.tableLocation)
    }

    // After deleting the data, refresh the table to make sure we don't keep around a stale
    // file relation in the metastore cache and cached table data in the cache manager.
    sparkSession.catalog.refreshTable(hoodieCatalogTable.table.identifier.quotedString)
    Seq.empty[Row]
  }
}
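Taken together, the hunks above make run() first attempt the parent truncate, swallow the expected refresh failure, re-initialize the table metadata when the whole table path was wiped, and only then refresh the table. Below is a condensed sketch of that flow. It is reconstructed from the diff context rather than copied from the merged file: the class/constructor signature, the partitionSpec check, the hadoopConf value, and the HoodieTableMetaClient.withPropertyBuilder() call are filled-in assumptions.

```scala
import scala.util.control.NonFatal

import org.apache.hudi.common.table.HoodieTableMetaClient
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
import org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable
import org.apache.spark.sql.execution.command.TruncateTableCommand

// Sketch only: the actual constructor/parameter names may differ from the PR.
class TruncateHoodieTableCommand(
    tableIdentifier: TableIdentifier,
    partitionSpec: Option[TablePartitionSpec])
  extends TruncateTableCommand(tableIdentifier, partitionSpec) {

  override def run(sparkSession: SparkSession): Seq[Row] = {
    val hoodieCatalogTable = HoodieCatalogTable(sparkSession, tableIdentifier)
    val properties = hoodieCatalogTable.tableConfig.getProps
    val tablePath = hoodieCatalogTable.tableLocation

    try {
      // Delete all data in the table directory.
      super.run(sparkSession)
    } catch {
      // TruncateTableCommand deletes the directories and then refreshes the table;
      // that refresh fails because .hoodie was removed in the first step, so the
      // failure is swallowed here and the table is refreshed again below.
      case NonFatal(_) =>
    }

    // Without a partition spec, truncate wipes the whole table path, including
    // hoodie.properties, so the table metadata has to be re-initialized.
    if (partitionSpec.isEmpty) {
      val hadoopConf = sparkSession.sessionState.newHadoopConf()
      HoodieTableMetaClient.withPropertyBuilder()
        .fromProperties(properties)
        .initTable(hadoopConf, tablePath)
    }

    // Refresh so no stale file relation or cached table data survives in the catalog caches.
    sparkSession.catalog.refreshTable(hoodieCatalogTable.table.identifier.quotedString)
    Seq.empty[Row]
  }
}
```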
@@ -154,7 +154,7 @@ class TestShowPartitions extends TestHoodieSqlBase {
      Seq("year=2021/month=02/day=default"),
      Seq("year=2021/month=02/day=01")
    )
    checkAnswer(s"show partitions $tableName partition(day=01)")(
    checkAnswer(s"show partitions $tableName partition(day='01')")(
|
Contributor: Does the partition value have to be in string format here?

Contributor (Author): It's related to the original type of the field. In this case, …
      Seq("year=2021/month=02/day=01"),
      Seq("year=2021/month=default/day=01"),
      Seq("year=2021/month=01/day=01"),
|
|
Why is this change needed?

Same as #4169.