[HUDI-3136] Fix merge/insert/show partitions error on Spark3.2 #4490
Conversation
@hudi-bot run azure
```diff
   classOf[ValidateDuplicateKeyPayload].getCanonicalName
 } else {
-  classOf[DefaultHoodieRecordPayload].getCanonicalName
+  classOf[OverwriteWithLatestAvroPayload].getCanonicalName
```
Why is this change needed?
Same as #4169.
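For context, a minimal sketch of the payload-selection logic being changed (the payload class names come from the diff above; the helper method name, the `insertMode` parameter, and the surrounding imports are assumptions for illustration, not the actual Hudi code):

```scala
// Hypothetical sketch: choose the record payload class from the SQL insert mode.
// In strict mode, ValidateDuplicateKeyPayload rejects duplicate keys; otherwise
// the latest record overwrites the existing one, matching the diff above.
def choosePayloadClass(insertMode: String): String =
  if (insertMode == "strict") {
    classOf[ValidateDuplicateKeyPayload].getCanonicalName
  } else {
    classOf[OverwriteWithLatestAvroPayload].getCanonicalName
  }
```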
```diff
-    super.run(sparkSession)
+    try {
+      // Delete all data in the table directory
+      super.run(sparkSession)
```
Will this throw an exception and cause a failure?
Per the flow, Spark's TruncateTableCommand is called first, and then the hoodie table is reconstructed.
Inside Spark's TruncateTableCommand, the related directories are deleted first and the table is then refreshed; the refresh fails because the .hoodie directory is gone, so Hudi's relation can't be resolved. That's why I catch and ignore this exception.
The related error is as follows:
```
Cause: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path Unable to find a hudi table for the user provided paths.
  at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:85)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:103)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:348)
  at org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:252)
```
@YannByron looks like it deserves a log statement or a comment in the catch block to explain the try/catch scenario?
done~
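A rough sketch of what the resulting truncate logic might look like after adding the log statement (the method shape, logger call, and return value are assumptions for illustration, not the actual patch):

```scala
import scala.util.control.NonFatal

// Hedged sketch (not the actual patch): wrap Spark's TruncateTableCommand so
// the expected post-truncate refresh failure is logged and ignored.
override def run(sparkSession: SparkSession): Seq[Row] = {
  try {
    // Spark's TruncateTableCommand deletes the table/partition directories
    // first, then refreshes the table.
    super.run(sparkSession)
  } catch {
    case NonFatal(e) =>
      // The refresh is expected to fail here: the .hoodie directory was just
      // deleted, so the Hudi relation cannot be resolved. Log and ignore,
      // as requested in the review.
      logWarning("Ignoring expected failure while refreshing truncated Hudi table", e)
  }
  // Afterwards, re-create the hoodie table's metadata under the table path.
  Seq.empty[Row]
}
```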
```diff
   Seq("year=2021/month=02/day=01")
 )
-checkAnswer(s"show partitions $tableName partition(day=01)")(
+checkAnswer(s"show partitions $tableName partition(day='01')")(
```
Must the partition value be in string format here?
It's related to the original type of the field. In this case, `day` is a string type.
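In other words, the literal in the partition spec should match the partition column's declared type. A hedged illustration (the table DDL in the comment is an assumption, not from this PR):

```scala
// Assuming day was declared as a string partition column, e.g.:
//   create table h0 (...) partitioned by (year string, month string, day string)
// the partition value must be quoted to match the string type on Spark 3.2:
spark.sql(s"show partitions $tableName partition(day='01')")
// An unquoted numeric literal (day=01) would be resolved as an integer and
// fail to match the string-typed partition column.
```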
@hudi-bot run azure
xushiyan left a comment
LGTM
@hudi-bot run azure