[HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive #3745
Conversation
@hudi-bot run azure

@hudi-bot run azure
@vinothchandar @umehrot2 Hello, can you help review this?
@xushiyan @YannByron Hi, can you also help review this PR, please?
xushiyan left a comment:
LGTM. I'll let @YannByron take another pass.
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@dongkelun please rebase onto master first of all.
hoodieWriteClient: Option[SparkRDDWriteClient[HoodieRecordPayload[Nothing]]] = Option.empty,
asyncCompactionTriggerFn: Option[Function1[SparkRDDWriteClient[HoodieRecordPayload[Nothing]], Unit]] = Option.empty,
asyncClusteringTriggerFn: Option[Function1[SparkRDDWriteClient[HoodieRecordPayload[Nothing]], Unit]] = Option.empty
asyncCompactionTriggerFn: Option[SparkRDDWriteClient[HoodieRecordPayload[Nothing]] => Unit] = Option.empty,
nice.
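The suggestion above swaps the verbose `Function1` type for Scala's arrow syntax. As a minimal standalone sketch (not Hudi code; the object and method names below are illustrative), `A => B` is just syntactic sugar for `scala.Function1[A, B]`, so the two parameter declarations are the same type and accept the same values:

```scala
// Illustrative sketch: `String => Unit` and `Function1[String, Unit]`
// are the same type; the arrow form is simply easier to read.
object FunctionTypeSugar {
  // Verbose form, as in the original signature:
  def runVerbose(f: Option[Function1[String, Unit]]): Unit =
    f.foreach(g => g("verbose"))

  // Sugared form, as in the suggested change:
  def runSugared(f: Option[String => Unit]): Unit =
    f.foreach(g => g("sugared"))

  def main(args: Array[String]): Unit = {
    // The same function value satisfies both signatures:
    val printer: String => Unit = s => println(s"called with: $s")
    runVerbose(Some(printer))
    runSugared(Some(printer))
  }
}
```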
var syncClientToolClassSet = scala.collection.mutable.Set[String]()
hoodieConfig.getString(META_SYNC_CLIENT_TOOL_CLASS_NAME).split(",").foreach(syncClass => syncClientToolClassSet += syncClass)
hoodieConfig.setValue(HIVE_TABLE_SERDE_PROPERTIES, getHiveTableSerdeProperties(hoodieConfig))
Actually, are this code and getHiveTableSerdeProperties part of this PR?
yes
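The actual getHiveTableSerdeProperties implementation lives in the PR; as a hypothetical sketch of the idea (all names and the newline-separated property format here are assumptions, not the real Hudi code), the helper would merge user-supplied serde properties with Spark-SQL-friendly defaults such as the table `path`, letting explicit user settings win:

```scala
// Hypothetical sketch of a getHiveTableSerdeProperties-style helper
// (illustrative only; not the implementation from this PR).
object SerdePropsSketch {
  // Assumption: serde properties travel as "k1=v1\nk2=v2" style strings.
  def getHiveTableSerdeProperties(existing: String, basePath: String): String = {
    val parsed = existing.split("\n").filter(_.nonEmpty)
      .map { kv => val Array(k, v) = kv.split("=", 2); k -> v }
      .toMap
    // Only add the default when the user did not set it explicitly.
    val withDefaults =
      if (parsed.contains("path")) parsed
      else parsed + ("path" -> basePath)
    withDefaults.map { case (k, v) => s"$k=$v" }.mkString("\n")
  }
}
```

The design point being discussed: defaulting these properties at write time means tables synced to the Hive metastore stay resolvable by Spark SQL without the user repeating options.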
  writeClient.close()
}
val metaSyncSuccess = metaSync(sqlContext.sparkSession, hoodieConfig, basePath, df.schema)
metaSyncSuccess
Do not change this. If I'm not wrong, according to the original code it will execute metaSync when mode == SaveMode.Ignore && tableExists is false. But now, it won't.
I probably understand what you mean, but I'm not sure whether it's necessary to execute metaSync in this case.
If you are not sure whether it needs to be changed, just leave it alone.
I don't think we need to sync Hive, since we don't actually write data. Also, the same logic is used in HoodieSparkSqlWriter.write.
This has been submitted separately in another PR: #4632.
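The disagreement in this thread is about one corner of the control flow: under the original code, metaSync still runs when the save mode is Ignore but the table does not yet exist, and only skips when Ignore meets an existing table. A tiny illustrative sketch of that condition (not the actual Hudi code; names are stand-ins):

```scala
// Illustrative sketch of the SaveMode.Ignore concern raised above
// (stand-in types; not the real HoodieSparkSqlWriter logic).
object SaveModeSketch {
  sealed trait SaveMode
  case object Ignore extends SaveMode
  case object Append extends SaveMode

  // Original behavior: skip the sync only when the mode is Ignore
  // AND the table already exists; otherwise metaSync still runs.
  def shouldSync(mode: SaveMode, tableExists: Boolean): Boolean =
    !(mode == Ignore && tableExists)
}
```

Short-circuiting on Ignore alone would silently drop the "Ignore + table absent" case, which is exactly the behavior change the reviewer objected to.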
@dongkelun

@dongkelun

OK, thank you for your advice. I'll pay attention to it later.
Hello, I have updated the detailed steps and added attachments to JIRA.
@YannByron thanks for the review. Can you take it from here, please? From the reproducing steps, it looks like a different bug where the primary key and other properties were not respected by HMS?
@xushiyan OK, I'll take this.
A new PR has been submitted: #4644

To avoid confusion and ambiguity, I have reverted to the historical commit id 071b13b.
Closing as fixed in #4644.
hive> create database test_hudi;
exception:
What is the purpose of the pull request
Add default hiveTableSerdeProperties for Spark SQL when sync Hive
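To illustrate the purpose (with illustrative property names, not necessarily the exact keys the PR uses): when Hudi syncs a Spark-SQL-created table to the Hive metastore, Spark-SQL-specific options such as the primary key, precombine field, and table type should be preserved as table/serde properties, with sensible defaults filling any gaps, so Spark SQL can still resolve the table correctly:

```scala
// Hypothetical illustration of "default serde properties" merging
// (property names are assumptions for illustration only).
object DefaultSerdeProps {
  // Options a Spark SQL user might set at CREATE TABLE time.
  val sqlOptions: Map[String, String] = Map(
    "primaryKey"      -> "id",
    "preCombineField" -> "ts",
    "type"            -> "cow"
  )

  // Defaults the sync could fall back to when an option is absent.
  val defaults: Map[String, String] = Map("type" -> "cow")

  // User-set values win; defaults fill the gaps.
  def merged(user: Map[String, String]): Map[String, String] =
    defaults ++ user
}
```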
Brief change log
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.