[SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema #32675
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1092,14 +1092,23 @@ private[hive] object HiveClientImpl extends Logging { | |
| hiveTable.setViewExpandedText(t) | ||
| } | ||
|
|
||
| // hive may convert schema into lower cases while bucketSpec will not | ||
| // only convert if case not match | ||
| def restoreHiveBucketSpecColNames(schema: StructType, names: Seq[String]): Seq[String] = { | ||
| names.map { name => | ||
| schema.find(col => SQLConf.get.resolver(col.name, name)).map(_.name).getOrElse(name) | ||
| } | ||
| } | ||
|
|
||
| table.bucketSpec match { | ||
| case Some(bucketSpec) if !HiveExternalCatalog.isDatasourceTable(table) => | ||
| hiveTable.setNumBuckets(bucketSpec.numBuckets) | ||
| hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava) | ||
| hiveTable.setBucketCols( | ||
| restoreHiveBucketSpecColNames(table.schema, bucketSpec.bucketColumnNames).toList.asJava) | ||
|
|
||
| if (bucketSpec.sortColumnNames.nonEmpty) { | ||
| hiveTable.setSortCols( | ||
| bucketSpec.sortColumnNames | ||
| restoreHiveBucketSpecColNames(table.schema, bucketSpec.sortColumnNames) | ||
|
Contributor
Sorry, I still can't understand how the bug happens. In this
Author
In org/apache/spark/sql/hive/client/HiveClient.scala, in this method, we get the table from the metastore and pass it into getPartitions
Contributor
So there is an unnecessary hive table ->
Author
This conversion is not unnecessary, since the hive client interface requires a CatalogTable.
Contributor
Can we change the hive client interface? For example
Author
This is not the only place that has this issue. If we set spark.sql.statistics.size.autoUpdate.enabled=true, you can see this issue as well. For alter table, we have to do a catalogTable->HiveTable conversion.
Contributor
catalogTable->HiveTable is fine, as long as the catalogTable is correctly initialized. The problem I see here is, we get catalogTable by
Contributor
@AngersZhuuuu can you take this over?
Contributor
Sure. |
||
| .map(col => new Order(col, HIVE_COLUMN_ORDER_ASC)) | ||
| .toList | ||
| .asJava | ||
|
|
||
To clarify: hiveTable.setFields lower-cases the column names, but hiveTable.setBucketCols does not. And this causes the exception?
yes
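For readers following the thread, here is a minimal standalone sketch of the name-restoring step from the diff. It mirrors the shape of restoreHiveBucketSpecColNames but does not depend on Spark or Hive: `Field` is a hypothetical stand-in for a schema column, and `resolver` is an assumed case-insensitive comparison standing in for SQLConf.get.resolver, which compares names case-insensitively when spark.sql.caseSensitive is false. It only illustrates the idea and is not the code merged in the PR.

```scala
// Standalone sketch: no Spark or Hive dependency.
// Field and resolver are illustrative stand-ins, not Spark APIs.
object RestoreBucketColNames {
  // Hypothetical stand-in for a schema column (only the name matters here).
  final case class Field(name: String)

  // Assumed case-insensitive resolver, standing in for SQLConf.get.resolver.
  val resolver: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)

  // For each bucket/sort column name, use the schema's spelling when a
  // matching column exists; otherwise keep the name unchanged.
  def restore(schema: Seq[Field], names: Seq[String]): Seq[String] =
    names.map { name =>
      schema.find(col => resolver(col.name, name)).map(_.name).getOrElse(name)
    }

  def main(args: Array[String]): Unit = {
    // Schema as it might look after column names were lower-cased.
    val lowered = Seq(Field("id"), Field("name"))
    // The bucket spec still carries the upper-case spelling.
    println(restore(lowered, Seq("ID", "Missing"))) // prints List(id, Missing)
  }
}
```

With the names aligned this way, the bucket and sort columns passed to the Hive table use the same casing as the fields, which is the mismatch confirmed in the exchange above.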