@@ -28,6 +28,7 @@ import scala.util.control.NonFatal
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hive.ql.metadata.HiveException
import org.apache.hadoop.hive.serde.serdeConstants.SERIALIZATION_FORMAT
import org.apache.thrift.TException

import org.apache.spark.{SparkConf, SparkException}
@@ -806,6 +807,8 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
updateLocationInStorageProps(table, newPath = None).copy(
locationUri = tableLocation.map(CatalogUtils.stringToURI(_)))
}
val storageWithoutHiveGeneratedProperties = storageWithLocation.copy(
properties = storageWithLocation.properties.filterKeys(!HIVE_GENERATED_STORAGE_PROPERTIES(_)))
Contributor
Shall we do this in HiveClientImpl? IIRC we filter out some table props there.

Member Author
In HiveClientImpl, we don't yet know whether the table is a Hive table or a DataSource table. Can we remove the props even for Hive tables?

val partitionProvider = table.properties.get(TABLE_PARTITION_PROVIDER)

val schemaFromTableProps = getSchemaFromTableProperties(table)
@@ -814,7 +817,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat

table.copy(
provider = Some(provider),
storage = storageWithLocation,
storage = storageWithoutHiveGeneratedProperties,
schema = reorderedSchema,
partitionColumnNames = partColumnNames,
bucketSpec = getBucketSpecFromTableProperties(table),
@@ -1309,6 +1312,8 @@ object HiveExternalCatalog {

val CREATED_SPARK_VERSION = SPARK_SQL_PREFIX + "create.version"

val HIVE_GENERATED_STORAGE_PROPERTIES = Set(SERIALIZATION_FORMAT)
Member
@ueshin . The title says Hive-generated storage properties, but this PR excludes only this one. Could you add more? Otherwise, can we make this a SQLConf so it is configurable?

Member Author
Actually, the only Hive-generated storage property I think we should exclude for now is this one, but we might have more in the future, so I'd keep the plural "properties" and add them to this set as they come up. WDYT?

Member
We can add more in the future. Basically, these properties are useless to Spark data source tables.

Member
I got it, @ueshin and @gatorsmile .
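The deny-set filtering discussed above can be sketched standalone. This is a minimal illustration of the pattern, not Spark's actual code: the object and method names are hypothetical, and `"serialization.format"` is the string value behind `serdeConstants.SERIALIZATION_FORMAT`.

```scala
// Standalone sketch of the deny-set filtering pattern from the diff above.
// Object and method names are illustrative, not part of Spark's API.
object StoragePropsFilter {
  // Mirrors HIVE_GENERATED_STORAGE_PROPERTIES; "serialization.format" is
  // the value of serdeConstants.SERIALIZATION_FORMAT.
  val hiveGenerated: Set[String] = Set("serialization.format")

  def strip(props: Map[String, String]): Map[String, String] =
    // filterKeys returns a lazy (and non-serializable) view on Scala 2.12,
    // so materialize the result with toMap.
    props.filterKeys(k => !hiveGenerated(k)).toMap

  def main(args: Array[String]): Unit = {
    val props = Map("serialization.format" -> "1", "path" -> "/tmp/t")
    println(strip(props)) // only the non-Hive-generated entries remain
  }
}
```

Keeping the deny-list as a `Set` makes the membership test `hiveGenerated(k)` O(1) and makes adding future Hive-generated properties a one-line change, which matches the "we will add them to this set" plan above.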


// When storing data source tables in hive metastore, we need to set data schema to empty if the
// schema is hive-incompatible. However we need a hack to preserve existing behavior. Before
// Spark 2.0, we do not set a default serde here (this was done in Hive), and so if the user
@@ -72,7 +72,7 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
outputFormat = serde.get.outputFormat,
serde = serde.get.serde,
compressed = false,
properties = Map("serialization.format" -> "1"))
properties = Map.empty)
} else {
CatalogStorageFormat(
locationUri = Some(catalog.defaultTablePath(name)),