Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -791,11 +791,22 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman
}
}

// These table properties should not be included in the output statement of SHOW CREATE TABLE
val excludedTableProperties = Set(
// The following are hive-generated statistics fields
"COLUMN_STATS_ACCURATE",
"numFiles",
"numPartitions",
"numRows",
"rawDataSize",
"totalSize"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we set each of these property names as a constant so that we can use them in the translation layer?

Copy link
Member Author

@gatorsmile gatorsmile Aug 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is for fixing a bug. We might need to backport it to 2.0. When we implementing the translation layer, we can do that, just like what we did for the property names of the Data Source Table schema


private def showHiveTableProperties(metadata: CatalogTable, builder: StringBuilder): Unit = {
if (metadata.properties.nonEmpty) {
val filteredProps = metadata.properties.filterNot {
// Skips "EXTERNAL" property for external tables
case (key, _) => key == "EXTERNAL" && metadata.tableType == EXTERNAL
// Skips all the stats info (See the JIRA: HIVE-13792)
Copy link
Contributor

@hvanhovell hvanhovell Aug 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we fix this in the HiveExternalCatalog and just drop those properties there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether I got your point. Are you saying getTableOption in HiveExternalCatalog?

Initially, I had the same design like you. Later, I realized we still use/display them in the other DDL statements, for example, DESCRIBE EXTENDED TABLE.

Let me know if my understanding is wrong. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean in general: we should not leak Hive specific implementation (i.e. hacks) into sql/core.

At first I thought that we might be able to hide/filter them but you have a point. The good news is that this becomes a non-issue when we have statistics as a part of the CatalogTable. Then these properties can become the problem of a yet-to-be written translation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agree on this general rule.

When we support translations, we need to be very careful about including these statistics info in the SHOW CREATE TABLE DDL. Hive does not include them in SHOW CREATE TABLE, as shown in their JIRA: https://issues.apache.org/jira/browse/HIVE-13792. If we allow users to provide the statistics info when creating the tables, we might need to mark them as inaccurate, like what Hive does now?

BTW, should we merge this in 2.1 before we support the translation? So far, Spark 2.0 has the bug. Let me know what I should do next.

Thanks!

case (key, _) => excludedTableProperties.contains(key)
}

val props = filteredProps.map { case (key, value) =>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -400,7 +400,9 @@ private[hive] class HiveClientImpl(
properties = Option(h.getTTable.getSd.getSerdeInfo.getParameters)
.map(_.asScala.toMap).orNull
),
properties = properties.filter(kv => kv._1 != "comment"),
// For EXTERNAL_TABLE, the table properties has a particular field "EXTERNAL". This is added
// in the function toHiveTable.
properties = properties.filter(kv => kv._1 != "comment" && kv._1 != "EXTERNAL"),
comment = properties.get("comment"),
viewOriginalText = Option(h.getViewOriginalText),
viewText = Option(h.getViewExpandedText),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -277,12 +277,24 @@ class ShowCreateTableSuite extends QueryTest with SQLTestUtils with TestHiveSing
checkCreateTableOrView(TableIdentifier(table, Some("default")), "VIEW")
}

// These table properties should not be included in the output statement of SHOW CREATE TABLE
val excludedTableProperties = Set(
// The following are hive-generated statistics fields
"COLUMN_STATS_ACCURATE",
"numFiles",
"numPartitions",
"numRows",
"rawDataSize",
"totalSize"
)

private def checkCreateTableOrView(table: TableIdentifier, checkType: String): Unit = {
val db = table.database.getOrElse("default")
val expected = spark.sharedState.externalCatalog.getTable(db, table.table)
val shownDDL = sql(s"SHOW CREATE TABLE ${table.quotedString}").head().getString(0)
sql(s"DROP $checkType ${table.quotedString}")

checkExcludedTableProperties(shownDDL)
try {
sql(shownDDL)
val actual = spark.sharedState.externalCatalog.getTable(db, table.table)
Expand All @@ -292,6 +304,10 @@ class ShowCreateTableSuite extends QueryTest with SQLTestUtils with TestHiveSing
}
}

private def checkExcludedTableProperties(shownDDL: String): Unit = {
excludedTableProperties.foreach(p => assert(!shownDDL.contains(p)))
}

private def checkCatalogTables(expected: CatalogTable, actual: CatalogTable): Unit = {
def normalize(table: CatalogTable): CatalogTable = {
val nondeterministicProps = Set(
Expand All @@ -302,18 +318,11 @@ class ShowCreateTableSuite extends QueryTest with SQLTestUtils with TestHiveSing
"last_modified_by",
"last_modified_time",
"Owner:",
"COLUMN_STATS_ACCURATE",
// The following are hive specific schema parameters which we do not need to match exactly.
"numFiles",
"numRows",
"rawDataSize",
"totalSize",
"totalNumberFiles",
"maxFileSize",
"minFileSize",
// EXTERNAL is not non-deterministic, but it is filtered out for external tables.
"EXTERNAL"
)
"minFileSize"
) ++ excludedTableProperties

table.copy(
createTime = 0L,
Expand Down