@@ -642,8 +642,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
if (stats.get.rowCount.isDefined) {
statsProperties += STATISTICS_NUM_ROWS -> stats.get.rowCount.get.toString()
}

// For datasource tables the data schema is stored in the table properties.
val schema = rawTable.properties.get(DATASOURCE_PROVIDER) match {
case Some(provider) => getSchemaFromTableProperties(rawTable)
case _ => rawTable.schema

Member

For Hive serde tables that were created by Spark 2.1 or later, we should still restore it from table properties.

Member


Maybe call restoreTableMetadata to avoid duplicate logic.

Contributor Author


Should we call getTable().schema, or do you think that's too verbose?
val schema = getTable(db, table).schema

Contributor Author


@viirya Actually, we already have a raw table here, so I will just call restoreTableMetadata. Thanks a lot, @gatorsmile and @viirya.

Member


You still need rawTable. Calling getTable will incur another metastore access.

Contributor Author


@viirya Right, I agree. I meant that we already have a raw table from a prior call, so here we just pass that to restoreTableMetadata, as you suggested.

Member


Yeah, I saw your comment #18804 (comment) after posting #18804 (comment). :)

}

val colNameTypeMap: Map[String, DataType] =
- rawTable.schema.fields.map(f => (f.name, f.dataType)).toMap
+ schema.fields.map(f => (f.name, f.dataType)).toMap
stats.get.colStats.foreach { case (colName, colStat) =>
colStat.toMap(colName, colNameTypeMap(colName)).foreach { case (k, v) =>
statsProperties += (columnStatKeyPropName(colName, k) -> v)
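
To make the point of this hunk concrete, here is a minimal, self-contained sketch of why the column name to type map has to be built from the restored schema rather than from the raw metastore schema. It does not use Spark's real classes: `Field`, `RawTable`, `SchemaKey`, the `"name:type"` encoding, and the placeholder raw schema in the usage note are all hypothetical stand-ins, and `restoredSchema` only approximates what the thread suggests delegating to restoreTableMetadata.

```scala
// Hypothetical, simplified stand-ins for Spark's catalog types (not the real API).
final case class Field(name: String, dataType: String)
final case class RawTable(schema: Seq[Field], properties: Map[String, String])

object SchemaRestoreSketch {
  // Assumed property key and "name:type,name:type" encoding, for illustration only.
  val SchemaKey = "sketch.schema"

  // Decode the schema that was stashed in the table properties.
  def schemaFromProperties(t: RawTable): Seq[Field] =
    t.properties.get(SchemaKey).toSeq.flatMap(_.split(",").toSeq).map { entry =>
      val Array(name, tpe) = entry.split(":")
      Field(name, tpe)
    }

  // Prefer the schema from the properties when present, else fall back to the raw schema.
  def restoredSchema(t: RawTable): Seq[Field] =
    if (t.properties.contains(SchemaKey)) schemaFromProperties(t) else t.schema

  // Same shape as colNameTypeMap in the diff: column name -> data type.
  def colNameTypeMap(fields: Seq[Field]): Map[String, String] =
    fields.map(f => f.name -> f.dataType).toMap
}
```

With a raw table such as `RawTable(Seq(Field("col", "array<string>")), Map(SchemaRestoreSketch.SchemaKey -> "a:int,b:int"))`, the map built from `raw.schema` has no entry for `a`, so a lookup like `colNameTypeMap(colName)` would fail for the analyzed columns, while the map built from `restoredSchema(raw)` contains both `a` and `b`.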
@@ -117,6 +117,26 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
}
}

test("analyze non hive compatible datasource tables") {
val table = "parquet_tab"
withTable(table) {
sql(
s"""
|CREATE TABLE $table (a int, b int)
|USING parquet
|OPTIONS (skipHiveMetadata true)
""".stripMargin)
sql(s"insert into $table values (1, 1)")
sql(s"insert into $table values (2, 1)")

Member

nit: minor style issue. Use uppercase SQL keywords: INSERT INTO...

sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS a, b")
val fetchedStats0 =
checkTableStats(table, hasSizeInBytes = true, expectedRowCounts = Some(2))
assert(fetchedStats0.get.colStats == Map(
"a" -> ColumnStat(2, Some(1), Some(2), 0, 4, 4),
"b" -> ColumnStat(1, Some(1), Some(1), 0, 4, 4)))
}
}

test("SPARK-21079 - analyze table with location different than that of individual partitions") {
val tableName = "analyzeTable_part"
withTable(tableName) {
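
As a reading aid for the expected values in the "analyze non hive compatible datasource tables" test above, the following sketch recomputes them in plain Scala for the two inserted rows. It assumes the six positional arguments of `ColumnStat` are distinct count, min, max, null count, average length, and maximum length, and that an `int` column has a fixed length of 4 bytes. `ColumnStatSketch` is a hypothetical stand-in, and it counts distinct values exactly, which may differ from how Spark estimates them on larger data.

```scala
// Hypothetical mirror of the six positional values asserted in the test (not Spark's class).
final case class ColumnStatSketch(
    distinctCount: Long,
    min: Option[Int],
    max: Option[Int],
    nullCount: Long,
    avgLen: Long,
    maxLen: Long)

object ColumnStatSketch {
  // Assumption: fixed-width int columns report 4 bytes for both average and max length.
  val IntSizeBytes = 4L

  // Compute the statistics for one nullable int column.
  def forIntColumn(values: Seq[Option[Int]]): ColumnStatSketch = {
    val present = values.flatten
    ColumnStatSketch(
      distinctCount = present.distinct.size.toLong,
      min = if (present.isEmpty) None else Some(present.min),
      max = if (present.isEmpty) None else Some(present.max),
      nullCount = values.count(_.isEmpty).toLong,
      avgLen = IntSizeBytes,
      maxLen = IntSizeBytes)
  }
}
```

For the rows `(1, 1)` and `(2, 1)`, column `a` holds the values 1 and 2 and column `b` holds 1 twice, so `forIntColumn` yields `ColumnStatSketch(2, Some(1), Some(2), 0, 4, 4)` for `a` and `ColumnStatSketch(1, Some(1), Some(1), 0, 4, 4)` for `b`, matching the values in the assertions.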