Commit 4ecc648

dongjoon-hyun authored and hvanhovell committed
[SPARK-17612][SQL] Support DESCRIBE table PARTITION SQL syntax
## What changes were proposed in this pull request?

This PR implements the `DESCRIBE table PARTITION` SQL syntax again. It was supported until Spark 1.6.2, but was dropped in 2.0.0.

**Spark 1.6.2**

```scala
scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
res1: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
res2: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
+----------------------------------------------------------------+
|result                                                          |
+----------------------------------------------------------------+
|a                      string                                   |
|b                      int                                      |
|c                      string                                   |
|d                      string                                   |
|                                                                |
|# Partition Information                                         |
|# col_name             data_type               comment          |
|                                                                |
|c                      string                                   |
|d                      string                                   |
+----------------------------------------------------------------+
```

**Spark 2.0**

- **Before**

```scala
scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
res1: org.apache.spark.sql.DataFrame = []

scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
org.apache.spark.sql.catalyst.parser.ParseException:
Unsupported SQL statement
```

- **After**

```scala
scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
res1: org.apache.spark.sql.DataFrame = []

scala> sql("DESC partitioned_table PARTITION (c='Us', d=1)").show(false)
+-----------------------+---------+-------+
|col_name               |data_type|comment|
+-----------------------+---------+-------+
|a                      |string   |null   |
|b                      |int      |null   |
|c                      |string   |null   |
|d                      |string   |null   |
|# Partition Information|         |       |
|# col_name             |data_type|comment|
|c                      |string   |null   |
|d                      |string   |null   |
+-----------------------+---------+-------+

scala> sql("DESC EXTENDED partitioned_table PARTITION (c='Us', d=1)").show(100, false)
+-----------------------+---------+-------+
|col_name               |data_type|comment|
+-----------------------+---------+-------+
|a                      |string   |null   |
|b                      |int      |null   |
|c                      |string   |null   |
|d                      |string   |null   |
|# Partition Information|         |       |
|# col_name             |data_type|comment|
|c                      |string   |null   |
|d                      |string   |null   |
|                       |         |       |
|Detailed Partition Information CatalogPartition( Partition Values: [Us, 1] Storage(Location: file:/Users/dhyun/SPARK-17612-DESC-PARTITION/spark-warehouse/partitioned_table/c=Us/d=1, InputFormat: org.apache.hadoop.mapred.TextInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, Serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Properties: [serialization.format=1]) Partition Parameters:{transient_lastDdlTime=1475001066})|         |       |
+-----------------------+---------+-------+

scala> sql("DESC FORMATTED partitioned_table PARTITION (c='Us', d=1)").show(100, false)
+--------------------------------+---------------------------------------------------------------------------------------+-------+
|col_name                        |data_type                                                                              |comment|
+--------------------------------+---------------------------------------------------------------------------------------+-------+
|a                               |string                                                                                 |null   |
|b                               |int                                                                                    |null   |
|c                               |string                                                                                 |null   |
|d                               |string                                                                                 |null   |
|# Partition Information         |                                                                                       |       |
|# col_name                      |data_type                                                                              |comment|
|c                               |string                                                                                 |null   |
|d                               |string                                                                                 |null   |
|                                |                                                                                       |       |
|# Detailed Partition Information|                                                                                       |       |
|Partition Value:                |[Us, 1]                                                                                |       |
|Database:                       |default                                                                                |       |
|Table:                          |partitioned_table                                                                      |       |
|Location:                       |file:/Users/dhyun/SPARK-17612-DESC-PARTITION/spark-warehouse/partitioned_table/c=Us/d=1|       |
|Partition Parameters:           |                                                                                       |       |
|  transient_lastDdlTime         |1475001066                                                                             |       |
|                                |                                                                                       |       |
|# Storage Information           |                                                                                       |       |
|SerDe Library:                  |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                     |       |
|InputFormat:                    |org.apache.hadoop.mapred.TextInputFormat                                               |       |
|OutputFormat:                   |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                             |       |
|Compressed:                     |No                                                                                     |       |
|Storage Desc Parameters:        |                                                                                       |       |
|  serialization.format          |1                                                                                      |       |
+--------------------------------+---------------------------------------------------------------------------------------+-------+
```

## How was this patch tested?

Passed the Jenkins tests with a new test case.

Author: Dongjoon Hyun <[email protected]>

Closes #15168 from dongjoon-hyun/SPARK-17612.
1 parent 566d7f2 commit 4ecc648

File tree

6 files changed: +287 −18

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

Lines changed: 12 additions & 1 deletion
```diff
@@ -86,7 +86,18 @@ object CatalogStorageFormat {
 case class CatalogTablePartition(
     spec: CatalogTypes.TablePartitionSpec,
     storage: CatalogStorageFormat,
-    parameters: Map[String, String] = Map.empty)
+    parameters: Map[String, String] = Map.empty) {
+
+  override def toString: String = {
+    val output =
+      Seq(
+        s"Partition Values: [${spec.values.mkString(", ")}]",
+        s"$storage",
+        s"Partition Parameters:{${parameters.map(p => p._1 + "=" + p._2).mkString(", ")}}")
+
+    output.filter(_.nonEmpty).mkString("CatalogPartition(\n\t", "\n\t", ")")
+  }
+}
 
 
 /**
```
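The new `toString` renders a partition as a single `CatalogPartition(...)` block, dropping any empty sections. A minimal standalone sketch of the same rendering logic, where plain strings and `Map`s stand in for the real `TablePartitionSpec` and `CatalogStorageFormat` types, and the sample values are illustrative only:

```scala
// Sketch: how CatalogTablePartition.toString assembles its output.
// `spec`, `storageDesc`, and `parameters` are simplified stand-ins for the catalog types.
object PartitionToStringSketch extends App {
  val spec = Map("c" -> "Us", "d" -> "1")
  val storageDesc = "Storage(Location: file:/tmp/warehouse/t/c=Us/d=1)" // placeholder value
  val parameters = Map("transient_lastDdlTime" -> "1475001066")

  val output = Seq(
    s"Partition Values: [${spec.values.mkString(", ")}]",
    storageDesc,
    s"Partition Parameters:{${parameters.map(p => p._1 + "=" + p._2).mkString(", ")}}")

  // Non-empty sections are joined into one tab-indented CatalogPartition(...) block,
  // which is exactly the string that shows up in the DESC EXTENDED output above.
  println(output.filter(_.nonEmpty).mkString("CatalogPartition(\n\t", "\n\t", ")"))
}
```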

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

Lines changed: 13 additions & 2 deletions
```diff
@@ -276,13 +276,24 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe partition and column are not supported yet. Return null and let the parser decide
+    // Describe column are not supported yet. Return null and let the parser decide
     // what to do with this (create an exception or pass it on to a different system).
-    if (ctx.describeColName != null || ctx.partitionSpec != null) {
+    if (ctx.describeColName != null) {
       null
     } else {
+      val partitionSpec = if (ctx.partitionSpec != null) {
+        // According to the syntax, visitPartitionSpec returns `Map[String, Option[String]]`.
+        visitPartitionSpec(ctx.partitionSpec).map {
+          case (key, Some(value)) => key -> value
+          case (key, _) =>
+            throw new ParseException(s"PARTITION specification is incomplete: `$key`", ctx)
+        }
+      } else {
+        Map.empty[String, String]
+      }
       DescribeTableCommand(
         visitTableIdentifier(ctx.tableIdentifier),
+        partitionSpec,
         ctx.EXTENDED != null,
         ctx.FORMATTED != null)
     }
```
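The parser now hands `DescribeTableCommand` a fully specified partition map: `visitPartitionSpec` yields `Map[String, Option[String]]`, and any key written without a value (e.g. `PARTITION (c='Us', d)`) is rejected at parse time. A sketch of that validation step in isolation, with a plain exception class standing in for Spark's `ParseException`:

```scala
// Sketch: collapsing Map[String, Option[String]] into Map[String, String],
// failing fast on any key that was written without a value.
object PartitionSpecValidation extends App {
  // Stand-in for org.apache.spark.sql.catalyst.parser.ParseException.
  final case class SpecError(message: String) extends Exception(message)

  def toFullSpec(raw: Map[String, Option[String]]): Map[String, String] =
    raw.map {
      case (key, Some(value)) => key -> value
      case (key, None) => throw SpecError(s"PARTITION specification is incomplete: `$key`")
    }

  println(toFullSpec(Map("c" -> Some("Us"), "d" -> Some("1")))) // Map(c -> Us, d -> 1)

  try toFullSpec(Map("c" -> Some("Us"), "d" -> None))
  catch { case e: SpecError => println(e.getMessage) } // PARTITION specification is incomplete: `d`
}
```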

sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala

Lines changed: 69 additions & 14 deletions
```diff
@@ -29,7 +29,7 @@ import org.apache.hadoop.fs.Path
 
 import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogStorageFormat, CatalogTable, CatalogTableType}
+import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.catalog.CatalogTableType._
 import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
 import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
@@ -390,10 +390,14 @@ case class TruncateTableCommand(
 /**
  * Command that looks like
  * {{{
- *   DESCRIBE [EXTENDED|FORMATTED] table_name;
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name partitionSpec?;
  * }}}
  */
-case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isFormatted: Boolean)
+case class DescribeTableCommand(
+    table: TableIdentifier,
+    partitionSpec: TablePartitionSpec,
+    isExtended: Boolean,
+    isFormatted: Boolean)
   extends RunnableCommand {
 
   override val output: Seq[Attribute] = Seq(
@@ -411,17 +415,25 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isFormatted: Boolean)
     val catalog = sparkSession.sessionState.catalog
 
     if (catalog.isTemporaryTable(table)) {
+      if (partitionSpec.nonEmpty) {
+        throw new AnalysisException(
+          s"DESC PARTITION is not allowed on a temporary view: ${table.identifier}")
+      }
       describeSchema(catalog.lookupRelation(table).schema, result)
     } else {
       val metadata = catalog.getTableMetadata(table)
       describeSchema(metadata.schema, result)
 
-      if (isExtended) {
-        describeExtended(metadata, result)
-      } else if (isFormatted) {
-        describeFormatted(metadata, result)
+      describePartitionInfo(metadata, result)
+
+      if (partitionSpec.isEmpty) {
+        if (isExtended) {
+          describeExtendedTableInfo(metadata, result)
+        } else if (isFormatted) {
+          describeFormattedTableInfo(metadata, result)
+        }
       } else {
-        describePartitionInfo(metadata, result)
+        describeDetailedPartitionInfo(catalog, metadata, result)
       }
     }
 
@@ -436,16 +448,12 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isFormatted: Boolean)
     }
   }
 
-  private def describeExtended(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
-    describePartitionInfo(table, buffer)
-
+  private def describeExtendedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
     append(buffer, "", "", "")
     append(buffer, "# Detailed Table Information", table.toString, "")
   }
 
-  private def describeFormatted(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
-    describePartitionInfo(table, buffer)
-
+  private def describeFormattedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
     append(buffer, "", "", "")
     append(buffer, "# Detailed Table Information", "", "")
     append(buffer, "Database:", table.database, "")
@@ -499,6 +507,53 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isFormatted: Boolean)
     }
   }
 
+  private def describeDetailedPartitionInfo(
+      catalog: SessionCatalog,
+      metadata: CatalogTable,
+      result: ArrayBuffer[Row]): Unit = {
+    if (metadata.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"DESC PARTITION is not allowed on a view: ${table.identifier}")
+    }
+    if (DDLUtils.isDatasourceTable(metadata)) {
+      throw new AnalysisException(
+        s"DESC PARTITION is not allowed on a datasource table: ${table.identifier}")
+    }
+    val partition = catalog.getPartition(table, partitionSpec)
+    if (isExtended) {
+      describeExtendedDetailedPartitionInfo(table, metadata, partition, result)
+    } else if (isFormatted) {
+      describeFormattedDetailedPartitionInfo(table, metadata, partition, result)
+      describeStorageInfo(metadata, result)
+    }
+  }
+
+  private def describeExtendedDetailedPartitionInfo(
+      tableIdentifier: TableIdentifier,
+      table: CatalogTable,
+      partition: CatalogTablePartition,
+      buffer: ArrayBuffer[Row]): Unit = {
+    append(buffer, "", "", "")
+    append(buffer, "Detailed Partition Information " + partition.toString, "", "")
+  }
+
+  private def describeFormattedDetailedPartitionInfo(
+      tableIdentifier: TableIdentifier,
+      table: CatalogTable,
+      partition: CatalogTablePartition,
+      buffer: ArrayBuffer[Row]): Unit = {
+    append(buffer, "", "", "")
+    append(buffer, "# Detailed Partition Information", "", "")
+    append(buffer, "Partition Value:", s"[${partition.spec.values.mkString(", ")}]", "")
+    append(buffer, "Database:", table.database, "")
+    append(buffer, "Table:", tableIdentifier.table, "")
+    append(buffer, "Location:", partition.storage.locationUri.getOrElse(""), "")
+    append(buffer, "Partition Parameters:", "", "")
+    partition.parameters.foreach { case (key, value) =>
+      append(buffer, s"  $key", value, "")
+    }
+  }
+
   private def describeSchema(schema: StructType, buffer: ArrayBuffer[Row]): Unit = {
     schema.foreach { column =>
       append(buffer, column.name, column.dataType.simpleString, column.getComment().orNull)
```
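The reworked `run` path always emits the schema and the `# Partition Information` block first, and only then branches: with no partition spec it appends table-level detail for EXTENDED/FORMATTED, and with a spec it appends partition-level detail instead. A simplified sketch of that dispatch, using plain strings instead of `Row` buffers (section names are illustrative, not the exact output):

```scala
// Sketch: the branch structure of DescribeTableCommand.run after this change.
object DescribeDispatchSketch extends App {
  def sections(
      partitionSpec: Map[String, String],
      isExtended: Boolean,
      isFormatted: Boolean): Seq[String] = {
    // Schema and partition-column info are always printed.
    val base = Seq("<schema>", "# Partition Information")
    val detail =
      if (partitionSpec.isEmpty) {
        // No PARTITION clause: table-level detail, as before this patch.
        if (isExtended) Seq("# Detailed Table Information")
        else if (isFormatted) Seq("# Detailed Table Information", "# Storage Information")
        else Seq.empty
      } else {
        // PARTITION clause present: partition-level detail instead.
        if (isExtended) Seq("Detailed Partition Information CatalogPartition(...)")
        else if (isFormatted) Seq("# Detailed Partition Information", "# Storage Information")
        else Seq.empty
      }
    base ++ detail
  }

  println(sections(Map.empty, isExtended = false, isFormatted = false))
  println(sections(Map("c" -> "Us", "d" -> "1"), isExtended = false, isFormatted = true))
}
```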
sql/core/src/test/resources/sql-tests/inputs/describe.sql

Lines changed: 27 additions & 0 deletions

```sql
CREATE TABLE t (a STRING, b INT) PARTITIONED BY (c STRING, d STRING);

ALTER TABLE t ADD PARTITION (c='Us', d=1);

DESC t;

-- Ignore these because there exist timestamp results, e.g., `Create Table`.
-- DESC EXTENDED t;
-- DESC FORMATTED t;

DESC t PARTITION (c='Us', d=1);

-- Ignore these because there exist timestamp results, e.g., transient_lastDdlTime.
-- DESC EXTENDED t PARTITION (c='Us', d=1);
-- DESC FORMATTED t PARTITION (c='Us', d=1);

-- NoSuchPartitionException: Partition not found in table
DESC t PARTITION (c='Us', d=2);

-- AnalysisException: Partition spec is invalid
DESC t PARTITION (c='Us');

-- ParseException: PARTITION specification is incomplete
DESC t PARTITION (c='Us', d);

-- DROP TEST TABLE
DROP TABLE t;
```
sql/core/src/test/resources/sql-tests/results/describe.sql.out

Lines changed: 90 additions & 0 deletions

```
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 8


-- !query 0
CREATE TABLE t (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)
-- !query 0 schema
struct<>
-- !query 0 output



-- !query 1
ALTER TABLE t ADD PARTITION (c='Us', d=1)
-- !query 1 schema
struct<>
-- !query 1 output



-- !query 2
DESC t
-- !query 2 schema
struct<col_name:string,data_type:string,comment:string>
-- !query 2 output
# Partition Information
# col_name data_type comment
a string
b int
c string
c string
d string
d string


-- !query 3
DESC t PARTITION (c='Us', d=1)
-- !query 3 schema
struct<col_name:string,data_type:string,comment:string>
-- !query 3 output
# Partition Information
# col_name data_type comment
a string
b int
c string
c string
d string
d string


-- !query 4
DESC t PARTITION (c='Us', d=2)
-- !query 4 schema
struct<>
-- !query 4 output
org.apache.spark.sql.catalyst.analysis.NoSuchPartitionException
Partition not found in table 't' database 'default':
c -> Us
d -> 2;


-- !query 5
DESC t PARTITION (c='Us')
-- !query 5 schema
struct<>
-- !query 5 output
org.apache.spark.sql.AnalysisException
Partition spec is invalid. The spec (c) must match the partition spec (c, d) defined in table '`default`.`t`';


-- !query 6
DESC t PARTITION (c='Us', d)
-- !query 6 schema
struct<>
-- !query 6 output
org.apache.spark.sql.catalyst.parser.ParseException

PARTITION specification is incomplete: `d`(line 1, pos 0)

== SQL ==
DESC t PARTITION (c='Us', d)
^^^


-- !query 7
DROP TABLE t
-- !query 7 schema
struct<>
-- !query 7 output

```

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala

Lines changed: 76 additions & 1 deletion
```diff
@@ -26,7 +26,7 @@ import org.apache.hadoop.fs.Path
 
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.analysis.{EliminateSubqueryAliases, FunctionRegistry}
+import org.apache.spark.sql.catalyst.analysis.{EliminateSubqueryAliases, FunctionRegistry, NoSuchPartitionException}
 import org.apache.spark.sql.catalyst.catalog.CatalogTableType
 import org.apache.spark.sql.catalyst.parser.ParseException
 import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
@@ -341,6 +341,81 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
 
+  test("describe partition") {
+    withTable("partitioned_table") {
+      sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
+      sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
+
+      checkKeywordsExist(sql("DESC partitioned_table PARTITION (c='Us', d=1)"),
+        "# Partition Information",
+        "# col_name")
+
+      checkKeywordsExist(sql("DESC EXTENDED partitioned_table PARTITION (c='Us', d=1)"),
+        "# Partition Information",
+        "# col_name",
+        "Detailed Partition Information CatalogPartition(",
+        "Partition Values: [Us, 1]",
+        "Storage(Location:",
+        "Partition Parameters")
+
+      checkKeywordsExist(sql("DESC FORMATTED partitioned_table PARTITION (c='Us', d=1)"),
+        "# Partition Information",
+        "# col_name",
+        "# Detailed Partition Information",
+        "Partition Value:",
+        "Database:",
+        "Table:",
+        "Location:",
+        "Partition Parameters:",
+        "# Storage Information")
+    }
+  }
+
+  test("describe partition - error handling") {
+    withTable("partitioned_table", "datasource_table") {
+      sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY (c STRING, d STRING)")
+      sql("ALTER TABLE partitioned_table ADD PARTITION (c='Us', d=1)")
+
+      val m = intercept[NoSuchPartitionException] {
+        sql("DESC partitioned_table PARTITION (c='Us', d=2)")
+      }.getMessage()
+      assert(m.contains("Partition not found in table"))
+
+      val m2 = intercept[AnalysisException] {
+        sql("DESC partitioned_table PARTITION (c='Us')")
+      }.getMessage()
+      assert(m2.contains("Partition spec is invalid"))
+
+      val m3 = intercept[ParseException] {
+        sql("DESC partitioned_table PARTITION (c='Us', d)")
+      }.getMessage()
+      assert(m3.contains("PARTITION specification is incomplete: `d`"))
+
+      spark
+        .range(1).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd).write
+        .partitionBy("d")
+        .saveAsTable("datasource_table")
+      val m4 = intercept[AnalysisException] {
+        sql("DESC datasource_table PARTITION (d=2)")
+      }.getMessage()
+      assert(m4.contains("DESC PARTITION is not allowed on a datasource table"))
+
+      val m5 = intercept[AnalysisException] {
+        spark.range(10).select('id as 'a, 'id as 'b).createTempView("view1")
+        sql("DESC view1 PARTITION (c='Us', d=1)")
+      }.getMessage()
+      assert(m5.contains("DESC PARTITION is not allowed on a temporary view"))
+
+      withView("permanent_view") {
+        val m = intercept[AnalysisException] {
+          sql("CREATE VIEW permanent_view AS SELECT * FROM partitioned_table")
+          sql("DESC permanent_view PARTITION (c='Us', d=1)")
+        }.getMessage()
+        assert(m.contains("DESC PARTITION is not allowed on a view"))
+      }
+    }
+  }
+
   test("SPARK-5371: union with null and sum") {
     val df = Seq((1, 1)).toDF("c1", "c2")
     df.createOrReplaceTempView("table1")
```
