[SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE` #36946

MaxGekk · 2022-06-21T16:12:55Z

What changes were proposed in this pull request?

In this PR, I propose to change the output of v2 DESCRIBE TABLE to make it the same as v1. In particular:

Return NULL instead of an empty string when a comment doesn't exist in the schema info.
When a v2 table has only identity transforms (plain partition columns), output the partitioning info in the v1 style. For instance:

> CREATE TABLE tbl (id bigint, data string) USING _ PARTITIONED BY (id);
> DESCRIBE TABLE tbl;
+-----------------------+---------+-------+
|col_name               |data_type|comment|
+-----------------------+---------+-------+
|id                     |bigint   |null   |
|data                   |string   |null   |
|# Partition Information|         |       |
|# col_name             |data_type|comment|
|id                     |bigint   |null   |
+-----------------------+---------+-------+
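To make the target output concrete, here is a minimal Scala sketch of how rows like the ones above could be assembled. It is not the code from DescribeTableExec: `DescribeRowsSketch`, `Field`, `DescRow` and `describe` are hypothetical, simplified stand-ins, and only top-level identity partition columns are handled. A missing comment becomes null, and the partition section reuses the same three-column layout as the v1 output.

```scala
object DescribeRowsSketch {
  // Hypothetical, simplified stand-in for a schema field (not Spark's StructField).
  final case class Field(name: String, dataType: String, comment: Option[String])

  // One DESCRIBE TABLE output row: (col_name, data_type, comment).
  type DescRow = (String, String, String)

  // A missing comment is rendered as null, mirroring the v1 output.
  private def toRow(f: Field): DescRow = (f.name, f.dataType, f.comment.orNull)

  // Assumes every partition column exists at the top level of the schema;
  // byName(c) throws NoSuchElementException otherwise.
  def describe(schema: Seq[Field], partitionCols: Seq[String]): Seq[DescRow] = {
    val columns = schema.map(toRow)
    if (partitionCols.isEmpty) {
      columns
    } else {
      val byName = schema.map(f => f.name -> f).toMap
      val header = Seq(
        ("# Partition Information", "", ""),
        ("# col_name", "data_type", "comment"))
      columns ++ header ++ partitionCols.map(c => toRow(byName(c)))
    }
  }
}
```

For the table above, `describe(Seq(Field("id", "bigint", None), Field("data", "string", None)), Seq("id"))` yields five rows, the last of which is `("id", "bigint", null)`.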

The PR also moves or adds some tests to the base traits:

"DESCRIBE TABLE of a non-partitioned table"
"DESCRIBE TABLE of a partitioned table"

and addresses review comments in #35265.

Why are the changes needed?

The changes unify the outputs of the v1 and v2 implementations and make migration from v1 to v2 easier for Spark SQL users.

Does this PR introduce any user-facing change?

Yes, it changes the output of v2 DESCRIBE TABLE.

How was this patch tested?

By running the DESCRIBE TABLE test suites:

$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DescribeTableSuite"

and related test suites:

$ build/sbt "test:testOnly *DataSourceV2SQLSuite"

MaxGekk · 2022-06-22T15:08:04Z

@cloud-fan @imback82 Please review this PR.

cloud-fan · 2022-06-22T16:13:39Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala

+        val nameToField = table.schema.map(f => (f.name, f)).toMap
+        rows ++= table.partitioning
+          .map(_.asInstanceOf[IdentityTransform])
+          .flatMap(_.ref.fieldNames())


I think it's trickier here, as the reference can be a nested field, e.g. a.b. We can still keep the output v1-compatible, but the code to find its type and comment will be a bit more complicated, as we need to use StructType.findNestedField.

I added a test for the v2 implementation: partitioning by nested columns. Note that v1 doesn't support partitioning by nested columns. I also fixed the v2 implementation to pass the new test.
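A nested reference such as a.b has to be resolved level by level before its type and comment can be printed. The sketch below is a self-contained, hypothetical model of that lookup: `NestedFieldSketch`, `DType`, `Leaf`, `Struct`, `Field` and `findNested` are illustrative names, not Spark APIs. Like the result of StructType.findNestedField as used in the PR, it returns the parent path together with the matched field, or None when any part of the name cannot be resolved.

```scala
object NestedFieldSketch {
  // Hypothetical, simplified schema model: a field is either a leaf with a
  // type name or a struct with child fields. This is NOT Spark's StructType.
  sealed trait DType
  final case class Leaf(typeName: String) extends DType
  final case class Struct(fields: Seq[Field]) extends DType
  final case class Field(name: String, dataType: DType, comment: Option[String] = None)

  // Resolves a multi-part name like Seq("a", "b") against the given fields.
  // Returns (parent path, matched field), or None if any part is missing or
  // an intermediate part is not a struct.
  def findNested(fields: Seq[Field], names: Seq[String]): Option[(Seq[String], Field)] =
    names match {
      case Seq() => None
      case Seq(last) => fields.find(_.name == last).map(f => (Seq.empty[String], f))
      case head +: rest =>
        fields.find(_.name == head).flatMap { f =>
          f.dataType match {
            case Struct(children) =>
              // Prepend the current level to the parent path of the deeper match.
              findNested(children, rest).map { case (path, found) => (head +: path, found) }
            case _ => None
          }
        }
    }
}
```

Returning the parent path separately keeps the matched field's own name intact, so the caller can decide how to render the full dotted name.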

cloud-fan · 2022-06-22T16:15:59Z

sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala

+          Row("data", "string", null),
+          Row("# Partition Information", "", ""),
+          Row("# col_name", "data_type", "comment"),
+          Row("id", "bigint", null),


Just out of curiosity: what's different between v1 and v2 DESC TABLE for the test "DESCRIBE TABLE EXTENDED of a partitioned table"?

I see. Can we at least include Table Type in the v2 command? It's simply a check of whether the table has the reserved EXTERNAL table property.
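The Table Type check requested above amounts to a single property lookup. A minimal sketch, assuming the reserved property key is `external` (the value Spark's TableCatalog.PROP_EXTERNAL is believed to hold; please verify for your version) and using a plain Map in place of the v2 table's properties; `TableTypeSketch` and `tableTypeRow` are hypothetical names:

```scala
object TableTypeSketch {
  // Assumption: the reserved EXTERNAL property key is "external".
  val PropExternal = "external"

  // Derives a v1-style "Type" row from v2 table properties: a table carrying
  // the reserved EXTERNAL property is EXTERNAL, otherwise it is MANAGED.
  def tableTypeRow(properties: Map[String, String]): (String, String, String) = {
    val tableType = if (properties.contains(PropExternal)) "EXTERNAL" else "MANAGED"
    ("Type", tableType, "")
  }
}
```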

cloud-fan · 2022-06-23T03:24:04Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala

+        rows += toCatalystRow(s"# ${output(0).name}", output(1).name, output(2).name)
+        rows ++= table.partitioning
+          .map(_.asInstanceOf[IdentityTransform].ref.fieldNames())
+          .flatMap(table.schema.findNestedField(_))


What's the v1 DESC TABLE behavior for malformed tables, e.g. when a partition column does not exist in the table schema? Do we fail or silently ignore it?

I have checked that v1 should trigger an assert in this case: describePartitionInfo() -> table.partitionSchema -> assert(partitionFields.map(_.name) == partitionColumnNames).

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala

Line 657 in 0b4739e

describeSchema(table.partitionSchema, buffer, header = true)

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

Line 263 in 8bbbdb5

assert(partitionFields.map(_.name) == partitionColumnNames)

Can we do the same here, i.e. assert that table.schema.findNestedField does not return None?
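The fail-fast behavior suggested for v2 mirrors the v1 assert: every partition column must resolve against the schema. The following sketch is hypothetical (`PartitionCheckSketch`, `Field` and `partitionFields` are illustrative names, and the assertion message is made up) and deliberately resolves only top-level column names, leaving out the nested lookup:

```scala
object PartitionCheckSketch {
  // Hypothetical, simplified schema field (not Spark's StructField).
  final case class Field(name: String, dataType: String)

  // Resolves each partition column against the top-level schema fields and,
  // like the v1 assert, fails with an AssertionError when one is missing.
  // Simplification: nested (multi-part) names are not handled here.
  def partitionFields(schema: Seq[Field], partitionCols: Seq[String]): Seq[Field] =
    partitionCols.map { col =>
      val found = schema.find(_.name == col)
      assert(found.isDefined, s"Partition column $col is not in the table schema")
      found.get
    }
}
```

Failing loudly surfaces a malformed table immediately instead of printing a partition section that silently omits a column.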

cloud-fan · 2022-06-23T06:27:35Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala

+          .map { fieldNames =>
+            val nestedField = table.schema.findNestedField(fieldNames)
+            assert(nestedField.isDefined,
+              s"Not found the partition column ${fieldNames.map(quoteIfNeeded).mkString(".")} " +


nit: we can reuse MultipartIdentifierHelper.quoted

cloud-fan · 2022-06-23T06:27:41Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala

+            nestedField.get
+          }.map { case (path, field) =>
+            toCatalystRow(
+              (path :+ field.name).map(quoteIfNeeded(_)).mkString("."),
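For display, a nested partition column is rendered as its parent path plus the field name, with each part quoted only when needed. The sketch below re-implements a quoteIfNeeded-style helper from scratch; the exact quoting rules (plain identifiers of letters, digits and underscores stay unquoted unless purely numeric; everything else gets backticks with inner backticks doubled) are an assumption about Spark's helper, and `QuotingSketch` and `displayName` are hypothetical names:

```scala
object QuotingSketch {
  // Assumed quoting rules, modeled on (not copied from) Spark's quoteIfNeeded.
  def quoteIfNeeded(part: String): String =
    if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) part
    else s"`${part.replace("`", "``")}`"

  // Renders a (possibly nested) column as a dotted name: parent path + field name.
  def displayName(path: Seq[String], fieldName: String): String =
    (path :+ fieldName).map(quoteIfNeeded).mkString(".")
}
```

With this helper, `displayName(Seq("a"), "b")` gives `a.b`, while a part containing a space, such as `my col`, is wrapped in backticks.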


MaxGekk · 2022-06-23T08:24:10Z

Merging to master. Thank you, @cloud-fan, for the review.

Unify v1 and v2 DESCRIBE TABLE

ecec69b

github-actions bot added the SQL label Jun 21, 2022

MaxGekk added 5 commits June 21, 2022 21:23

Fix test title

4ffa242

Fix test failures of v2 DESCRIBE TABLE

c17b346

Introduce getProvider()

2224caa

Add the test "DESCRIBE TABLE of a partitioned table"

227b99f

Support describing of a partitioned table

699e72e

MaxGekk changed the title ~~[WIP][SQL] Unify v1 and v2 DESCRIBE TABLE~~ [WIP][SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE Jun 22, 2022

MaxGekk changed the title ~~[WIP][SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE~~ [SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE Jun 22, 2022

MaxGekk changed the title ~~[SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE~~ [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE` Jun 22, 2022

MaxGekk marked this pull request as ready for review June 22, 2022 15:07

MaxGekk requested a review from cloud-fan June 22, 2022 15:07

cloud-fan reviewed Jun 22, 2022

MaxGekk added 2 commits June 22, 2022 20:48

Fix v2 impl for partitioning by nested columns

c75e090

Output Type for v2 tables

39ed298

cloud-fan reviewed Jun 23, 2022

Print assert when a partition column doesn't exist

d95ff9a

cloud-fan reviewed Jun 23, 2022

cloud-fan approved these changes Jun 23, 2022

MaxGekk closed this in b581b14 Jun 23, 2022
