Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jun 21, 2022

What changes were proposed in this pull request?

In the PR, I propose to change output of v2 DESCRIBE TABLE, and make it the same as v1. In particular:

  1. Return NULL instead of empty strings when any comment doesn't exist in the schema info.
  2. When a v2 table has identity transformations (partition by) only, output the partitioning info in v1 style. For instance:
> CREATE TABLE tbl (id bigint, data string) USING _;
> DESCRIBE TABLE tbl;
+-----------------------+---------+-------+
|col_name               |data_type|comment|
+-----------------------+---------+-------+
|id                     |bigint   |null   |
|data                   |string   |null   |
|# Partition Information|         |       |
|# col_name             |data_type|comment|
|id                     |bigint   |null   |
+-----------------------+---------+-------+

Also the PR moves/adds some tests to the base traits:

  • "DESCRIBE TABLE of a non-partitioned table"
  • "DESCRIBE TABLE of a partitioned table"

and addresses review comments in #35265.

Why are the changes needed?

The changes unify outputs of v1 and v2 implementations, and make the migration process from the version v1 to v2 easier for Spark SQL users.

Does this PR introduce any user-facing change?

Yes, it changes outputs of v2 DESCRIBE TABLE.

How was this patch tested?

By running the DESCRIBE TABLE test suites:

$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DescribeTableSuite"

and related test suites:

$ build/sbt "test:testOnly *DataSourceV2SQLSuite"

@github-actions github-actions bot added the SQL label Jun 21, 2022
@MaxGekk MaxGekk changed the title [WIP][SQL] Unify v1 and v2 DESCRIBE TABLE [WIP][SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE Jun 22, 2022
@MaxGekk MaxGekk changed the title [WIP][SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE [SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE Jun 22, 2022
@MaxGekk MaxGekk changed the title [SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE [SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE Jun 22, 2022
@MaxGekk MaxGekk marked this pull request as ready for review June 22, 2022 15:07
@MaxGekk MaxGekk requested a review from cloud-fan June 22, 2022 15:07
@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 22, 2022

@cloud-fan @imback82 Please, review this PR.

val nameToField = table.schema.map(f => (f.name, f)).toMap
rows ++= table.partitioning
.map(_.asInstanceOf[IdentityTransform])
.flatMap(_.ref.fieldNames())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more tricky here, as the reference can be a nested field, e.g. a.b. We can still keep the output in a v1 compatible way, but the code to find its type and comment will be a bit more complicated, as we need to use StructType.findNestedField

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test for v2 implementation: partitioning by nested columns. Just in case, v1 doesn't support partitioning by nested columns. Also I fixed v2 impl to pass the new test.

Row("data", "string", null),
Row("# Partition Information", "", ""),
Row("# col_name", "data_type", "comment"),
Row("id", "bigint", null),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for curiosity: what's different between v1 and v2 DESC TABLE for this test DESCRIBE TABLE EXTENDED of a partitioned table?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v2 (after the PR):

+----------------------------+----------------------------+---------------------------------------------------+
|col_name                    |data_type                   |comment                                            |
+----------------------------+----------------------------+---------------------------------------------------+
|id                          |bigint                      |null                                               |
|data                        |string                      |null                                               |
|# Partition Information     |                            |                                                   |
|# col_name                  |data_type                   |comment                                            |
|id                          |bigint                      |null                                               |
|                            |                            |                                                   |
|# Metadata Columns          |                            |                                                   |
|index                       |int                         |Metadata column used to conflict with a data column|
|_partition                  |string                      |Partition key used to store the row                |
|                            |                            |                                                   |
|# Detailed Table Information|                            |                                                   |
|Name                        |test_catalog.ns.table       |                                                   |
|Comment                     |this is a test table        |                                                   |
|Location                    |file:/tmp/testcat/table_name|                                                   |
|Provider                    |_                           |                                                   |
|Owner                       |maximgekk                   |                                                   |
|Table Properties            |[bar=baz]                   |                                                   |
+----------------------------+----------------------------+---------------------------------------------------+

v1 in memory:

+----------------------------+----------------------------+-------+
|col_name                    |data_type                   |comment|
+----------------------------+----------------------------+-------+
|data                        |string                      |null   |
|id                          |bigint                      |null   |
|# Partition Information     |                            |       |
|# col_name                  |data_type                   |comment|
|id                          |bigint                      |null   |
|                            |                            |       |
|# Detailed Table Information|                            |       |
|Database                    |ns                          |       |
|Table                       |table                       |       |
|Created Time                |Wed Jun 22 09:37:48 PDT 2022|       |
|Last Access                 |UNKNOWN                     |       |
|Created By                  |Spark 3.4.0-SNAPSHOT        |       |
|Type                        |EXTERNAL                    |       |
|Provider                    |parquet                     |       |
|Comment                     |this is a test table        |       |
|Table Properties            |[bar=baz]                   |       |
|Location                    |file:/tmp/testcat/table_name|       |
|Partition Provider          |Catalog                     |       |
+----------------------------+----------------------------+-------+

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1 (hive):

+----------------------------+----------------------------------------------------------+-------+
|col_name                    |data_type                                                 |comment|
+----------------------------+----------------------------------------------------------+-------+
|data                        |string                                                    |null   |
|id                          |bigint                                                    |null   |
|# Partition Information     |                                                          |       |
|# col_name                  |data_type                                                 |comment|
|id                          |bigint                                                    |null   |
|                            |                                                          |       |
|# Detailed Table Information|                                                          |       |
|Database                    |ns                                                        |       |
|Table                       |table                                                     |       |
|Owner                       |maximgekk                                                 |       |
|Created Time                |Wed Jun 22 09:39:42 PDT 2022                              |       |
|Last Access                 |UNKNOWN                                                   |       |
|Created By                  |Spark 3.4.0-SNAPSHOT                                      |       |
|Type                        |EXTERNAL                                                  |       |
|Provider                    |hive                                                      |       |
|Comment                     |this is a test table                                      |       |
|Table Properties            |[transient_lastDdlTime=1655915982]                        |       |
|Location                    |file:/tmp/testcat/table_name                              |       |
|Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe        |       |
|InputFormat                 |org.apache.hadoop.mapred.TextInputFormat                  |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|       |
|Storage Properties          |[serialization.format=1]                                  |       |
|Partition Provider          |Catalog                                                   |       |
+----------------------------+----------------------------------------------------------+-------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, can we at least include Table Type in v2 command? It's simply checking if the table has a reserved EXTERNAL table property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

rows += toCatalystRow(s"# ${output(0).name}", output(1).name, output(2).name)
rows ++= table.partitioning
.map(_.asInstanceOf[IdentityTransform].ref.fieldNames())
.flatMap(table.schema.findNestedField(_))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the v1 DESC TABLE behavior for malformed tables? e.g. partition column does not exist in the table schema? do we fail or silently ignore it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked that v1 should trigger an assert in the case:

  1. describePartitionInfo()
    describeSchema(table.partitionSchema, buffer, header = true)
  2. table.partitionSchema
  3. assert(partitionFields.map(_.name) == partitionColumnNames)
    assert(partitionFields.map(_.name) == partitionColumnNames)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do the same here? assert that table.schema.findNestedField does not return None.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

.map { fieldNames =>
val nestedField = table.schema.findNestedField(fieldNames)
assert(nestedField.isDefined,
s"Not found the partition column ${fieldNames.map(quoteIfNeeded).mkString(".")} " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can reuse MultipartIdentifierHelper.quoted

nestedField.get
}.map { case (path, field) =>
toCatalystRow(
(path :+ field.name).map(quoteIfNeeded(_)).mkString("."),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 23, 2022

Merging to master. Thank you, @cloud-fan for review.

@MaxGekk MaxGekk closed this in b581b14 Jun 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants