[SPARK-39552][SQL] Unify v1 and v2 DESCRIBE TABLE
#36946
@cloud-fan @imback82 Please review this PR.
val nameToField = table.schema.map(f => (f.name, f)).toMap
rows ++= table.partitioning
  .map(_.asInstanceOf[IdentityTransform])
  .flatMap(_.ref.fieldNames())
I think it's trickier here, as the reference can be a nested field, e.g. a.b. We can still keep the output v1-compatible, but the code that finds the field's type and comment will be a bit more complicated, as we need to use StructType.findNestedField.
I added a test for the v2 implementation that partitions by nested columns. For reference, v1 doesn't support partitioning by nested columns. I also fixed the v2 implementation so that it passes the new test.
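A minimal sketch of the nested lookup discussed above, assuming a simplified, hypothetical schema model (NestedLookup.StructType and StructField here are stand-ins, not Spark's classes). It walks a multi-part reference such as a.b through nested structs and returns the parent path together with the leaf field, mirroring the Option[(Seq[String], StructField)] result shape of StructType.findNestedField:

```scala
object NestedLookup {
  // Hypothetical, simplified schema model; NOT Spark's StructType/StructField.
  sealed trait DataType
  case object LongType extends DataType
  case object StringType extends DataType
  final case class StructType(fields: Seq[StructField]) extends DataType
  final case class StructField(name: String, dataType: DataType, comment: Option[String] = None)

  // Resolves a multi-part reference (e.g. Seq("a", "b")) through nested structs.
  // Returns (parent path, leaf field), or None if any part is missing or a
  // non-struct field is traversed.
  def findNestedField(schema: StructType, names: Seq[String]): Option[(Seq[String], StructField)] = {
    @scala.annotation.tailrec
    def loop(struct: StructType, remaining: Seq[String], path: Seq[String]): Option[(Seq[String], StructField)] =
      remaining match {
        case Seq() => None
        case Seq(last) => struct.fields.find(_.name == last).map(f => (path, f))
        case head +: tail =>
          struct.fields.find(_.name == head) match {
            // Only struct-typed fields can be descended into.
            case Some(StructField(_, inner: StructType, _)) => loop(inner, tail, path :+ head)
            case _ => None
          }
      }
    loop(schema, names, Seq.empty)
  }
}
```

For a schema (id BIGINT, a STRUCT<b: STRING>), Seq("a", "b") resolves to the parent path Seq("a") plus the field b, while Seq("a", "x") resolves to None, which is exactly the malformed case raised later in the review.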
Row("data", "string", null),
Row("# Partition Information", "", ""),
Row("# col_name", "data_type", "comment"),
Row("id", "bigint", null),
Just out of curiosity: what's the difference between v1 and v2 DESC TABLE in this test, DESCRIBE TABLE EXTENDED of a partitioned table?
v2 (after the PR):
+----------------------------+----------------------------+---------------------------------------------------+
|col_name |data_type |comment |
+----------------------------+----------------------------+---------------------------------------------------+
|id |bigint |null |
|data |string |null |
|# Partition Information | | |
|# col_name |data_type |comment |
|id |bigint |null |
| | | |
|# Metadata Columns | | |
|index |int |Metadata column used to conflict with a data column|
|_partition |string |Partition key used to store the row |
| | | |
|# Detailed Table Information| | |
|Name |test_catalog.ns.table | |
|Comment |this is a test table | |
|Location |file:/tmp/testcat/table_name| |
|Provider |_ | |
|Owner |maximgekk | |
|Table Properties |[bar=baz] | |
+----------------------------+----------------------------+---------------------------------------------------+
v1 in memory:
+----------------------------+----------------------------+-------+
|col_name |data_type |comment|
+----------------------------+----------------------------+-------+
|data |string |null |
|id |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|id |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |ns | |
|Table |table | |
|Created Time |Wed Jun 22 09:37:48 PDT 2022| |
|Last Access |UNKNOWN | |
|Created By |Spark 3.4.0-SNAPSHOT | |
|Type |EXTERNAL | |
|Provider |parquet | |
|Comment |this is a test table | |
|Table Properties |[bar=baz] | |
|Location |file:/tmp/testcat/table_name| |
|Partition Provider |Catalog | |
+----------------------------+----------------------------+-------+
v1 (hive):
+----------------------------+----------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+----------------------------------------------------------+-------+
|data |string |null |
|id |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|id |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |ns | |
|Table |table | |
|Owner |maximgekk | |
|Created Time |Wed Jun 22 09:39:42 PDT 2022 | |
|Last Access |UNKNOWN | |
|Created By |Spark 3.4.0-SNAPSHOT | |
|Type |EXTERNAL | |
|Provider |hive | |
|Comment |this is a test table | |
|Table Properties |[transient_lastDdlTime=1655915982] | |
|Location |file:/tmp/testcat/table_name | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.TextInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat| |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
+----------------------------+----------------------------------------------------------+-------+
I see. Can we at least include Table Type in the v2 command? It only needs to check whether the table has the reserved EXTERNAL table property.
done
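A minimal sketch of the Table Type row requested above. It assumes the reserved property key is "external" (the value of TableCatalog.PROP_EXTERNAL in Spark's v2 catalog API, stated here as an assumption) and that the presence of the key alone decides the type; the object and method names are hypothetical.

```scala
object TableTypeSketch {
  // Assumption: reserved v2 property key marking an external table.
  val PropExternal = "external"

  // v1-style "Type" value: EXTERNAL if the reserved property is present, MANAGED otherwise.
  def tableType(properties: Map[String, String]): String =
    if (properties.contains(PropExternal)) "EXTERNAL" else "MANAGED"

  // (col_name, data_type, comment) row under "# Detailed Table Information".
  def typeRow(properties: Map[String, String]): (String, String, String) =
    ("Type", tableType(properties), "")
}
```

Keying on presence rather than on the property's value keeps the check as simple as the review comment describes.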
rows += toCatalystRow(s"# ${output(0).name}", output(1).name, output(2).name)
rows ++= table.partitioning
  .map(_.asInstanceOf[IdentityTransform].ref.fieldNames())
  .flatMap(table.schema.findNestedField(_))
What's the v1 DESC TABLE behavior for malformed tables, e.g. when a partition column does not exist in the table schema? Do we fail or silently ignore it?
I have checked that v1 should trigger an assertion in that case:
- describePartitionInfo() calls describeSchema(table.partitionSchema, buffer, header = true)
  (spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala, line 657 in 0b4739e)
- table.partitionSchema checks assert(partitionFields.map(_.name) == partitionColumnNames)
  (spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala, line 263 in 8bbbdb5)
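A simplified model of the v1 check quoted above, assuming (as in v1 catalog tables) that partition columns are the trailing columns of the schema. CatalogTableModel is a hypothetical stand-in for CatalogTable; the point is that describing a malformed table fails with an AssertionError rather than silently dropping the missing column.

```scala
object V1PartitionSchemaSketch {
  final case class Field(name: String, dataType: String)

  // Hypothetical stand-in for a v1 catalog table: partition columns are
  // expected to be the last columns of the schema, in declaration order.
  final case class CatalogTableModel(schema: Seq[Field], partitionColumnNames: Seq[String]) {
    def partitionSchema: Seq[Field] = {
      val partitionFields = schema.takeRight(partitionColumnNames.length)
      // Same check as the assert quoted above: a malformed table fails here.
      assert(partitionFields.map(_.name) == partitionColumnNames)
      partitionFields
    }
  }
}
```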
Can we do the same here, i.e. assert that table.schema.findNestedField does not return None?
done
  .map { fieldNames =>
    val nestedField = table.schema.findNestedField(fieldNames)
    assert(nestedField.isDefined,
      s"Not found the partition column ${fieldNames.map(quoteIfNeeded).mkString(".")} " +
nit: we can reuse MultipartIdentifierHelper.quoted
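A sketch of the reusable helper this nit refers to, under stated assumptions: the quoting rule below (bare identifiers made of letters, digits and underscores that are not purely numeric; backticks with doubled inner backticks otherwise) is assumed to match Spark's quoteIfNeeded, and MultipartIdentifierOps is a hypothetical name for an extension in the spirit of MultipartIdentifierHelper.quoted.

```scala
object QuotingSketch {
  // Assumed quoting rule: keep simple identifiers bare, otherwise wrap in
  // backticks and escape embedded backticks by doubling them.
  def quoteIfNeeded(part: String): String =
    if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) part
    else s"`${part.replace("`", "``")}`"

  // Hypothetical extension so call sites avoid repeating
  // map(quoteIfNeeded).mkString(".").
  implicit class MultipartIdentifierOps(val parts: Seq[String]) extends AnyVal {
    def quoted: String = parts.map(quoteIfNeeded).mkString(".")
  }
}
```

With QuotingSketch._ imported, Seq("a", "b c").quoted yields a.`b c`.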
    nestedField.get
  }.map { case (path, field) =>
    toCatalystRow(
      (path :+ field.name).map(quoteIfNeeded(_)).mkString("."),
ditto
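Putting these review threads together, a hypothetical, self-contained sketch of building v1-style partition rows in the v2 command: each identity partition reference is resolved through nested structs, a missing column fails fast with an assertion (as v1 does), and the displayed name is the quoted full path. The schema model, the quoting rule and all helper names are assumptions, not Spark's API.

```scala
object PartitionInfoSketch {
  // Hypothetical, simplified schema model (not Spark's StructType/StructField).
  final case class Field(name: String, typeName: String, comment: Option[String], children: Seq[Field] = Nil)

  // Assumed quoting rule, see the quoting discussion above.
  def quoteIfNeeded(part: String): String =
    if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) part
    else s"`${part.replace("`", "``")}`"

  def quoted(parts: Seq[String]): String = parts.map(quoteIfNeeded).mkString(".")

  // Returns (parent path, leaf field) for a multi-part reference, or None.
  def findNestedField(fields: Seq[Field], names: Seq[String], path: Seq[String] = Nil): Option[(Seq[String], Field)] =
    names match {
      case Seq() => None
      case Seq(last) => fields.find(_.name == last).map(f => (path, f))
      case head +: tail =>
        fields.find(_.name == head).flatMap(f => findNestedField(f.children, tail, path :+ head))
    }

  // Builds "# Partition Information" rows as (col_name, data_type, comment).
  def partitionRows(schema: Seq[Field], partitionRefs: Seq[Seq[String]]): Seq[(String, String, String)] = {
    val header = Seq(("# Partition Information", "", ""), ("# col_name", "data_type", "comment"))
    val body = partitionRefs.map { fieldNames =>
      val nestedField = findNestedField(schema, fieldNames)
      // Fail fast on malformed tables instead of silently skipping the column.
      assert(nestedField.isDefined, s"Not found the partition column ${quoted(fieldNames)}")
      val (path, field) = nestedField.get
      (quoted(path :+ field.name), field.typeName, field.comment.orNull)
    }
    header ++ body
  }
}
```

A table partitioned by id and by the nested column a.b would produce the rows id | bigint | null and a.b | string | (its comment), matching the v1 layout shown earlier.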
Merging to master. Thank you, @cloud-fan, for the review.
What changes were proposed in this pull request?
In the PR, I propose to change the output of v2 DESCRIBE TABLE and make it the same as v1. In particular:

Also, the PR moves/adds some tests to the base traits:

and addresses review comments in #35265.
Why are the changes needed?
The changes unify the outputs of the v1 and v2 implementations and make migration from v1 to v2 easier for Spark SQL users.
Does this PR introduce any user-facing change?
Yes, it changes the output of v2 DESCRIBE TABLE.
How was this patch tested?
By running the DESCRIBE TABLE test suites:

and related test suites: