
Conversation

@imback82 (Contributor) commented Jan 21, 2022

What changes were proposed in this pull request?

  1. Move DESCRIBE TABLE parsing tests to DescribeRelationParserSuite.
  2. Put common DESCRIBE TABLE tests into one trait, org.apache.spark.sql.execution.command.DescribeTableSuiteBase, and put datasource-specific tests into v1.DescribeTableSuite and v2.DescribeTableSuite.

The changes follow the approach of #30287.

Why are the changes needed?

  1. The unification allows running common DESCRIBE TABLE tests against DSv1, Hive DSv1, and DSv2.
  2. We can detect missing features and differences between DSv1 and DSv2 implementations.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests and new tests.

@github-actions github-actions bot added the SQL label Jan 21, 2022
@imback82 imback82 changed the title [SPARK-37888][SQL][TESTS] Unify v1 and v2 CREATE NAMESPACE tests [SPARK-37888][SQL][TESTS] Unify v1 and v2 DESCRIBE TABLE tests Jan 21, 2022
@imback82 imback82 changed the title [SPARK-37888][SQL][TESTS] Unify v1 and v2 DESCRIBE TABLE tests [WIP][SPARK-37888][SQL][TESTS] Unify v1 and v2 DESCRIBE TABLE tests Jan 21, 2022
Row("data", "string", ""),
Row("id", "bigint", ""),
Row("", "", ""),
Row("# Partitioning", "", ""),
Contributor Author

@cloud-fan this is a WIP PR, but as you can see there are quite a few differences between the v1 and v2 commands. Do you want me to unify the behavior in this PR? If so, should we update v1 to match v2, or the other way around? TIA!

@cloud-fan (Contributor) commented Jan 21, 2022

Yeah, I know it's going to be hard.

The most difficult part is partitioning. V2 supports transforms instead of plain partition column names. I'd suggest handling tables partitioned only by plain columns specially, to not surprise the majority of users:

a                     string
b                     int
# Partition Columns
b                     int

Once there are (non-identity) transforms, we use the v2 format:

a                     string
b                     int
c                     timestamp
# Partitioning
identity(b)
year(c)
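The v1-vs-v2 format choice described above could be sketched as follows (a hypothetical helper, assuming Spark-internal access to IdentityTransform; not code from this PR):

```scala
import org.apache.spark.sql.connector.expressions.{IdentityTransform, Transform}

// Hypothetical helper: use the v1-style "# Partition Columns" layout only
// when the table is partitioned purely by plain columns, i.e. every
// transform is an identity transform.
def useV1PartitionFormat(partitioning: Seq[Transform]): Boolean =
  partitioning.nonEmpty && partitioning.forall(_.isInstanceOf[IdentityTransform])
```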

Contributor

For Detailed Table Information, we need to add more information in the v2 command to follow v1, such as Created Time. We can match V1Table directly first, and then add more built-in v2 table properties later.

@Peng-Lei is EXTERNAL a reserved keyword now?

Contributor

Yes, see the PR.

@imback82 (Contributor Author) commented Jan 25, 2022

> The most difficult part is partitioning. V2 supports transforms instead of plain partition column names. I'd suggest handling tables partitioned only by plain columns specially, to not surprise the majority of users.

Can I assume that only the IdentityTransforms are the partition columns?

Contributor

Yes, only IdentityTransforms represent partition columns.

@imback82 imback82 marked this pull request as draft January 21, 2022 04:18
  case DescribeRelation(
-     ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended, output) =>
+     ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended, output)
+       if conf.useV1Command =>
Contributor

I think we should still use v1 command for views.

      append(buffer, s"# ${output.head.name}", output(1).name, output(2).name)
    }
-   schema.foreach { column =>
+   schema.sortBy(_.name).foreach { column =>
Contributor

hmm, this is weird. The column order matters and we should retain it.

    rows += toCatalystRow(s"# ${output(0).name}", output(1).name, output(2).name)
    rows ++= table.partitioning.sortBy(_.describe).map {
      case t =>
        val field = nameToField(t.describe)
Contributor

We should cast the v2 partitioning to IdentityTransform and get the column names.
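That suggestion could be sketched like this (a hypothetical helper, assuming Spark-internal access to IdentityTransform; not code from this PR):

```scala
import org.apache.spark.sql.connector.expressions.{IdentityTransform, Transform}

// Hypothetical helper: keep only identity transforms, whose references
// correspond directly to partition column names; non-identity transforms
// (year, bucket, ...) are dropped.
def partitionColumnNames(partitioning: Seq[Transform]): Seq[String] =
  partitioning.collect {
    case IdentityTransform(ref) => ref.fieldNames.mkString(".")
  }
```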

Row("# col_name", "data_type", "comment"),
Row("id", "bigint", null),
Row("", "", ""),
Row("# Metadata Columns", "", ""),
Contributor

is it the only difference?

Row("Location", "file:/tmp/table_name", ""),
Row("Provider", "parquet", ""),
Row("Owner", "", ""),
Row("External", "true", ""),
Contributor

Seems better to follow the v1 command and print Type: EXTERNAL/MANAGED.
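As a sketch of what that could look like (hypothetical helper; the reliance on a reserved "external" table property is an assumption, not confirmed by this thread):

```scala
// Hypothetical mapping from a v2 table's properties to the v1-style "Type"
// value; assumes an "external" table property marks external tables.
def tableType(properties: java.util.Map[String, String]): String =
  if ("true".equalsIgnoreCase(properties.getOrDefault("external", "false"))) {
    "EXTERNAL"
  } else {
    "MANAGED"
  }
```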

Row("Last Access", "UNKNOWN", ""),
Row("Created By", s"Spark ${org.apache.spark.SPARK_VERSION}", ""),
Row("Type", "EXTERNAL", ""),
Row("Provider", "parquet", ""),
Contributor

Suggested change:
- Row("Provider", "parquet", ""),
+ Row("Provider", defaultUsing.stripPrefix("USING").trim, ""),

class DescribeTableSuite extends command.DescribeTableSuiteBase with CommandSuiteBase {
override def namespace: String = "ns1.ns2"

test("DESCRIBE TABLE with non-'partitioned-by' clause") {
@cloud-fan (Contributor) commented Feb 9, 2022

Suggested change:
- test("DESCRIBE TABLE with non-'partitioned-by' clause") {
+ test("DESCRIBE TABLE with bucketed table") {

}


test("DESCRIBE TABLE with v2 catalog when table does not exist.") {
Contributor

shall we move it to the base suite?

}
}

test("DESCRIBE TABLE EXTENDED using v2 catalog") {
Contributor

}
}

test("SPARK-34561: drop/add columns to a dataset of `DESCRIBE TABLE`") {
Contributor

can we move it to the base suite?

Row("id", "bigint", null),
Row("", "", ""),
Row("# Partitioning", "", ""),
Row("Part 0", "bucket(3, id)", "")))
@cloud-fan (Contributor) commented Feb 9, 2022

In fact, partition transforms are implicitly named. e.g. people can do TRUNCATE TABLE t PARTITION (a=1). We can get the partition names if the table implements SupportsPartitionManagement.

Now I have a new proposal for displaying partitioning, which is more consistent between v1 and v2:

# Partition Information
# col_name            data_type            comment
c                     string
d                     string
# Bucketing Information
# bucket_cols         sort_cols            num_buckets
a, b                  c, d                 4

If there exist partition transforms (which means we can't be v1-compatible anymore):

# Partition Information
# col_name            transform
x                     year(ts)
y                     bucket(3, id)
z                     identity(d)

If the partition col names are unknown (the table doesn't implement SupportsPartitionManagement), we can use unknown_name_0, unknown_name_1, etc.
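The naming fallback described above could be sketched as follows (hypothetical helper, not code from this PR):

```scala
import org.apache.spark.sql.connector.catalog.{SupportsPartitionManagement, Table}

// Hypothetical helper: take partition field names from the partition schema
// when the table implements SupportsPartitionManagement; otherwise
// synthesize unknown_name_0, unknown_name_1, ... per transform.
def partitionFieldNames(table: Table): Seq[String] = table match {
  case p: SupportsPartitionManagement =>
    p.partitionSchema().fields.map(_.name).toSeq
  case _ =>
    table.partitioning.indices.map(i => s"unknown_name_$i")
}
```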

MaxGekk added a commit that referenced this pull request Jun 23, 2022
### What changes were proposed in this pull request?
In the PR, I propose to change output of v2 `DESCRIBE TABLE`, and make it the same as v1. In particular:
1. Return NULL instead of empty strings when any comment doesn't exist in the schema info.
2. When a v2 table has identity transformations (partition by) only, output the partitioning info in v1 style. For instance:
```sql
> CREATE TABLE tbl (id bigint, data string) USING _;
> DESCRIBE TABLE tbl;
+-----------------------+---------+-------+
|col_name               |data_type|comment|
+-----------------------+---------+-------+
|id                     |bigint   |null   |
|data                   |string   |null   |
|# Partition Information|         |       |
|# col_name             |data_type|comment|
|id                     |bigint   |null   |
+-----------------------+---------+-------+
```

Also the PR moves/adds some tests to the base traits:
- "DESCRIBE TABLE of a non-partitioned table"
- "DESCRIBE TABLE of a partitioned table"

and addresses review comments in #35265.

### Why are the changes needed?
The changes unify outputs of v1 and v2 implementations, and make the migration process from the version v1 to v2 easier for Spark SQL users.

### Does this PR introduce _any_ user-facing change?
Yes, it changes outputs of v2 `DESCRIBE TABLE`.

### How was this patch tested?
By running the `DESCRIBE TABLE` test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DescribeTableSuite"
```
and related test suites:
```
$ build/sbt "test:testOnly *DataSourceV2SQLSuite"
```

Closes #36946 from MaxGekk/unify-v1-v2-describe-table.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
a0x8o added a commit to a0x8o/spark that referenced this pull request Jun 23, 2022
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 16, 2022
@github-actions github-actions bot closed this Jul 17, 2022