[SPARK-14346] SHOW CREATE TABLE for data source tables #12781
Conversation
Moved to tables.scala, not related to this PR.
Force-pushed from 092d942 to 2f0c980.
Test build #57348 has finished for PR 12781 at commit

Test build #57349 has finished for PR 12781 at commit

Test build #57378 has finished for PR 12781 at commit
@cloud-fan Do we allow users to specify bucketing without providing partitioning columns? It seems only DynamicPartitionWriterContainer supports bucketSpec?
Yes, we do. If partition columns are empty but bucket columns are not, we will also use DynamicPartitionWriterContainer, see: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L124-L136
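For illustration, a bucketed-but-unpartitioned table under the syntax this PR extends could look roughly like this (a sketch only; the table and column names are made up, and I'm assuming a Spark 2.0-style `spark.sql` entry point):

```scala
// Sketch: bucketing without partitioning via the extended
// CREATE TABLE ... USING syntax. Names are hypothetical.
spark.sql(
  """CREATE TABLE bucketed_only
    |USING parquet
    |CLUSTERED BY (id) SORTED BY (id) INTO 8 BUCKETS
    |AS SELECT id, name FROM source_table
  """.stripMargin)
```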
Thanks!
I just realized that OPTIONS goes first. I am wondering if it makes sense to put OPTIONS after PARTITIONED BY and the bucket spec.
(Update: let me ask around.)
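To make the ordering question concrete, my reading of the current grammar is that OPTIONS precedes the partitioning and bucketing clauses, roughly as below (treat the exact clause order as an assumption; names and path are made up):

```scala
// Assumed clause order under the current grammar: OPTIONS first,
// then PARTITIONED BY, then the bucket spec.
spark.sql(
  """CREATE TABLE t (id INT, p STRING)
    |USING parquet
    |OPTIONS (path '/tmp/t')
    |PARTITIONED BY (p)
    |CLUSTERED BY (id) INTO 2 BUCKETS
  """.stripMargin)
```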
Let's add tests for this syntax (right now we only have tests for show create table).
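Something along these lines, perhaps (a minimal sketch assuming a suite that mixes in the usual SQLTestUtils-style helpers `withTable` and `sql`; the assertions are illustrative, not the actual suite):

```scala
// Sketch of a syntax test for the new PARTITIONED BY / CLUSTERED BY
// support in CREATE TABLE ... USING. Assumes SQLTestUtils helpers.
test("CREATE TABLE USING with partitioning and bucketing") {
  withTable("t") {
    sql(
      """CREATE TABLE t
        |USING parquet
        |PARTITIONED BY (p)
        |CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS
        |AS SELECT 1 AS id, 'a' AS p
      """.stripMargin)
    // Smoke check: the generated DDL should round-trip the new clauses.
    val ddl = sql("SHOW CREATE TABLE t").head().getString(0)
    assert(ddl.contains("PARTITIONED BY"))
    assert(ddl.contains("CLUSTERED BY"))
  }
}
```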
Test build #57610 has finished for PR 12781 at commit
Force-pushed from 25bc0a4 to 3f23a9f.
Test build #57744 has finished for PR 12781 at commit

Test build #57737 has finished for PR 12781 at commit

Test build #57748 has finished for PR 12781 at commit
I think having a utility function in DDLUtils to construct the DataType from those strings stored in table properties will be very useful. This function can also help us to provide a nice output of describe table (when the table is a data source table). We can do it in a separate PR.
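As a sketch of that helper, assuming the data source table schema is stored as JSON split across the `spark.sql.sources.schema.numParts` / `spark.sql.sources.schema.part.N` table properties (my reading of the current layout, so treat the property names as an assumption):

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Sketch of the suggested DDLUtils helper: reassemble the JSON schema
// from the table properties and parse it back into a StructType.
def getSchemaFromTableProperties(props: Map[String, String]): Option[StructType] = {
  props.get("spark.sql.sources.schema.numParts").map { numParts =>
    val json = (0 until numParts.toInt)
      .map(i => props(s"spark.sql.sources.schema.part.$i"))
      .mkString
    DataType.fromJson(json).asInstanceOf[StructType]
  }
}
```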
LGTM except the weird test-should-fail issue.
Force-pushed from a437647 to ea10cd4.
Updated test suite by normalizing
Test build #58233 has finished for PR 12781 at commit

Part of the changes can be simplified using the DDL utility methods introduced in #13025.

#13025 is merged. Are you going to update this PR?

I think we can merge this one first. There are some other places where the new DDL utility methods can be used to simplify code. I can fix them all in follow-up PRs.

Test build #58364 has finished for PR 12781 at commit
| sql(s"DROP TABLE ${table.quotedString}") | ||
|
|
||
| withTable(table.table) { | ||
| val newDDL = shownDDL.replaceFirst(table.table, table.table) |
not needed?
    )

    table.copy(
      identifier = expected.identifier,
Looks like expected and actual will always have the same identifier? We drop the table and create a new one with the same name in checkCreateTable.
Yeah, thanks. This is another place that I forgot to remove after simplifying the test infra.
LGTM

Test build #58372 has finished for PR 12781 at commit

Test build #58374 has finished for PR 12781 at commit

retest this please

The last build failure was a compilation failure in a file not even touched by this PR. Trying again.

Test build #58381 has finished for PR 12781 at commit

Yeah, the build was broken.

test this please

Test build #58420 has finished for PR 12781 at commit

Thanks. I am going to merge this to master and branch-2.0.

Let's have follow-up PRs to make the following changes:
## What changes were proposed in this pull request?

This PR adds a native `SHOW CREATE TABLE` DDL command for data source tables. Support for Hive tables will be added in follow-up PR(s).

To show table creation DDL for data source tables created by CTAS statements, this PR also adds partitioning and bucketing support to the normal `CREATE TABLE ... USING ...` syntax.

## How was this patch tested?

A new test suite, `ShowCreateTableSuite`, is added in the sql/hive package to test the new feature.

Author: Cheng Lian <[email protected]>
Closes #12781 from liancheng/spark-14346-show-create-table.
(cherry picked from commit f036dd7)
Signed-off-by: Yin Huai <[email protected]>
…rtition and bucket

## What changes were proposed in this pull request?

#12781 introduced the PARTITIONED BY, CLUSTERED BY, and SORTED BY keywords to CREATE TABLE USING. This PR adds tests to make sure those keywords are handled correctly. It also fixes a mistake: we should create a non-Hive-compatible table if partition or bucket info exists.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>
Closes #13144 from cloud-fan/add-test.
(cherry picked from commit 20a8947)
Signed-off-by: Yin Huai <[email protected]>
## What changes were proposed in this pull request?

This is a follow-up of #12781. It adds native `SHOW CREATE TABLE` support for Hive tables and views. A new field, `hasUnsupportedFeatures`, is added to `CatalogTable` to indicate whether all table metadata retrieved from the underlying external catalog (the Hive metastore in this case) can be mapped to fields in `CatalogTable`. This flag is useful when the target Hive table contains structures that Spark SQL can't handle, e.g. skewed columns and storage handlers.

## How was this patch tested?

New test cases are added in `ShowCreateTableSuite` to do round-trip tests.

Author: Cheng Lian <[email protected]>
Closes #13079 from liancheng/spark-14346-show-create-table-for-hive-tables.
(cherry picked from commit b674e67)
Signed-off-by: Yin Huai <[email protected]>
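Conceptually, the new flag gates DDL generation for Hive tables along these lines (a paraphrase of the idea described above, not the actual implementation; the helper name and error wording are made up):

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Paraphrase of the hasUnsupportedFeatures idea: refuse to generate DDL
// when the metastore table uses features (e.g. skewed columns or a
// storage handler) that CatalogTable cannot represent.
def showCreateHiveTable(table: CatalogTable, buildDDL: CatalogTable => String): String = {
  if (table.hasUnsupportedFeatures) {
    throw new UnsupportedOperationException(
      s"Cannot run SHOW CREATE TABLE on ${table.identifier}: the table uses " +
        "features Spark SQL cannot reproduce in a CREATE TABLE statement.")
  }
  buildDDL(table)
}
```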
What changes were proposed in this pull request?

This PR adds a native `SHOW CREATE TABLE` DDL command for data source tables. Support for Hive tables will be added in follow-up PR(s).

To show table creation DDL for data source tables created by CTAS statements, this PR also adds partitioning and bucketing support to the normal `CREATE TABLE ... USING ...` syntax.

How was this patch tested?

A new test suite, `ShowCreateTableSuite`, is added in the sql/hive package to test the new feature.
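For a quick end-to-end illustration of the new command (the table name is made up, and I'm assuming a Spark 2.0-style SparkSession bound to `spark`):

```scala
// Hypothetical end-to-end use of the new command: create a data source
// table, then print the DDL that SHOW CREATE TABLE generates for it.
spark.sql("CREATE TABLE demo (id INT, name STRING) USING parquet")
spark.sql("SHOW CREATE TABLE demo").collect().foreach(r => println(r.getString(0)))
```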