Conversation

@liancheng
Contributor

What changes were proposed in this pull request?

This PR adds a native `SHOW CREATE TABLE` DDL command for data source tables. Support for Hive tables will be added in follow-up PR(s).

To show table creation DDL for data source tables created by CTAS statements, this PR also adds partitioning and bucketing support to the normal `CREATE TABLE ... USING ...` syntax.
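
The DDL assembly this command performs can be sketched in plain Scala. This is a hypothetical illustration of how the clauses fit together; the names, helper signatures, and clause layout here are assumptions for exposition, not Spark's actual implementation.

```scala
// Simplified stand-in for Spark's bucketing metadata (illustrative only).
case class BucketSpec(numBuckets: Int, bucketCols: Seq[String], sortCols: Seq[String])

// Build a CREATE TABLE ... USING ... statement with optional PARTITIONED BY
// and CLUSTERED BY / SORTED BY / INTO n BUCKETS clauses.
def showCreateDataSourceTable(
    table: String,
    provider: String,
    schema: Seq[(String, String)],
    partitionCols: Seq[String],
    bucketSpec: Option[BucketSpec]): String = {
  val columns = schema.map { case (name, tpe) => s"`$name` $tpe" }.mkString(", ")
  val builder = new StringBuilder(s"CREATE TABLE `$table` ($columns)\nUSING $provider")
  if (partitionCols.nonEmpty) {
    builder.append(s"\nPARTITIONED BY (${partitionCols.mkString(", ")})")
  }
  bucketSpec.foreach { spec =>
    builder.append(s"\nCLUSTERED BY (${spec.bucketCols.mkString(", ")})")
    if (spec.sortCols.nonEmpty) {
      builder.append(s"\nSORTED BY (${spec.sortCols.mkString(", ")})")
    }
    builder.append(s"\nINTO ${spec.numBuckets} BUCKETS")
  }
  builder.toString
}
```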

How was this patch tested?

A new test suite ShowCreateTableSuite is added in sql/hive package to test the new feature.
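
The round-trip check such a suite performs can be mimicked with a toy in-memory "catalog": capture the DDL shown for a table, drop the table, re-run the DDL, and verify the definition survives. Everything below is a simplified illustration, not the actual test infrastructure.

```scala
import scala.collection.mutable

// Stand-in for the metastore: table name -> creation DDL.
val catalog = mutable.Map[String, String]()

def checkCreateTable(name: String): Boolean = {
  val shownDDL = catalog(name) // SHOW CREATE TABLE <name>
  catalog.remove(name)         // DROP TABLE <name>
  catalog(name) = shownDDL     // re-run the shown DDL
  catalog(name) == shownDDL    // the definition survives the round trip
}
```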

Contributor Author

Moved to tables.scala, not related to this PR.

@liancheng liancheng force-pushed the spark-14346-show-create-table branch from 092d942 to 2f0c980 Compare April 29, 2016 17:53
@SparkQA

SparkQA commented Apr 29, 2016

Test build #57348 has finished for PR 12781 at commit 092d942.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableCommand

@SparkQA

SparkQA commented Apr 29, 2016

Test build #57349 has finished for PR 12781 at commit 2f0c980.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableCommand

@SparkQA

SparkQA commented Apr 30, 2016

Test build #57378 has finished for PR 12781 at commit 7d21626.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@cloud-fan Do we allow users to specify bucketing without providing partitioning columns? It seems only DynamicPartitionWriterContainer supports bucketSpec?

Contributor

Yes, we do. If partition columns are empty but bucket columns are not, we will also use DynamicPartitionWriterContainer, see: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L124-L136
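
The dispatch described here can be sketched as a single predicate: the dynamic-partition writer is used whenever partition *or* bucket columns are present, so bucketing without partitioning still works. The function name below is an illustrative stand-in, not the real code in InsertIntoHadoopFsRelation.

```scala
// True when the write path needs the dynamic-partition writer container:
// either partitioning or bucketing (or both) is in play.
def useDynamicPartitionWriter(
    partitionCols: Seq[String],
    bucketCols: Seq[String]): Boolean =
  partitionCols.nonEmpty || bucketCols.nonEmpty
```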

Contributor

Thanks!

Contributor

@yhuai yhuai May 11, 2016

I just realized that OPTIONS goes first. I am wondering if it makes sense to put OPTIONS after PARTITIONED BY and the bucket spec.

(Update: let me ask around)

Contributor

Let's add tests for this syntax (right now we only have tests for show create table).

@SparkQA

SparkQA commented May 3, 2016

Test build #57610 has finished for PR 12781 at commit d052d40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng liancheng force-pushed the spark-14346-show-create-table branch 2 times, most recently from 25bc0a4 to 3f23a9f Compare May 4, 2016 09:19
@SparkQA

SparkQA commented May 4, 2016

Test build #57744 has finished for PR 12781 at commit 3f23a9f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 4, 2016

Test build #57737 has finished for PR 12781 at commit 25bc0a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 4, 2016

Test build #57748 has finished for PR 12781 at commit a437647.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

I think having a utility function in DDLUtils to construct the DataType from those strings stored in table properties would be very useful. This function could also help us provide nicer output for describe table (when the table is a data source table). We can do it in a separate PR.
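
The suggested helper could look roughly like the sketch below: rebuild a column type from its string form as stored in table properties. The type grammar here is a toy subset chosen for illustration; Spark's real parser covers the full DataType vocabulary.

```scala
// Toy model of a column type reconstructed from a type string.
sealed trait SimpleType
case object IntType extends SimpleType
case object StringType extends SimpleType
final case class ArrayType(element: SimpleType) extends SimpleType

// Parse a (very small) subset of type strings: "int", "string", "array<...>".
def parseTypeString(s: String): SimpleType = s.trim match {
  case "int"    => IntType
  case "string" => StringType
  case a if a.startsWith("array<") && a.endsWith(">") =>
    ArrayType(parseTypeString(a.stripPrefix("array<").stripSuffix(">")))
  case other => sys.error(s"unsupported type string: $other")
}
```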

@cloud-fan
Contributor

LGTM except the weird test-should-fail issue.

@liancheng liancheng force-pushed the spark-14346-show-create-table branch from a437647 to ea10cd4 Compare May 10, 2016 12:30
@liancheng
Contributor Author

Updated the test suite by normalizing CatalogTables before comparing them.
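
The normalization idea: before comparing the original table with the one recreated from the shown DDL, zero out fields that legitimately differ (creation time, location, auto-added properties). The field names below are simplified stand-ins for CatalogTable's, not the real API.

```scala
// Simplified table metadata; mirrors only the fields relevant to the sketch.
case class CatalogTableLite(
    name: String,
    createTime: Long,
    locationUri: Option[String],
    properties: Map[String, String])

// Blank out nondeterministic fields so two logically-equal tables compare equal.
def normalize(t: CatalogTableLite): CatalogTableLite =
  t.copy(
    createTime = 0L,
    locationUri = None,
    properties = t.properties - "transient_lastDdlTime")
```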

@SparkQA

SparkQA commented May 10, 2016

Test build #58233 has finished for PR 12781 at commit ea10cd4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor Author

Part of the changes can be simplified using DDL utility methods introduced in #13025.

@cloud-fan
Contributor

#13025 is merged, are you going to update this PR?

@liancheng
Contributor Author

I think we can merge this one first. There are some other places where the new DDL utility methods can be used to simplify code. I can fix them altogether in follow-up PRs.

@SparkQA

SparkQA commented May 11, 2016

Test build #58364 has finished for PR 12781 at commit 87f4b18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sql(s"DROP TABLE ${table.quotedString}")

withTable(table.table) {
val newDDL = shownDDL.replaceFirst(table.table, table.table)
Contributor

not needed?

)

table.copy(
identifier = expected.identifier,
Contributor

looks like expected and actual will always have the same identifier? We remove the table and create a new one with the same name in checkCreateTable.

Contributor Author

Yea, thanks, this is another place that I forgot to remove after simplifying the test infra.

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented May 11, 2016

Test build #58372 has finished for PR 12781 at commit 890bcc0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 11, 2016

Test build #58374 has finished for PR 12781 at commit baa177b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor Author

retest this please

@liancheng
Contributor Author

The last build failure was a compilation failure in a file not even touched by this PR. Trying again.

@SparkQA

SparkQA commented May 11, 2016

Test build #58381 has finished for PR 12781 at commit baa177b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented May 11, 2016

yea, build was broken

@yhuai
Contributor

yhuai commented May 11, 2016

test this please

@SparkQA

SparkQA commented May 12, 2016

Test build #58420 has finished for PR 12781 at commit baa177b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented May 12, 2016

Thanks. I am going to merge this to master and branch 2.0.

@yhuai
Contributor

yhuai commented May 12, 2016

Let's have follow-up PRs to make the following changes:

  1. Make the command use the new helper functions added in DDLUtils.
  2. Add tests for the new keywords added in CREATE TABLE USING syntax.
  3. Support tables stored with Hive formats.

asfgit pushed a commit that referenced this pull request May 12, 2016

Author: Cheng Lian <[email protected]>

Closes #12781 from liancheng/spark-14346-show-create-table.

(cherry picked from commit f036dd7)
Signed-off-by: Yin Huai <[email protected]>
@asfgit asfgit closed this in f036dd7 May 12, 2016
@liancheng liancheng deleted the spark-14346-show-create-table branch May 12, 2016 05:27
asfgit pushed a commit that referenced this pull request May 17, 2016
…rtition and bucket

## What changes were proposed in this pull request?

#12781 introduced PARTITIONED BY, CLUSTERED BY, and SORTED BY keywords to CREATE TABLE USING. This PR adds tests to make sure those keywords are handled correctly.

This PR also fixes a mistake: we should create a non-Hive-compatible table if partition or bucket info exists.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #13144 from cloud-fan/add-test.
asfgit pushed a commit that referenced this pull request May 17, 2016
…rtition and bucket

(cherry picked from commit 20a8947)
Signed-off-by: Yin Huai <[email protected]>
asfgit pushed a commit that referenced this pull request May 17, 2016
## What changes were proposed in this pull request?

This is a follow-up of #12781. It adds native `SHOW CREATE TABLE` support for Hive tables and views. A new field `hasUnsupportedFeatures` is added to `CatalogTable` to indicate whether all table metadata retrieved from the underlying external catalog (i.e., the Hive metastore in this case) can be mapped to fields in `CatalogTable`. This flag is useful when the target Hive table contains structures that can't be handled by Spark SQL, such as skewed columns or a storage handler.
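
The idea behind the flag can be sketched as a predicate over raw metastore metadata. The field names below are assumptions chosen for illustration; they are not the actual Hive or Spark types.

```scala
// Toy stand-in for raw Hive metastore table metadata (illustrative only).
case class RawHiveTable(skewedColNames: Seq[String], storageHandler: Option[String])

// The flag is set when the table uses structures CatalogTable can't represent.
def hasUnsupportedFeatures(raw: RawHiveTable): Boolean =
  raw.skewedColNames.nonEmpty || raw.storageHandler.isDefined
```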

## How was this patch tested?

New test cases are added in `ShowCreateTableSuite` to do round-trip tests.

Author: Cheng Lian <[email protected]>

Closes #13079 from liancheng/spark-14346-show-create-table-for-hive-tables.

(cherry picked from commit b674e67)
Signed-off-by: Yin Huai <[email protected]>
