Conversation

@xwu0226 (Contributor) commented Apr 21, 2016

This is a rebased version of #12132 and #12406

What changes were proposed in this pull request?

Allow users to issue the "SHOW CREATE TABLE" command natively in Spark SQL.
-- For tables created by Hive, this command displays the DDL in Hive syntax. If the DDL includes a CLUSTERED BY, SKEWED BY, or STORED BY clause, a warning message states that this DDL is not yet supported by Spark SQL's native DDL.

-- For tables created by data source DDL, such as "CREATE TABLE ... USING ... OPTIONS (...)", the command shows the DDL in that same syntax.

-- For tables created by the DataFrame API, such as "df.write.partitionBy(...).saveAsTable(...)", the command currently displays DDL in the "CREATE TABLE ... USING ... OPTIONS (...)" syntax. However, this syntax loses the partitioning information. It is proposed to display the DDL in the DataFrame API format instead, such as <DataFrame>.write.partitionBy("a").bucketBy("c").format("parquet").saveAsTable("T1")
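
A minimal sketch of the intended behavior (the table names and the Spark 2.x shell entry point are assumptions, not from this PR):

```scala
// Hypothetical tables, for illustration only (Spark 2.x shell).
// Hive-format table: SHOW CREATE TABLE echoes Hive DDL syntax.
spark.sql("CREATE TABLE hive_t1 (c1 INT) STORED AS TEXTFILE")
spark.sql("SHOW CREATE TABLE hive_t1").show(truncate = false)

// Data source table: SHOW CREATE TABLE echoes CREATE TABLE ... USING ... syntax.
spark.sql("CREATE TABLE ds_t1 (c1 INT) USING parquet")
spark.sql("SHOW CREATE TABLE ds_t1").show(truncate = false)
```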

How was this patch tested?

Unit tests are created.

@xwu0226 xwu0226 force-pushed the show_create_table_3 branch from ca44d67 to bd0d8f5 Compare April 21, 2016 21:34
@xwu0226 (Contributor, Author) commented Apr 21, 2016

@yhuai @andrewor14 Thanks!

@liancheng (Contributor)

test this please

SparkQA commented Apr 25, 2016

Test build #56899 has finished for PR 12579 at commit 13e9775.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@xwu0226 (Contributor, Author) commented Apr 25, 2016

@liancheng Thanks for triggering the test! I am looking into the test failure.

@xwu0226 xwu0226 force-pushed the show_create_table_3 branch from 13e9775 to 1b08feb Compare April 25, 2016 22:25
@gatorsmile (Member)

retest this please

@xwu0226 xwu0226 force-pushed the show_create_table_3 branch from 1b08feb to 9e39b5c Compare April 27, 2016 06:43
@xwu0226 (Contributor, Author) commented Apr 28, 2016

@yhuai @liancheng, I see PR #12734 takes care of the PARTITIONED BY and CLUSTERED BY (with SORTED BY) clauses for the CTAS syntax, but not for the non-CTAS syntax. Now I need to change my PR to adapt to this change, which means the generated DDL will be something like create table t1 (c1 int, ...) using .. options (..) partitioned by (..) clustered by (...) sorted by (...) in ... buckets (a hypothetical example is sketched after this comment). There may not be a "select clause" following it, since we do not have the original query. But such a generated statement will not run, because #12734 does not support it. Can we add a fake select clause with a warning message?

Also, the DataFrameWriter.saveAsTable case is like CTAS. Can we then generate the DDL as regular CTAS syntax? This would change my current implementation in this PR.
Please advise, thanks a lot!
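
For illustration only, the DDL shape being discussed might look like this (all identifiers and values are invented; as noted above, such a statement does not parse without a trailing query):

```scala
// Hypothetical generated DDL for a partitioned, bucketed data source table.
val generatedDdl =
  """CREATE TABLE t1 (c1 INT, c2 STRING, c3 STRING)
    |USING parquet
    |OPTIONS (path '/tmp/t1')
    |PARTITIONED BY (c3)
    |CLUSTERED BY (c1) SORTED BY (c1) INTO 8 BUCKETS""".stripMargin
// Missing: the AS SELECT ... clause that #12734 expects, which is
// exactly the problem raised in this comment.
```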

@AmplabJenkins

Can one of the admins verify this patch?

@srowen (Member) commented May 10, 2016

@xwu0226 I think this is superseded by #12781?

@xwu0226 (Contributor, Author) commented May 10, 2016

@srowen Yes, for data source tables. This PR also includes the work for Hive-syntax DDL. I see #12781 mentions that there will be a follow-up PR taking care of the Hive-syntax DDL, so I am wondering whether I should continue with this PR. I can close this one if there is no need. Thanks!

@liancheng (Contributor) commented May 11, 2016

Hey @xwu0226, sorry that I didn't explain why I opened another PR for the same issue; I was in a code rush for 2.0...

So one of the considerations for all the native DDL commands is that we don't want these DDL commands to rely on Hive anymore. This is because we'd like to remove the Hive dependency from Spark SQL core and gradually make Hive a separate data source in the future. This means we shouldn't add new code in places like HiveClientImpl. These new DDL commands should be implemented on top of interfaces like CatalogTable.
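
For illustration, a rough sketch of building DDL on top of CatalogTable rather than Hive APIs; this assumes a Spark version where CatalogTable.schema is a StructType, and it handles only the columns and the provider:

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Build a minimal CREATE TABLE statement from catalog metadata alone,
// without touching HiveClientImpl.
def showCreateDataSourceTable(table: CatalogTable): String = {
  val columns = table.schema.fields
    .map(f => s"${f.name} ${f.dataType.sql}")
    .mkString(", ")
  val provider = table.provider.getOrElse("parquet")
  s"CREATE TABLE ${table.identifier.quotedString} ($columns) USING $provider"
}
```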

One apparent problem with this approach is that the current Spark SQL interfaces don't capture all of Hive's semantics. For example, some table metadata, like the skew spec, is not covered by CatalogTable yet. Our general strategies are:

  1. For easy ones, like "owner" and "compressed" in [SPARK-14127][SQL] Native "DESC [EXTENDED | FORMATTED] <table>" DDL command #12844, we may just add them to the interface and leverage them.
  2. For features that are not supported in Spark SQL, for example the skew spec, we can simply ignore them for now, since Spark can't handle them anyway.

There will be a follow-up to #12781 to add support for Hive tables. After an offline discussion with @yhuai, we decided to add a flag in CatalogTable to indicate whether the underlying external catalog provided unrecognized metadata that was not translated and included in CatalogTable. In this way, when SHOW CREATE TABLE is applied to a table containing such metadata, the flag is set to true, and we can simply refuse to output anything by checking it. This makes sense because even if we added things like the skew spec to the result of SHOW CREATE TABLE, Spark couldn't handle the generated DDL statement.
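
A sketch of that guard (Spark 2.0 eventually modeled the flag as CatalogTable.unsupportedFeatures: Seq[String]; the wording and exception type here are assumptions to keep the snippet self-contained):

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Refuse to generate DDL when the external catalog reported metadata that
// CatalogTable could not represent (e.g. a skew spec). Spark's own command
// throws AnalysisException; a plain exception keeps this sketch compilable
// outside Spark's sql package.
def assertShowCreateTableSupported(table: CatalogTable): Unit = {
  if (table.unsupportedFeatures.nonEmpty) {
    throw new UnsupportedOperationException(
      s"SHOW CREATE TABLE cannot handle table ${table.identifier}, which uses " +
      s"unsupported feature(s): ${table.unsupportedFeatures.mkString(", ")}")
  }
}
```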

@xwu0226 (Contributor, Author) commented May 11, 2016

@liancheng Thank you for the detailed explanation! Yeah, if the goal is to make sure Spark SQL can handle the generated DDL, then we need to skip some Hive features for now. I will close this PR.

@xwu0226 xwu0226 closed this May 11, 2016