[SPARK-14346][SQL] Show Create Table (Native) #12579
@yhuai @andrewor14 Thanks!

test this please
Test build #56899 has finished for PR 12579 at commit
@liancheng Thanks for triggering the test! I am looking into the test failure.
retest this please
@yhuai @liancheng, I see PR #12734 takes care of the PARTITIONED BY and CLUSTERED BY (with SORTED BY) clauses for the CTAS syntax, but not for the non-CTAS syntax. Now I need to change my PR to adapt to this change, which means that the generated DDL will be something like… Also, the DataFrameWriter.saveAsTable case is like CTAS. Can we then generate the DDL in regular CTAS syntax? This would change my current implementation in this PR.
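As a rough illustration of the CTAS syntax referenced here (the table name, column names, and bucket count are invented for the example; the exact grammar should be checked against PR #12734 itself), a data source CTAS with partitioning and bucketing looks something like:

```sql
-- Hypothetical sketch of a data source CTAS with PARTITIONED BY and
-- CLUSTERED BY ... SORTED BY; identifiers are made up for illustration.
CREATE TABLE t1
USING parquet
PARTITIONED BY (a)
CLUSTERED BY (b) SORTED BY (c) INTO 8 BUCKETS
AS SELECT a, b, c FROM src;
```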
Can one of the admins verify this patch?
Hey @xwu0226, sorry that I didn't explain why I opened another PR for the same issue; I was in a code rush for 2.0... One of the considerations for all the native DDL commands is that we don't want these DDL commands to rely on Hive anymore. This is because we'd like to remove the Hive dependency from Spark SQL core and gradually turn Hive into a separate data source in the future. This means we shouldn't add new code in places like… One apparent problem of this approach is that the current Spark SQL interfaces don't capture all the semantics of Hive. For example, some table metadata, like the skew spec, is not covered in…
There will be a follow-up to #12781 to add support for Hive tables. After an offline discussion with @yhuai, we decided to add a flag in…
@liancheng Thank you for the detailed explanation! Yeah, if the goal is to make sure Spark SQL can handle the generated DDL, then we will need to leave out some Hive features for now. I will close this PR.
This is a rebased version of #12132 and #12406
What changes were proposed in this pull request?
Allow users to issue the "SHOW CREATE TABLE" command natively in Spark SQL.

- For tables created by Hive, this command displays the DDL in Hive syntax. If that syntax includes a CLUSTERED BY, SKEWED BY, or STORED BY clause, a warning message states that this DDL is not yet supported by Spark SQL native DDL.
- For tables created by data source DDL, such as "CREATE TABLE ... USING ... OPTIONS (...)", the command shows the DDL in that syntax.
- For tables created through the DataFrame API, such as df.write.partitionBy(...).saveAsTable(...), the command currently displays DDL in the "CREATE TABLE ... USING ... OPTIONS (...)" syntax. However, this syntax loses the partitioning information. It is proposed to display the create-table statement in the DataFrame API format instead, such as <DataFrame>.write.partitionBy("a").bucketBy("c").format("parquet").saveAsTable("T1").

How was this patch tested?
Unit tests are created.
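To make the data source DDL case above concrete, a minimal round trip might look like the following sketch (the table name, schema, and path are invented for illustration, not taken from this PR's tests):

```sql
-- Hypothetical example: create a data source table, then echo its DDL back.
CREATE TABLE t1 (a INT, b STRING)
USING parquet
OPTIONS (path '/tmp/t1');

-- Expected to print the table's DDL back in the same
-- CREATE TABLE ... USING ... OPTIONS (...) form.
SHOW CREATE TABLE t1;
```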