[SPARK-14346] SHOW CREATE TABLE for data source tables #12781
Conversation
Moved to tables.scala, not related to this PR.
Force-pushed from 092d942 to 2f0c980.
Test build #57348 has finished for PR 12781 at commit

Test build #57349 has finished for PR 12781 at commit

Test build #57378 has finished for PR 12781 at commit
@cloud-fan Do we allow users to specify bucketing without providing partitioning columns? It seems only DynamicPartitionWriterContainer supports bucketSpec?
Yes, we do. If partition columns are empty but bucket columns are not, we will also use DynamicPartitionWriterContainer, see: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L124-L136
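For illustration, a bucketed-but-unpartitioned table under the syntax this PR extends could look roughly like this (a sketch only; the table and column names are made up, and I'm assuming a Spark 2.0-style `spark.sql` entry point):

```scala
// Sketch: bucketing without partitioning via the extended
// CREATE TABLE ... USING syntax. Names are hypothetical.
spark.sql(
  """CREATE TABLE bucketed_only
    |USING parquet
    |CLUSTERED BY (id) SORTED BY (id) INTO 8 BUCKETS
    |AS SELECT id, name FROM source_table
  """.stripMargin)
```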
Thanks!
I just realized that OPTIONS goes first. I am wondering if it makes sense to put OPTIONS after PARTITIONED BY and the bucket spec.
(Update: let me ask around.)
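To make the ordering question concrete, my reading of the current grammar is that OPTIONS precedes the partitioning and bucketing clauses, roughly as below (treat the exact clause order as an assumption; names and path are made up):

```scala
// Assumed clause order under the current grammar: OPTIONS first,
// then PARTITIONED BY, then the bucket spec.
spark.sql(
  """CREATE TABLE t (id INT, p STRING)
    |USING parquet
    |OPTIONS (path '/tmp/t')
    |PARTITIONED BY (p)
    |CLUSTERED BY (id) INTO 2 BUCKETS
  """.stripMargin)
```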
Let's add tests for this syntax (right now we only have tests for show create table).
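Something along these lines, perhaps (a minimal sketch assuming a suite that mixes in the usual SQLTestUtils-style helpers `withTable` and `sql`; the assertions are illustrative, not the actual suite):

```scala
// Sketch of a syntax test for the new PARTITIONED BY / CLUSTERED BY
// support in CREATE TABLE ... USING. Assumes SQLTestUtils helpers.
test("CREATE TABLE USING with partitioning and bucketing") {
  withTable("t") {
    sql(
      """CREATE TABLE t
        |USING parquet
        |PARTITIONED BY (p)
        |CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS
        |AS SELECT 1 AS id, 'a' AS p
      """.stripMargin)
    // Smoke check: the generated DDL should round-trip the new clauses.
    val ddl = sql("SHOW CREATE TABLE t").head().getString(0)
    assert(ddl.contains("PARTITIONED BY"))
    assert(ddl.contains("CLUSTERED BY"))
  }
}
```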
Test build #57610 has finished for PR 12781 at commit
Force-pushed from 25bc0a4 to 3f23a9f.
Test build #57744 has finished for PR 12781 at commit

Test build #57737 has finished for PR 12781 at commit

Test build #57748 has finished for PR 12781 at commit
I think having a utility function in DDLUtils to construct the DataType from those strings stored in table properties will be very useful. This function can also help us to provide a nice output of describe table (when the table is a data source table). We can do it in a separate PR.
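As a sketch of that helper, assuming the data source table schema is stored as JSON split across the `spark.sql.sources.schema.numParts` / `spark.sql.sources.schema.part.N` table properties (my reading of the current layout, so treat the property names as an assumption):

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Sketch of the suggested DDLUtils helper: reassemble the JSON schema
// from the table properties and parse it back into a StructType.
def getSchemaFromTableProperties(props: Map[String, String]): Option[StructType] = {
  props.get("spark.sql.sources.schema.numParts").map { numParts =>
    val json = (0 until numParts.toInt)
      .map(i => props(s"spark.sql.sources.schema.part.$i"))
      .mkString
    DataType.fromJson(json).asInstanceOf[StructType]
  }
}
```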
LGTM except the weird test-should-fail issue.
Force-pushed from a437647 to ea10cd4.
Updated test suite by normalizing
Test build #58233 has finished for PR 12781 at commit

Part of the changes can be simplified using the DDL utility methods introduced in #13025.

#13025 is merged. Are you going to update this PR?

I think we can merge this one first. There are some other places where the new DDL utility methods can be used to simplify code. I can fix them all in follow-up PRs.

Test build #58364 has finished for PR 12781 at commit
| sql(s"DROP TABLE ${table.quotedString}") | ||
|
|
||
| withTable(table.table) { | ||
| val newDDL = shownDDL.replaceFirst(table.table, table.table) |
not needed?
    )

    table.copy(
      identifier = expected.identifier,
Looks like expected and actual will always have the same identifier? We drop the table and create a new one with the same name in checkCreateTable.
Yeah, thanks. This is another place that I forgot to remove after simplifying the test infra.
LGTM

Test build #58372 has finished for PR 12781 at commit

Test build #58374 has finished for PR 12781 at commit

retest this please

The last build failure was a compilation failure in a file not even touched by this PR. Trying again.

Test build #58381 has finished for PR 12781 at commit

Yeah, the build was broken.

test this please

Test build #58420 has finished for PR 12781 at commit

Thanks. I am going to merge this to master and branch-2.0.

Let's have follow-up PRs to make the following changes:
## What changes were proposed in this pull request?

This PR adds a native `SHOW CREATE TABLE` DDL command for data source tables. Support for Hive tables will be added in follow-up PR(s).

To show table creation DDL for data source tables created by CTAS statements, this PR also adds partitioning and bucketing support to the normal `CREATE TABLE ... USING ...` syntax.

## How was this patch tested?

A new test suite, `ShowCreateTableSuite`, is added in the sql/hive package to test the new feature.

Author: Cheng Lian <[email protected]>
Closes #12781 from liancheng/spark-14346-show-create-table.
(cherry picked from commit f036dd7)
Signed-off-by: Yin Huai <[email protected]>
…rtition and bucket

## What changes were proposed in this pull request?

#12781 introduced the PARTITIONED BY, CLUSTERED BY, and SORTED BY keywords to CREATE TABLE USING. This PR adds tests to make sure those keywords are handled correctly. It also fixes a mistake: we should create a non-Hive-compatible table if partition or bucket info exists.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>
Closes #13144 from cloud-fan/add-test.
(cherry picked from commit 20a8947)
Signed-off-by: Yin Huai <[email protected]>
## What changes were proposed in this pull request?

This is a follow-up of #12781. It adds native `SHOW CREATE TABLE` support for Hive tables and views. A new field, `hasUnsupportedFeatures`, is added to `CatalogTable` to indicate whether all table metadata retrieved from the underlying external catalog (the Hive metastore in this case) can be mapped to fields in `CatalogTable`. This flag is useful when the target Hive table contains structures that Spark SQL can't handle, e.g. skewed columns and storage handlers.

## How was this patch tested?

New test cases are added in `ShowCreateTableSuite` to do round-trip tests.

Author: Cheng Lian <[email protected]>
Closes #13079 from liancheng/spark-14346-show-create-table-for-hive-tables.
(cherry picked from commit b674e67)
Signed-off-by: Yin Huai <[email protected]>
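Conceptually, the new flag gates DDL generation for Hive tables along these lines (a paraphrase of the idea described above, not the actual implementation; the helper name and error wording are made up):

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Paraphrase of the hasUnsupportedFeatures idea: refuse to generate DDL
// when the metastore table uses features (e.g. skewed columns or a
// storage handler) that CatalogTable cannot represent.
def showCreateHiveTable(table: CatalogTable, buildDDL: CatalogTable => String): String = {
  if (table.hasUnsupportedFeatures) {
    throw new UnsupportedOperationException(
      s"Cannot run SHOW CREATE TABLE on ${table.identifier}: the table uses " +
        "features Spark SQL cannot reproduce in a CREATE TABLE statement.")
  }
  buildDDL(table)
}
```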
What changes were proposed in this pull request?

This PR adds a native `SHOW CREATE TABLE` DDL command for data source tables. Support for Hive tables will be added in follow-up PR(s).

To show table creation DDL for data source tables created by CTAS statements, this PR also adds partitioning and bucketing support to the normal `CREATE TABLE ... USING ...` syntax.

How was this patch tested?

A new test suite, `ShowCreateTableSuite`, is added in the sql/hive package to test the new feature.
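For a quick end-to-end illustration of the new command (the table name is made up, and I'm assuming a Spark 2.0-style SparkSession bound to `spark`):

```scala
// Hypothetical end-to-end use of the new command: create a data source
// table, then print the DDL that SHOW CREATE TABLE generates for it.
spark.sql("CREATE TABLE demo (id INT, name STRING) USING parquet")
spark.sql("SHOW CREATE TABLE demo").collect().foreach(r => println(r.getString(0)))
```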