[SPARK-14127][SQL] Native "DESC [EXTENDED | FORMATTED] <table>" DDL command #12844
Conversation
Test build #57539 has finished for PR 12844 at commit
Test build #57597 has finished for PR 12844 at commit
Force-pushed from 803f28e to 0bc9f5a.
Test build #57601 has finished for PR 12844 at commit
A typo bug, not related to this PR.
Oh wait... It's actually a typo from Hive... 😵 "Fixing" it fails the existing test case.
- Shows partition columns for `EXTENDED` and `FORMATTED`
- Shows the "Compressed:" field
- Shows data types in lower case
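The last item can be illustrated with a tiny sketch (`formatType` here is a hypothetical helper, not Spark's code; Spark's `DataType.simpleString` already yields lower-case names such as `int` and `string`):

```scala
// Hypothetical illustration of the "data types in lower case" change:
// Hive's DESCRIBE prints types as declared (e.g. "INT"), while the
// native command normalizes them to lower case.
def formatType(hiveTypeName: String): String = hiveTypeName.toLowerCase

println(formatType("INT"))    // int
println(formatType("String")) // string
```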
Force-pushed from 0bc9f5a to 9194fe1.
Test build #57613 has finished for PR 12844 at commit
Test build #57621 has finished for PR 12844 at commit
Test build #57630 has finished for PR 12844 at commit
```scala
inputFormat: Option[String],
outputFormat: Option[String],
serde: Option[String],
compressed: Boolean,
```
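For context, these fields belong to the catalog's storage-format descriptor. A hedged sketch of such a case class (the field order follows the diff; `locationUri`, `serdeProperties`, and the name `CatalogStorageFormat` are assumptions based on Spark 2.0's catalog types):

```scala
// Sketch of the storage descriptor the fields above belong to.
// Only the middle four fields are visible in the diff; the rest
// are assumptions.
case class CatalogStorageFormat(
    locationUri: Option[String],
    inputFormat: Option[String],
    outputFormat: Option[String],
    serde: Option[String],
    compressed: Boolean,
    serdeProperties: Map[String, String])

// A plain text table, as in the sample output below, would look like:
val textStorage = CatalogStorageFormat(
  locationUri = None,
  inputFormat = Some("org.apache.hadoop.mapred.TextInputFormat"),
  outputFormat = Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"),
  serde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"),
  compressed = false,
  serdeProperties = Map("serialization.format" -> "1"))
```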
Is this ever true? If it isn't we could leave it out.
Nvm. Hive can pass compressed tables.
LGTM
Test build #57729 has finished for PR 12844 at commit
Thanks for the review! I'm merging this to master and branch-2.0.
…ommand
## What changes were proposed in this pull request?
This PR implements native `DESC [EXTENDED | FORMATTED] <table>` DDL command. Sample output:
```
scala> spark.sql("desc extended src").show(100, truncate = false)
+----------------------------+---------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------+-------+
|key |int | |
|value |string | |
| | | |
|# Detailed Table Information|CatalogTable(`default`.`src`, ...| |
+----------------------------+---------------------------------+-------+
scala> spark.sql("desc formatted src").show(100, truncate = false)
+----------------------------+----------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+----------------------------------------------------------+-------+
|key |int | |
|value |string | |
| | | |
|# Detailed Table Information| | |
|Database: |default | |
|Owner: |lian | |
|Create Time: |Mon Jan 04 17:06:00 CST 2016 | |
|Last Access Time: |Thu Jan 01 08:00:00 CST 1970 | |
|Location: |hdfs://localhost:9000/user/hive/warehouse_hive121/src | |
|Table Type: |MANAGED | |
|Table Parameters: | | |
| transient_lastDdlTime |1451898360 | |
| | | |
|# Storage Information | | |
|SerDe Library: |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat: |org.apache.hadoop.mapred.TextInputFormat | |
|OutputFormat: |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat| |
|Num Buckets: |-1 | |
|Bucket Columns: |[] | |
|Sort Columns: |[] | |
|Storage Desc Parameters: | | |
| serialization.format |1 | |
+----------------------------+----------------------------------------------------------+-------+
```
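The output above suggests how the command assembles its result: each row is a `(col_name, data_type, comment)` triple, and section headers are rows whose first column starts with `# ` while the other columns stay empty. A minimal sketch under that assumption (the `append` helper mirrors the one visible in the review snippets; the tuple representation is a simplification, not Spark's actual `Row`):

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified sketch of how DESC output rows could be collected.
val buffer = ArrayBuffer.empty[(String, String, String)]

def append(buf: ArrayBuffer[(String, String, String)],
           column: String, dataType: String, comment: String): Unit =
  buf += ((column, dataType, comment))

// Column rows, a blank separator row, then a section header row.
append(buffer, "key", "int", "")
append(buffer, "value", "string", "")
append(buffer, "", "", "")
append(buffer, "# Detailed Table Information", "CatalogTable(`default`.`src`, ...)", "")
```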
## How was this patch tested?
A test case is added to `HiveDDLSuite` to check command output.
Author: Cheng Lian <[email protected]>
Closes #12844 from liancheng/spark-14127-desc-table.
(cherry picked from commit f152fae)
Signed-off-by: Cheng Lian <[email protected]>
…data source tables

## What changes were proposed in this pull request?

This is a follow-up of PR #12844. It makes the newly updated `DescribeTableCommand` support data source tables.

## How was this patch tested?

A test case is added to check `DESC [EXTENDED | FORMATTED] <table>` output.

Author: Cheng Lian <[email protected]>

Closes #12934 from liancheng/spark-14127-desc-table-follow-up.

(cherry picked from commit 671b382)
Signed-off-by: Yin Huai <[email protected]>
…able properties for data source tables

## What changes were proposed in this pull request?

This is a follow-up of #12934 and #12844. This PR adds a set of utility methods in `DDLUtils` to help extract schema information (user-defined schema, partition columns, and bucketing information) from data source table properties. These utility methods are then used in `DescribeTableCommand` to refine output for data source tables. Before this PR, the aforementioned schema information was only shown as table properties, which are hard to read.

Sample output:

```
+----------------------------+---------------------------------------------------------+-------+
|col_name                    |data_type                                                |comment|
+----------------------------+---------------------------------------------------------+-------+
|a                           |bigint                                                   |       |
|b                           |bigint                                                   |       |
|c                           |bigint                                                   |       |
|d                           |bigint                                                   |       |
|# Partition Information     |                                                         |       |
|# col_name                  |                                                         |       |
|d                           |                                                         |       |
|                            |                                                         |       |
|# Detailed Table Information|                                                         |       |
|Database:                   |default                                                  |       |
|Owner:                      |lian                                                     |       |
|Create Time:                |Tue May 10 03:20:34 PDT 2016                             |       |
|Last Access Time:           |Wed Dec 31 16:00:00 PST 1969                             |       |
|Location:                   |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|Table Type:                 |MANAGED                                                  |       |
|Table Parameters:           |                                                         |       |
|  rawDataSize               |-1                                                       |       |
|  numFiles                  |1                                                        |       |
|  transient_lastDdlTime     |1462875634                                               |       |
|  totalSize                 |684                                                      |       |
|  spark.sql.sources.provider|parquet                                                  |       |
|  EXTERNAL                  |FALSE                                                    |       |
|  COLUMN_STATS_ACCURATE     |false                                                    |       |
|  numRows                   |-1                                                       |       |
|                            |                                                         |       |
|# Storage Information       |                                                         |       |
|SerDe Library:              |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       |       |
|InputFormat:                |org.apache.hadoop.mapred.SequenceFileInputFormat         |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat|       |
|Compressed:                 |No                                                       |       |
|Num Buckets:                |2                                                        |       |
|Bucket Columns:             |[b]                                                      |       |
|Sort Columns:               |[c]                                                      |       |
|Storage Desc Parameters:    |                                                         |       |
|  path                      |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|  serialization.format      |1                                                        |       |
+----------------------------+---------------------------------------------------------+-------+
```

## How was this patch tested?

Test cases are added in `HiveDDLSuite` to check command output.

Author: Cheng Lian <[email protected]>

Closes #13025 from liancheng/spark-14127-extract-schema-info.

(cherry picked from commit 8a12580)
Signed-off-by: Yin Huai <[email protected]>
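The schema-extraction idea can be sketched as follows. Spark stores long schema JSON strings split across numbered table properties; the property names below (`spark.sql.sources.schema.numParts`, `spark.sql.sources.schema.part.N`) follow Spark 2.0's convention but should be treated as assumptions, and the helper is a simplification of what `DDLUtils` might do, not its actual code:

```scala
// Hedged sketch: reassemble a data source table's schema JSON from
// table properties. Returns None when the table carries no schema parts.
def getSchemaJson(props: Map[String, String]): Option[String] =
  props.get("spark.sql.sources.schema.numParts").map { n =>
    (0 until n.toInt)
      .map(i => props(s"spark.sql.sources.schema.part.$i"))
      .mkString
  }

val props = Map(
  "spark.sql.sources.provider" -> "parquet",
  "spark.sql.sources.schema.numParts" -> "2",
  "spark.sql.sources.schema.part.0" -> "{\"type\":\"struct\",",
  "spark.sql.sources.schema.part.1" -> "\"fields\":[]}")
```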
```scala
describe(relation, buffer)

append(buffer, "", "", "")
append(buffer, "# Detailed Table Information", relation.catalogTable.toString, "")
```
@liancheng To improve the output of Explain, I plan to change the default implementation of `toString` of the case class `CatalogTable`. That will also affect the output of Describe Extended.
I checked what Hive did for the command Describe Extended, as follows.
```
Detailed Table Information  Table(tableName:t1, dbName:default, owner:root, createTime:1462627092, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)], location:hdfs://6b68a24121f4:9000/user/hive/warehouse/t1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1462627092}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
```
Basically, in the implementation of `toString`, I will try to follow what you did in `describeFormatted`, but the contents will be on a single line. Feel free to let me know if you have any concerns or suggestions. Thanks!
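The single-line `toString` proposal can be sketched with an illustrative stand-in (the class, its fields, and the output format below are all hypothetical simplifications, not the actual `CatalogTable`):

```scala
// Illustrative stand-in for CatalogTable: a toString that lists the
// same kinds of fields describeFormatted prints, but on one line.
case class SimpleCatalogTable(
    database: String,
    table: String,
    tableType: String,
    properties: Map[String, String]) {
  override def toString: String = {
    val fields = Seq(
      s"Database: $database",
      s"Table: $table",
      s"Table Type: $tableType",
      s"Table Parameters: [${properties.map { case (k, v) => s"$k=$v" }.mkString(", ")}]")
    fields.mkString("CatalogTable(", ", ", ")")
  }
}

println(SimpleCatalogTable("default", "src", "MANAGED",
  Map("transient_lastDdlTime" -> "1451898360")))
// CatalogTable(Database: default, Table: src, Table Type: MANAGED, Table Parameters: [transient_lastDdlTime=1451898360])
```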