Skip to content

Conversation

@liancheng
Copy link
Contributor

What changes were proposed in this pull request?

This PR implements native DESC [EXTENDED | FORMATTED] <table> DDL command. Sample output:

scala> spark.sql("desc extended src").show(100, truncate = false)
+----------------------------+---------------------------------+-------+
|col_name                    |data_type                        |comment|
+----------------------------+---------------------------------+-------+
|key                         |int                              |       |
|value                       |string                           |       |
|                            |                                 |       |
|# Detailed Table Information|CatalogTable(`default`.`src`, ...|       |
+----------------------------+---------------------------------+-------+


scala> spark.sql("desc formatted src").show(100, truncate = false)
+----------------------------+----------------------------------------------------------+-------+
|col_name                    |data_type                                                 |comment|
+----------------------------+----------------------------------------------------------+-------+
|key                         |int                                                       |       |
|value                       |string                                                    |       |
|                            |                                                          |       |
|# Detailed Table Information|                                                          |       |
|Database:                   |default                                                   |       |
|Owner:                      |lian                                                      |       |
|Create Time:                |Mon Jan 04 17:06:00 CST 2016                              |       |
|Last Access Time:           |Thu Jan 01 08:00:00 CST 1970                              |       |
|Location:                   |hdfs://localhost:9000/user/hive/warehouse_hive121/src     |       |
|Table Type:                 |MANAGED                                                   |       |
|Table Parameters:           |                                                          |       |
|  transient_lastDdlTime     |1451898360                                                |       |
|                            |                                                          |       |
|# Storage Information       |                                                          |       |
|SerDe Library:              |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe        |       |
|InputFormat:                |org.apache.hadoop.mapred.TextInputFormat                  |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|       |
|Num Buckets:                |-1                                                        |       |
|Bucket Columns:             |[]                                                        |       |
|Sort Columns:               |[]                                                        |       |
|Storage Desc Parameters:    |                                                          |       |
|  serialization.format      |1                                                         |       |
+----------------------------+----------------------------------------------------------+-------+

How was this patch tested?

A test case is added to HiveDDLSuite to check command output.

@SparkQA
Copy link

SparkQA commented May 2, 2016

Test build #57539 has finished for PR 12844 at commit 718da25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 3, 2016

Test build #57597 has finished for PR 12844 at commit 803f28e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng liancheng force-pushed the spark-14127-desc-table branch from 803f28e to 0bc9f5a Compare May 3, 2016 05:42
@SparkQA
Copy link

SparkQA commented May 3, 2016

Test build #57601 has finished for PR 12844 at commit 0bc9f5a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo bug, not related to this PR.

Copy link
Contributor Author

@liancheng liancheng May 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait... It's actually a typo from Hive... 😵 “Fixing" it fails existing test case.

- Shows partition columns for EXTENDED and FORMATTED
- Shows "Compressed:" field
- Shows data types in lower case
@liancheng liancheng force-pushed the spark-14127-desc-table branch from 0bc9f5a to 9194fe1 Compare May 3, 2016 08:08
@SparkQA
Copy link

SparkQA commented May 3, 2016

Test build #57613 has finished for PR 12844 at commit 9194fe1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 3, 2016

Test build #57621 has finished for PR 12844 at commit a66885a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 3, 2016

Test build #57630 has finished for PR 12844 at commit 18b9bb5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

inputFormat: Option[String],
outputFormat: Option[String],
serde: Option[String],
compressed: Boolean,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever true? If it isn't we could leave it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm. Hive can pass compressed tables.

@hvanhovell
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57729 has finished for PR 12844 at commit b5dbc15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

Thanks for the review! I'm merging this to master and branch-2.0.

asfgit pushed a commit that referenced this pull request May 4, 2016
…ommand

## What changes were proposed in this pull request?

This PR implements native `DESC [EXTENDED | FORMATTED] <table>` DDL command. Sample output:

```
scala> spark.sql("desc extended src").show(100, truncate = false)
+----------------------------+---------------------------------+-------+
|col_name                    |data_type                        |comment|
+----------------------------+---------------------------------+-------+
|key                         |int                              |       |
|value                       |string                           |       |
|                            |                                 |       |
|# Detailed Table Information|CatalogTable(`default`.`src`, ...|       |
+----------------------------+---------------------------------+-------+

scala> spark.sql("desc formatted src").show(100, truncate = false)
+----------------------------+----------------------------------------------------------+-------+
|col_name                    |data_type                                                 |comment|
+----------------------------+----------------------------------------------------------+-------+
|key                         |int                                                       |       |
|value                       |string                                                    |       |
|                            |                                                          |       |
|# Detailed Table Information|                                                          |       |
|Database:                   |default                                                   |       |
|Owner:                      |lian                                                      |       |
|Create Time:                |Mon Jan 04 17:06:00 CST 2016                              |       |
|Last Access Time:           |Thu Jan 01 08:00:00 CST 1970                              |       |
|Location:                   |hdfs://localhost:9000/user/hive/warehouse_hive121/src     |       |
|Table Type:                 |MANAGED                                                   |       |
|Table Parameters:           |                                                          |       |
|  transient_lastDdlTime     |1451898360                                                |       |
|                            |                                                          |       |
|# Storage Information       |                                                          |       |
|SerDe Library:              |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe        |       |
|InputFormat:                |org.apache.hadoop.mapred.TextInputFormat                  |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|       |
|Num Buckets:                |-1                                                        |       |
|Bucket Columns:             |[]                                                        |       |
|Sort Columns:               |[]                                                        |       |
|Storage Desc Parameters:    |                                                          |       |
|  serialization.format      |1                                                         |       |
+----------------------------+----------------------------------------------------------+-------+
```

## How was this patch tested?

A test case is added to `HiveDDLSuite` to check command output.

Author: Cheng Lian <[email protected]>

Closes #12844 from liancheng/spark-14127-desc-table.

(cherry picked from commit f152fae)
Signed-off-by: Cheng Lian <[email protected]>
@asfgit asfgit closed this in f152fae May 4, 2016
@liancheng liancheng deleted the spark-14127-desc-table branch May 4, 2016 15:19
asfgit pushed a commit that referenced this pull request May 9, 2016
…data source tables

## What changes were proposed in this pull request?

This is a follow-up of PR #12844. It makes the newly updated `DescribeTableCommand` to support data sources tables.

## How was this patch tested?

A test case is added to check `DESC [EXTENDED | FORMATTED] <table>` output.

Author: Cheng Lian <[email protected]>

Closes #12934 from liancheng/spark-14127-desc-table-follow-up.

(cherry picked from commit 671b382)
Signed-off-by: Yin Huai <[email protected]>
asfgit pushed a commit that referenced this pull request May 9, 2016
…data source tables

## What changes were proposed in this pull request?

This is a follow-up of PR #12844. It makes the newly updated `DescribeTableCommand` to support data sources tables.

## How was this patch tested?

A test case is added to check `DESC [EXTENDED | FORMATTED] <table>` output.

Author: Cheng Lian <[email protected]>

Closes #12934 from liancheng/spark-14127-desc-table-follow-up.
asfgit pushed a commit that referenced this pull request May 10, 2016
…able properties for data source tables

## What changes were proposed in this pull request?

This is a follow-up of #12934 and #12844. This PR adds a set of utility methods in `DDLUtils` to help extract schema information (user-defined schema, partition columns, and bucketing information) from data source table properties. These utility methods are then used in `DescribeTableCommand` to refine output for data source tables. Before this PR, the aforementioned schema information are only shown as table properties, which are hard to read.

Sample output:

```
+----------------------------+---------------------------------------------------------+-------+
|col_name                    |data_type                                                |comment|
+----------------------------+---------------------------------------------------------+-------+
|a                           |bigint                                                   |       |
|b                           |bigint                                                   |       |
|c                           |bigint                                                   |       |
|d                           |bigint                                                   |       |
|# Partition Information     |                                                         |       |
|# col_name                  |                                                         |       |
|d                           |                                                         |       |
|                            |                                                         |       |
|# Detailed Table Information|                                                         |       |
|Database:                   |default                                                  |       |
|Owner:                      |lian                                                     |       |
|Create Time:                |Tue May 10 03:20:34 PDT 2016                             |       |
|Last Access Time:           |Wed Dec 31 16:00:00 PST 1969                             |       |
|Location:                   |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|Table Type:                 |MANAGED                                                  |       |
|Table Parameters:           |                                                         |       |
|  rawDataSize               |-1                                                       |       |
|  numFiles                  |1                                                        |       |
|  transient_lastDdlTime     |1462875634                                               |       |
|  totalSize                 |684                                                      |       |
|  spark.sql.sources.provider|parquet                                                  |       |
|  EXTERNAL                  |FALSE                                                    |       |
|  COLUMN_STATS_ACCURATE     |false                                                    |       |
|  numRows                   |-1                                                       |       |
|                            |                                                         |       |
|# Storage Information       |                                                         |       |
|SerDe Library:              |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       |       |
|InputFormat:                |org.apache.hadoop.mapred.SequenceFileInputFormat         |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat|       |
|Compressed:                 |No                                                       |       |
|Num Buckets:                |2                                                        |       |
|Bucket Columns:             |[b]                                                      |       |
|Sort Columns:               |[c]                                                      |       |
|Storage Desc Parameters:    |                                                         |       |
|  path                      |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|  serialization.format      |1                                                        |       |
+----------------------------+---------------------------------------------------------+-------+
```

## How was this patch tested?

Test cases are added in `HiveDDLSuite` to check command output.

Author: Cheng Lian <[email protected]>

Closes #13025 from liancheng/spark-14127-extract-schema-info.

(cherry picked from commit 8a12580)
Signed-off-by: Yin Huai <[email protected]>
asfgit pushed a commit that referenced this pull request May 10, 2016
…able properties for data source tables

## What changes were proposed in this pull request?

This is a follow-up of #12934 and #12844. This PR adds a set of utility methods in `DDLUtils` to help extract schema information (user-defined schema, partition columns, and bucketing information) from data source table properties. These utility methods are then used in `DescribeTableCommand` to refine output for data source tables. Before this PR, the aforementioned schema information are only shown as table properties, which are hard to read.

Sample output:

```
+----------------------------+---------------------------------------------------------+-------+
|col_name                    |data_type                                                |comment|
+----------------------------+---------------------------------------------------------+-------+
|a                           |bigint                                                   |       |
|b                           |bigint                                                   |       |
|c                           |bigint                                                   |       |
|d                           |bigint                                                   |       |
|# Partition Information     |                                                         |       |
|# col_name                  |                                                         |       |
|d                           |                                                         |       |
|                            |                                                         |       |
|# Detailed Table Information|                                                         |       |
|Database:                   |default                                                  |       |
|Owner:                      |lian                                                     |       |
|Create Time:                |Tue May 10 03:20:34 PDT 2016                             |       |
|Last Access Time:           |Wed Dec 31 16:00:00 PST 1969                             |       |
|Location:                   |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|Table Type:                 |MANAGED                                                  |       |
|Table Parameters:           |                                                         |       |
|  rawDataSize               |-1                                                       |       |
|  numFiles                  |1                                                        |       |
|  transient_lastDdlTime     |1462875634                                               |       |
|  totalSize                 |684                                                      |       |
|  spark.sql.sources.provider|parquet                                                  |       |
|  EXTERNAL                  |FALSE                                                    |       |
|  COLUMN_STATS_ACCURATE     |false                                                    |       |
|  numRows                   |-1                                                       |       |
|                            |                                                         |       |
|# Storage Information       |                                                         |       |
|SerDe Library:              |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       |       |
|InputFormat:                |org.apache.hadoop.mapred.SequenceFileInputFormat         |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat|       |
|Compressed:                 |No                                                       |       |
|Num Buckets:                |2                                                        |       |
|Bucket Columns:             |[b]                                                      |       |
|Sort Columns:               |[c]                                                      |       |
|Storage Desc Parameters:    |                                                         |       |
|  path                      |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|  serialization.format      |1                                                        |       |
+----------------------------+---------------------------------------------------------+-------+
```

## How was this patch tested?

Test cases are added in `HiveDDLSuite` to check command output.

Author: Cheng Lian <[email protected]>

Closes #13025 from liancheng/spark-14127-extract-schema-info.
describe(relation, buffer)

append(buffer, "", "", "")
append(buffer, "# Detailed Table Information", relation.catalogTable.toString, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liancheng To improve the output of Explain, I plan to change the default implementation of toString of case class CatalogTable. That will also affect the output of Describe Extended.

I checked what Hive did for the command Describe Extended, as follows.

Detailed Table Information Table(tableName:t1, dbName:default, owner:root, createTime:1462627092, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)], location:hdfs://6b68a24121f4:9000/user/hive/warehouse/t1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1462627092}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

Basically, in the implementation of toString, I will try to follow what you did in describeFormatted but the contents will be in a single line. Feel free to let me know if you have any concern or suggestion. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants