[SPARK-14127][SQL][WIP] Describe table #12460

dilipbiswal · 2016-04-18T02:42:12Z

What changes were proposed in this pull request?

This PR adds support for describing partitions and columns. Support for describing
tables was already in place. The PR moves the code to SessionCatalog/HiveSessionCatalog.

Command Syntax:

DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name [column_name] [PARTITION partition_spec]
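
As a rough illustration of how the pieces of this syntax fit together, here is a minimal Python sketch that splits a statement into its parts with a regular expression. It is not the ANTLR grammar used by Spark; `DescribeStmt` and `parse_describe` are hypothetical names, and it assumes that `partition_spec` is written in parentheses and that names are plain word characters.

```python
import re
from typing import NamedTuple, Optional


class DescribeStmt(NamedTuple):
    mode: Optional[str]            # "EXTENDED", "FORMATTED", or None
    database: Optional[str]
    table: str
    column: Optional[str]
    partition_spec: Optional[str]  # raw text inside PARTITION (...)


# Hypothetical pattern mirroring the command syntax above (assumption:
# the partition spec is parenthesised and identifiers are \w+).
_DESCRIBE_RE = re.compile(
    r"""^\s*DESCRIBE
        (?:\s+(?P<mode>EXTENDED|FORMATTED))?
        \s+(?:(?P<db>\w+)\.)?(?P<table>\w+)
        (?:\s+(?!PARTITION\b)(?P<column>\w+))?
        (?:\s+PARTITION\s*\((?P<spec>[^)]*)\))?
        \s*;?\s*$""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_describe(sql: str) -> DescribeStmt:
    """Split a DESCRIBE statement into mode, table, column and partition spec."""
    m = _DESCRIBE_RE.match(sql)
    if m is None:
        raise ValueError(f"not a DESCRIBE statement: {sql!r}")
    mode = m.group("mode")
    return DescribeStmt(
        mode.upper() if mode else None,
        m.group("db"),
        m.group("table"),
        m.group("column"),
        m.group("spec"),
    )
```

For example, `parse_describe("DESCRIBE EXTENDED db1.t1 c1")` yields the mode, database, table, and column as separate fields, with no partition spec.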

How was this patch tested?

Added test cases to DDLCommandSuite to verify the plan. Added some error tests
to HiveCommandSuite. The rest of the coverage should be from existing test cases.

dilipbiswal · 2016-04-18T02:47:38Z

@andrewor14 I am looking for some early feedback on this, as I was thinking of doing the same for show table extended. I did have a brief discussion with @gatorsmile on this.

gatorsmile · 2016-04-18T02:48:39Z

Please resolve the conflicts. : )

dilipbiswal · 2016-04-18T03:13:36Z

@gatorsmile Thank you. I have resolved the conflicts.

gatorsmile · 2016-04-18T05:06:54Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

cc @hvanhovell

It is a bit more complicated than I thought. We allow strings here because Hive allows us to use the '$elem$', '$key$' and '$value$' 'keywords'. That is why I added strings to the rule. I am not sure if we should support this. What do you guys think?

This is what I found in the Hive manual:

DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name[ col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];

See also: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe

Yeah Herman. Not supporting it would certainly simplify things. FYI, I checked, and the test case describe_xpath.q, which exercises this syntax, is not run by HiveCompatibilitySuite.

OK, let's remove this from the grammar as well, and just use a dot-separated list of identifiers. Actually, are we currently able to deal with nested columns?

@hvanhovell Hi Herman, I tried very simple scenarios of using nested columns and it seems to work ok. Let me paste the output here.

map

create table mp_t1 (a map <int, string>, b string) row format delimited collection items terminated by '$' map keys terminated by '#';
load data local inpath '/data/mapfile' overwrite into table mp_t1;
select * from mp_t1;
a b
{100:"spark"} ABC
describe extended mp_t1.a.$key$;
Result
======
$key$ int from deserializer

Struct

create table ct_t (a struct<n1: string, n2: string>, b string) stored as textfile;
insert into ct_t values (('abc', 'efg'), 'ABC');
spark-sql> select * from ct_t;
{"n1":"abb","n2":"efg"} ABC
spark-sql> describe extended ct_t.a.n1;
OK
n1 string from deserializer
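
The two lookups above (a map key and a struct field) can be modelled as a walk through nested types. The following Python sketch is only an illustration of that idea, not how Hive's deserializer or this PR resolves paths. The tuple-based type encoding and the name `resolve_col_path` are assumptions made for the example.

```python
# Hypothetical type encoding (an assumption for this sketch):
#   primitive -> "int", "string", ...
#   map       -> ("map", key_type, value_type)
#   array     -> ("array", element_type)
#   struct    -> ("struct", {field_name: field_type})

def resolve_col_path(schema, path):
    """Walk a dot-separated column path; return (last_segment, type)."""
    parts = path.split(".")
    current = ("struct", schema)  # treat the table as a struct of its columns
    for part in parts:
        kind = current[0] if isinstance(current, tuple) else None
        if kind == "struct" and part in current[1]:
            current = current[1][part]
        elif kind == "map" and part == "$key$":
            current = current[1]
        elif kind == "map" and part == "$value$":
            current = current[2]
        elif kind == "array" and part == "$elem$":
            current = current[1]
        else:
            raise KeyError(f"cannot resolve {part!r} in {path!r}")
    return parts[-1], current


# Schemas corresponding to the two tables created above.
mp_t1 = {"a": ("map", "int", "string"), "b": "string"}
ct_t = {"a": ("struct", {"n1": "string", "n2": "string"}), "b": "string"}

print(resolve_col_path(mp_t1, "a.$key$"))  # ('$key$', 'int')
print(resolve_col_path(ct_t, "a.n1"))      # ('n1', 'string')
```

The resolved name and type correspond to the first two columns of the describe output shown above.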

Herman, based on the Hive syntax diagram, I was expecting the following command to work:
describe extended mp_t1.a.'$key$';
However, I get a parse exception. When I remove the quotes, it works, as follows:
describe extended mp_t1.a.$key$

Given this, what kind of changes do we need to make to the grammar if we want to support this? Please let me know your thoughts.

@hvanhovell Let me work on the grammar change. I will introduce a rule colPathIdentifier which is basically a regular identifier or the set of key, value, elem keywords.
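
To make the idea behind such a rule concrete, here is a small Python sketch of a path splitter in which every segment must be either a regular identifier or one of the key/value/elem keywords, quoted or unquoted. It is a hypothetical helper (`split_col_path`), not the ANTLR rule from this PR, and the identifier pattern is an assumption.

```python
import re

# Assumption for this sketch: a regular identifier is a letter or underscore
# followed by letters, digits or underscores.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_KEYWORDS = {"$key$", "$value$", "$elem$"}


def split_col_path(path):
    """Split a dot-separated column path and validate each segment."""
    segments = []
    for raw in path.split("."):
        # Accept the quoted form from the Hive manual, e.g. '$key$'.
        quoted = len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'"
        seg = raw[1:-1] if quoted else raw
        if seg in _KEYWORDS or _IDENT.fullmatch(seg):
            segments.append(seg)
        else:
            raise ValueError(f"invalid column path segment: {raw!r}")
    return segments
```

With this sketch, `split_col_path("a.'$key$'")` and `split_col_path("a.$key$")` both return `['a', '$key$']`, so the quoted and unquoted spellings are treated alike.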

@dilipbiswal Do you plan on supporting the key/value/elem keywords and nested elements? Which would be cool.

@hvanhovell Yeah. I have attempted to support the key/value/elem keywords. Could you please check to see if there are any issues? I am also trying to test this a bit more in parallel.

SparkQA · 2016-04-18T19:33:52Z

Test build #56071 has finished for PR 12460 at commit cfb0eeb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-20T08:52:58Z

Test build #56333 has finished for PR 12460 at commit 98df1d8.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2016-04-20T09:28:26Z

Test build #56335 has finished for PR 12460 at commit 868b438.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2016-04-20T18:01:29Z

Test build #56371 has finished for PR 12460 at commit 41cf12d.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

dilipbiswal · 2016-04-20T18:53:03Z

Rebased.

SparkQA · 2016-04-20T20:38:59Z

Test build #56394 has finished for PR 12460 at commit eb1c30e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2016-04-26T19:08:37Z

@liancheng Hi Lian, could you please look over this PR and leave some comments? Thanks!

SparkQA · 2016-04-27T01:57:57Z

Test build #57062 has finished for PR 12460 at commit 83c2875.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-27T09:51:58Z

Test build #57106 has finished for PR 12460 at commit 34f6d32.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2016-04-27T15:14:29Z

@dilipbiswal One purpose of re-implementing all DDL as native Spark SQL commands is to minimize the dependency on Hive so that we can move Hive into a separate data source some day. Because of that, we really don't want these new DDL commands to rely on classes like HiveClient, HiveClientImpl, or HiveSessionCatalog. When you need to access Hive table metadata, you should access it via CatalogTable rather than depending on any Hive data structure.
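
The design point in the comment above is that the command should depend only on a neutral metadata record, never on a Hive class. The Python sketch below illustrates that separation under stated assumptions: `TableMeta` and `ColumnMeta` are hypothetical stand-ins, not Spark's actual CatalogTable, and the row layout is illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ColumnMeta:
    """Hypothetical stand-in for one column of a catalog table."""
    name: str
    data_type: str
    comment: Optional[str] = None


@dataclass
class TableMeta:
    """Hypothetical stand-in for a catalog table record."""
    database: str
    name: str
    columns: List[ColumnMeta] = field(default_factory=list)
    partition_columns: List[str] = field(default_factory=list)


def describe_table(table: TableMeta) -> List[Tuple[str, str, str]]:
    """Build (col_name, data_type, comment) rows from catalog metadata only."""
    rows = [(c.name, c.data_type, c.comment or "") for c in table.columns]
    if table.partition_columns:
        # Illustrative section header; the real output format may differ.
        rows.append(("# Partition Information", "", ""))
        by_name = {c.name: c for c in table.columns}
        for p in table.partition_columns:
            c = by_name[p]
            rows.append((c.name, c.data_type, c.comment or ""))
    return rows
```

Because `describe_table` takes only the metadata record, any catalog that can produce such a record could serve the command. The trade-off is that every field the command reports has to exist on that record.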

dilipbiswal · 2016-04-27T19:07:45Z

@liancheng Thank you for your comment. I initially started with the idea of serving the describe command solely from CatalogTable. I then realized that CatalogTable may not have all the metadata that this command requires. So I have a couple of high-level questions:

Can we add more fields to CatalogTable?
- Examples of missing fields are retention and privileges.
- For "describe extended partition", quite a few details that are readily available in HivePartition are not present in our CatalogTablePartition object.
- Another use case is "describe table column_path". This is served by a call to Hive's deserializer via Hive.getFieldsFromDeserializer.
Do we have flexibility in the output of the describe command, or do we need to match Hive's output completely? If we have flexibility, we can remove the describe-related tests from HiveCompatibilitySuite and add suitable tests to SQLQuerySuite.

SparkQA · 2016-04-30T05:16:56Z

Test build #57400 has finished for PR 12460 at commit 319d45b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-05-04T15:02:38Z

It looks like this can be closed because #12844 was merged

dilipbiswal · 2016-05-05T08:13:26Z

@liancheng Hi Lian, in this PR I had implemented "describe table partition" and "describe column".
Do you want me to put this on top of your describe table changes? If you plan to work on it yourself, please let me know.

@viirya - fyi.

dilipbiswal changed the title ~~[SPARK-14127] Describe table~~ [SPARK-14127][SQL][WIP] Describe table Apr 18, 2016

dilipbiswal force-pushed the dkb_desc_tbl branch from 5b349da to cfb0eeb Compare April 18, 2016 03:12

gatorsmile reviewed Apr 18, 2016

dilipbiswal force-pushed the dkb_desc_tbl branch from 41cf12d to eb1c30e Compare April 20, 2016 18:52

dilipbiswal force-pushed the dkb_desc_tbl branch from eb1c30e to 83c2875 Compare April 27, 2016 00:26

dilipbiswal force-pushed the dkb_desc_tbl branch from 83c2875 to 34f6d32 Compare April 27, 2016 08:18

dilipbiswal added 7 commits April 29, 2016 18:28

- [SPARK-14127] Describe table (f7bb700)
- complex type (4bfa25b)
- one more test (e9ca6a5)
- fix (8386f29)
- style (be6c9ba)
- fix after rebase (30efbce)
- DescribeTable based on CatalogTable (319d45b)

dilipbiswal force-pushed the dkb_desc_tbl branch from 34f6d32 to 319d45b Compare April 30, 2016 03:42

dilipbiswal closed this May 4, 2016
