Skip to content

Conversation

@dilipbiswal
Copy link
Contributor

What changes were proposed in this pull request?

This PR adds .support for describing partitions and columns. Support for describing
tables were already in place. The PR moves the code to SessionCatalog/HiveSessionCatalog.

Command Syntax:

DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name [column_name] [PARTITION partition_spec]

How was this patch tested?

Added test cases to DDLCommandSuite to verify the plan. Added some error tests
to HiveCommandSuite. The rest of the coverage should be from existing test cases.

@dilipbiswal dilipbiswal changed the title [SPARK-14127] Describe table [SPARK-14127][SQL][WIP] Describe table Apr 18, 2016
@dilipbiswal
Copy link
Contributor Author

@andrewor14 Looking for some early feedback on this as i was thinking to do the same for show table extended. I did have a brief discussion with @gatorsmile on this.

@gatorsmile
Copy link
Member

Please resolve the conflicts. : )

@dilipbiswal
Copy link
Contributor Author

@gatorsmile Thank you. I have resolved the conflicts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit more complicates than I thought. We allow strings here because Hive allows us to use the '$elem', '$keys' and '$values' 'keywords'. That is why I added strings to the rule. I am not sure if we should support this. What do you guys think?

This is what I found in the Hive manual:

DESCRIBE [EXTENDED|FORMATTED]  [db_name.]table_name[ col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];

See also: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah Herman. Not supporting it would certainly simplify things. FYI - I checked that the unit test case describe_xpath.q which exercises this syntax is not getting tested in HiveCompatibleSuite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, lets remove this from the grammar as well, and just use a dot separated list of identifiers. Actually, are we currently able to deal with nested columns?

Copy link
Contributor Author

@dilipbiswal dilipbiswal Apr 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell Hi Herman, I tried very simple scenarios of using nested columns and it seems to work ok. Let me paste the output here.

map

create table mp_t1 (a map <int, string>, b string) row format delimited collection items terminated by '$' map keys terminated by '#';
load data local inpath '/data/mapfile' overwrite into table mp_t1; 
select * from mp_t1;
a                     b
{100:"spark"}   ABC

describe extended mp_t1.a.$key$;
Result
======
$key$                   int                     from deserializer

Struct

create table ct_t (a struct<n1: string, n2: string>, b string) stored as textfile; 
insert into ct_t values (('abc', 'efg'), 'ABC'); 
spark-sql> select * from ct_t;
{"n1":"abb","n2":"efg"} ABC

spark-sql> describe extended ct_t.a.n1;
OK
n1                      string                  from deserializer  

Herman, based on hive syntax diagram, i was expecting the following command to work.
describe extended mp_t1.a.'$key$';
However, i get a parse exception and when i remove the quotes it works like following.
describe extended mp_t1.a.$key$

Given this, what kind of changes we need to make to the grammar if we need to support this ? Please let me know your thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell Let me work on the grammar change. I will introduce a rule colPathIdentifier which is basically a regular identifier or the set of key, value, elem keywords.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dilipbiswal Do you plan on supporting the key/value/elem keywords and nested elements? Which would be cool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell Yeah. I have attempted to support the key/value/elem keywords. Could you please check to see if there are any issues ? I am also trying to test this a bit more in parallel.

@SparkQA
Copy link

SparkQA commented Apr 18, 2016

Test build #56071 has finished for PR 12460 at commit cfb0eeb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 20, 2016

Test build #56333 has finished for PR 12460 at commit 98df1d8.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 20, 2016

Test build #56335 has finished for PR 12460 at commit 868b438.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 20, 2016

Test build #56371 has finished for PR 12460 at commit 41cf12d.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

rebased..

@SparkQA
Copy link

SparkQA commented Apr 20, 2016

Test build #56394 has finished for PR 12460 at commit eb1c30e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

@liancheng Hi Lian, can you please look over this PR and give some comments. Thanks !!

@SparkQA
Copy link

SparkQA commented Apr 27, 2016

Test build #57062 has finished for PR 12460 at commit 83c2875.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 27, 2016

Test build #57106 has finished for PR 12460 at commit 34f6d32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

@dilipbiswal One purpose of re-implementing all DDL as native Spark SQL command is to minimize dependency to Hive so that we can move Hive into a separate data source some day. That said, we really don't want to make these new DDL commands rely on classes like HiveClient, HiveClientImpl, or HiveSessionCatalog. When you need to access Hive table metadata, you should access them via CatalogTable rather than depending on any Hive data structure.

@dilipbiswal
Copy link
Contributor Author

@liancheng Thank you for your comment. Actually initially i started with the idea of serving the describe command solely from CatalogTable. I then realized that CatalogTable may not have all the metadata information that is required for this command. So i have a couple of high level questions:

  1. Can we add more fields to CatalogTable ?
    • Some example of fields that miss are retention, privileges.
    • When we choose "describe extended partition", quite a few details that are readily available in HivePartition is not present in our CatalogTablePartition object.
    • Another use case is "describe table column_path". This is served by a call to Hive's deserializer via. Hive.getFieldsFromDeserializer
  2. Do we have flexibility on the output of describe command or we need match hive's output completely ? If so, we can remove the describe-related tests from HiveCompatibiltySuite and add suitable tests in SQLQuerySuite.

@SparkQA
Copy link

SparkQA commented Apr 30, 2016

Test build #57400 has finished for PR 12460 at commit 319d45b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented May 4, 2016

It looks like this can be closed because #12844 was merged

@dilipbiswal dilipbiswal closed this May 4, 2016
@dilipbiswal
Copy link
Contributor Author

@liancheng Hi Lian, in this PR, i had implemented "describe table partition" and "describe column".
Do you want me to put this on top of your describe table changes ? Let me know please. If you plan to work on it then let me know.

@viirya - fyi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants