[SPARK-15275] [SQL] CatalogTable should store sort ordering for sorted columns #13059

tejasapatil · 2016-05-11T21:07:31Z

What changes were proposed in this pull request?

Jira link : https://issues.apache.org/jira/browse/SPARK-15275

For bucketed tables in Hive, one can also add a constraint on column sortedness, along with the sort order.
As per the spec in [0], the CREATE TABLE statement can specify a sort order as well:

[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]

[0] : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
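To make the grammar above concrete, here is a minimal sketch of pulling the bucket columns, the sorted columns with their ASC/DESC order, and the bucket count out of a CREATE TABLE string. It is only an illustration in Python. `SortColumn` and `parse_bucket_spec` are hypothetical names, not Spark's `CatalogSortOrder` or any Spark/Hive parser API, and the regex covers only the simple form of the clause shown in the spec (no nested parentheses or quoted identifiers).

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for a catalog entry that pairs a column with its
# sort order. This is NOT Spark's CatalogSortOrder, just an illustration.
@dataclass(frozen=True)
class SortColumn:
    name: str
    ascending: bool = True  # no explicit order is treated as ASC


# Matches the simple form of:
# CLUSTERED BY (cols) [SORTED BY (col [ASC|DESC], ...)] INTO n BUCKETS
_BUCKET_RE = re.compile(
    r"CLUSTERED\s+BY\s*\(([^)]*)\)"
    r"(?:\s+SORTED\s+BY\s*\(([^)]*)\))?"
    r"\s+INTO\s+(\d+)\s+BUCKETS",
    re.IGNORECASE,
)


def parse_bucket_spec(ddl: str) -> Tuple[List[str], List[SortColumn], int]:
    """Return (bucket columns, sort columns with order, bucket count)."""
    m = _BUCKET_RE.search(ddl)
    if m is None:
        raise ValueError("no CLUSTERED BY ... INTO n BUCKETS clause found")
    bucket_cols = [c.strip() for c in m.group(1).split(",") if c.strip()]
    sort_cols: List[SortColumn] = []
    if m.group(2):  # the SORTED BY part is optional
        for item in m.group(2).split(","):
            parts = item.split()
            if not parts:
                continue
            if len(parts) > 2 or (
                len(parts) == 2 and parts[1].upper() not in ("ASC", "DESC")
            ):
                raise ValueError(f"bad sort column: {item.strip()!r}")
            ascending = len(parts) == 1 or parts[1].upper() == "ASC"
            sort_cols.append(SortColumn(parts[0], ascending))
    return bucket_cols, sort_cols, int(m.group(3))


if __name__ == "__main__":
    ddl = ("CREATE TABLE t (a INT, b STRING) "
           "CLUSTERED BY (a) SORTED BY (a ASC, b DESC) INTO 8 BUCKETS")
    print(parse_bucket_spec(ddl))
    # (['a'], [SortColumn(name='a', ascending=True), SortColumn(name='b', ascending=False)], 8)
```

The point of keeping `ascending` next to the column name is that the column names alone (what CatalogTable stored before this PR) lose the `DESC` in `b DESC`.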

Currently, CatalogTable stores only the names of the sorted columns and no information about their sort order. This PR adds CatalogSortOrder to hold each sorted column's name together with its sort order. This information is not yet used in query execution, but it can be used as more support for bucketing is added. One possible advantage is the ability to skip rows when performing predicate matching.
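As a rough illustration of the row-skipping idea (not something Spark implements from this PR), the sketch below assumes the rows of one bucket file are sorted on a column in a known order. Knowing the order lets a predicate such as `col > threshold` be answered with a binary search for the boundary instead of comparing every row. `rows_greater_than` is a hypothetical helper; note that the search differs for ASC and DESC, which is why the order matters and not just the column name.

```python
from bisect import bisect_right
from typing import Sequence


def rows_greater_than(sorted_vals: Sequence[int], threshold: int,
                      ascending: bool = True) -> range:
    """Index range of rows whose value is > threshold.

    Assumes sorted_vals is already sorted in the declared order, the kind of
    guarantee a SORTED BY clause would give within one bucket file. A binary
    search finds the boundary, so non-matching rows are skipped without
    being compared one by one.
    """
    n = len(sorted_vals)
    if ascending:
        # Matching rows form a suffix: everything after the last value <= threshold.
        start = bisect_right(sorted_vals, threshold)
        return range(start, n)
    # Descending: matching rows form a prefix; binary-search for its end,
    # i.e. the first index whose value is <= threshold.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_vals[mid] > threshold:
            lo = mid + 1
        else:
            hi = mid
    return range(0, lo)


if __name__ == "__main__":
    asc = [1, 3, 5, 7, 9]
    desc = [9, 7, 5, 3, 1]
    print([asc[i] for i in rows_greater_than(asc, 4)])                    # [5, 7, 9]
    print([desc[i] for i in rows_greater_than(desc, 4, ascending=False)])  # [9, 7, 5]
```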

How was this patch tested?

Trunk already supports creating bucketed Hive tables, so I am relying on the existing tests.

SparkQA · 2016-05-11T21:45:44Z

Test build #58408 has finished for PR 13059 at commit c2f81ca.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class CatalogSortOrder(

tejasapatil · 2016-05-12T22:16:36Z

Came across #12759 and realised that DESC ordering is not supported inherently.

rxin · 2016-06-01T07:18:10Z

Is it actually useful to support desc order?

tejasapatil · 2016-06-01T15:24:45Z

I initially did not know that DESC order is not inherently supported in Spark, so I had worked on this PR. One could add that to the engine, but that's not my priority right now. I am working on improving #13231, which will include some parts of this PR. Closing this PR.

tejasapatil added 2 commits May 11, 2016 12:23

Metastore relation to save table bucketing information

902dd8e

CatalogTable should include sorted columns ordering information

c2f81ca

tejasapatil closed this Jun 1, 2016


3 participants