[SPARK-15275] [SQL] CatalogTable should store sort ordering for sorted columns #13059
What changes were proposed in this pull request?
Jira link: https://issues.apache.org/jira/browse/SPARK-15275
For bucketed tables in Hive, one can also declare that columns are sorted, along with the sort ordering of each.
As per the spec in [0], the CREATE TABLE statement accepts a sort ordering as well:
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[0] : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
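The clause above carries two pieces of information per sorted column: its name and an optional direction. As a rough illustration only, the sketch below parses the inside of a SORTED BY (...) clause into that pair. The names SortDirection, SortColumn and SortedByClause are hypothetical and are not the classes touched by this PR; the sketch also assumes that a column without an explicit direction is treated as ascending.

```scala
// Hypothetical names for illustration; not the classes added by this PR.
sealed trait SortDirection
case object Ascending extends SortDirection
case object Descending extends SortDirection

// One sorted column: its name plus the direction it is sorted in.
final case class SortColumn(name: String, direction: SortDirection)

object SortedByClause {
  // Parses the text inside SORTED BY (...), e.g. "ts DESC, user_id".
  // Assumption: a column with no explicit direction defaults to ascending.
  def parse(clause: String): Seq[SortColumn] =
    clause.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { item =>
      item.split("\\s+").toList match {
        case name :: Nil =>
          SortColumn(name, Ascending)
        case name :: dir :: Nil if dir.equalsIgnoreCase("ASC") =>
          SortColumn(name, Ascending)
        case name :: dir :: Nil if dir.equalsIgnoreCase("DESC") =>
          SortColumn(name, Descending)
        case _ =>
          throw new IllegalArgumentException(s"Cannot parse sort column: '$item'")
      }
    }
}
```

For example, SortedByClause.parse("ts DESC, user_id") yields a descending column ts followed by an ascending column user_id. Keeping only the column names would lose the DESC on ts.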
Currently CatalogTable stores only the names of the sorted columns and no information about their sort ordering. This PR adds CatalogSortOrder to hold the sorted column name and its sort order. This information is not yet used in query execution, but it can be used as more support for bucketing is added. One possible advantage is the ability to skip rows while performing predicate matching.
How was this patch tested?
Trunk currently supports creating bucketed Hive tables, so I am relying on the existing tests.