[SPARK-15217] [SQL] Always Case Insensitive in HiveSessionState #12993

gatorsmile · 2016-05-09T01:02:08Z

What changes were proposed in this pull request?

In HiveSessionState, the session state of a SparkSession backed by Hive, analysis should not be case sensitive, because the underlying Hive Metastore is case insensitive.

For example,

CREATE TABLE tab1 (C1 int);
SELECT C1 FROM tab1

In the current implementation, we will get the following error because the column name is always stored in lower case.

cannot resolve '`C1`' given input columns: [c1]; line 1 pos 7
org.apache.spark.sql.AnalysisException: cannot resolve '`C1`' given input columns: [c1]; line 1 pos 7

This PR always uses case-insensitive analysis in HiveSessionState, regardless of whether users set spark.sql.caseSensitive to true or false.
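To make the failure mode concrete, here is a minimal sketch of how a name resolver decides whether `C1` matches the stored column `c1`. It is written in Python for illustration only; the function names below are hypothetical and are not Spark's API.

```python
from typing import Callable, List, Optional

# A resolver decides whether a stored column name matches a requested name.
Resolver = Callable[[str, str], bool]


def case_sensitive_resolution(stored: str, requested: str) -> bool:
    """Names match only if they are identical, including letter case."""
    return stored == requested


def case_insensitive_resolution(stored: str, requested: str) -> bool:
    """Names match if they are equal after lowercasing both sides."""
    return stored.lower() == requested.lower()


def resolve_column(
    requested: str, input_columns: List[str], resolver: Resolver
) -> Optional[str]:
    """Return the first input column accepted by the resolver, or None."""
    for column in input_columns:
        if resolver(column, requested):
            return column
    return None


# The metastore stored the column in lower case, but the query asks for C1.
input_columns = ["c1"]
strict = resolve_column("C1", input_columns, case_sensitive_resolution)
relaxed = resolve_column("C1", input_columns, case_insensitive_resolution)
```

With the case-sensitive resolver nothing matches, which corresponds to the "cannot resolve" error above. With the case-insensitive resolver the lookup finds `c1`.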

How was this patch tested?

Added the related test cases.

SparkQA · 2016-05-09T02:12:26Z

Test build #58114 has finished for PR 12993 at commit d7d96c3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-05-09T03:42:59Z

cc @cloud-fan @rxin @yhuai @andrewor14

cloud-fan · 2016-05-09T04:31:20Z

I think we need to discuss it more:

Should we allow case sensitivity to be configurable? It is sometimes out of our control, as with the Hive catalog, which is always case insensitive.
Besides case sensitivity, should we also introduce the concept of case preservation for external catalogs?

gatorsmile · 2016-05-09T05:16:49Z

Agree. We need to be careful when deciding on the design. This PR just restores our previous behavior in HiveContext.

Regarding case sensitivity, it is complicated and platform/vendor-specific. The summary below is based on my own search, so it might not be 100% correct.

For unquoted identifiers, SQL:2003 and DB2 are not case sensitive. Oracle and SQL Server are configurable, but are not case sensitive by default.
For quoted/delimited identifiers, most traditional RDBMSs are case sensitive. Hive is special. Starting from Hive 0.13, Hive supports quoted identifiers in column names (https://issues.apache.org/jira/browse/HIVE-6013). However, this does not apply to table, database, or function names in Hive.
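The distinction between unquoted and quoted identifiers can be sketched as a normalization step applied before names are compared. This is a simplified model in Python, not the implementation of any particular engine. The `Identifier` type, the `normalize` function, and the `quoted_case_sensitive` flag are hypothetical names, and folding to lower case is an arbitrary choice for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Identifier:
    name: str
    quoted: bool = False  # True if the identifier was written with delimiters


def normalize(
    ident: Identifier,
    fold: Callable[[str], str] = str.lower,
    quoted_case_sensitive: bool = True,
) -> str:
    """Return the form of the identifier that is used for comparison.

    Unquoted identifiers are always case-folded. Quoted identifiers keep
    their exact spelling only when the system treats them as case sensitive.
    """
    if ident.quoted and quoted_case_sensitive:
        return ident.name
    return fold(ident.name)


# Unquoted: C1 and c1 refer to the same object.
unquoted_same = normalize(Identifier("C1")) == normalize(Identifier("c1"))

# Quoted, in a system where quoting makes names case sensitive.
quoted_same = normalize(Identifier("C1", quoted=True)) == normalize(
    Identifier("c1", quoted=True)
)

# Quoted, in a system where quoting does not affect case handling.
quoted_folded_same = normalize(
    Identifier("C1", quoted=True), quoted_case_sensitive=False
) == normalize(Identifier("c1", quoted=True), quoted_case_sensitive=False)
```

Setting `quoted_case_sensitive=False` models the case described above where quoting a name does not make it case sensitive.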

rxin · 2016-05-09T18:56:01Z

We want to eliminate HiveSessionState, so this is a step backward. It is also a step backward in that it makes the behavior of the Hive and non-Hive session states diverge.

I don't think we should support this. For now, we should just make case sensitivity an internal config that is not exposed to users. Our case sensitivity support is somewhat broken and does not follow the SQL standard (e.g. in Postgres, quoting an identifier makes it case sensitive), so the simplest solution is to not support it for now and

See https://issues.apache.org/jira/browse/SPARK-15229

gatorsmile · 2016-05-09T20:02:32Z

Agree. Let me close this now. Thanks!

case insensitive in Hive

d7d96c3

gatorsmile mentioned this pull request May 9, 2016

[SPARK-15187] [SQL] Disallow Dropping Default Database #12962

Closed

gatorsmile closed this May 9, 2016

gatorsmile mentioned this pull request Jun 20, 2016

[SPARK-16049][SQL] Make InsertIntoTable's expectedColumns support case-insensitive resolution properly #13772

Closed
