@gatorsmile
Member

What changes were proposed in this pull request?

hive> create table t1(`a,` string);
OK
Time taken: 1.399 seconds

hive> create table t2(`a,` string, b string);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 3 elements while columns.types has 2 elements!)

hive> create table t2(`a,` string, b string) stored as parquet;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: ParquetHiveSerde initialization failed. Number of column name and column type differs. columnNames = [a, , b], columnTypes = [string, string]

This is a bug in the Hive metastore.
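The element counts in the errors above are consistent with the column names being stored as one comma-joined string and split again on commas. The following is only a sketch of that idea, not Hive's code: the property names `columns` and `columns.types` come from the error messages, while the `,` and `:` separators and the helper names (`split_like_java`, `serialize_columns`, `deserialize_columns`) are assumptions made for illustration. `split_like_java` mimics Java's `String.split`, which drops trailing empty strings.

```python
def split_like_java(s, sep=","):
    """Mimic Java's String.split(sep): trailing empty strings are dropped."""
    if s == "":
        return [""]
    parts = s.split(sep)
    while parts and parts[-1] == "":
        parts.pop()
    return parts


def serialize_columns(names, types):
    """Hypothetical encoding: names joined by ',', types joined by ':'."""
    return {"columns": ",".join(names), "columns.types": ":".join(types)}


def deserialize_columns(props):
    """Split the two properties back into lists, as a SerDe might do."""
    names = split_like_java(props["columns"], ",")
    types = split_like_java(props["columns.types"], ":")
    return names, types


# t1(`a,` string): the trailing empty element is dropped, so the counts agree.
t1_names, t1_types = deserialize_columns(serialize_columns(["a,"], ["string"]))

# t2(`a,` string, b string): the embedded comma yields an extra, empty name.
t2_names, t2_types = deserialize_columns(
    serialize_columns(["a,", "b"], ["string", "string"])
)

print(t1_names, t1_types)
print(t2_names, t2_types)
```

Under these assumptions, `t1` round-trips to one name and one type, while `t2` comes back as three names (`a`, an empty string, `b`) against two types, which matches the counts reported in the two failures above.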

When users do not provide an alias in a SELECT query, we call toPrettySQL to generate one. For example, the string get_json_object(jstring, '$.f1') becomes the alias for the function call in the statement:

SELECT key, get_json_object(jstring, '$.f1') FROM tempView

This is not an issue for SELECT statements. For CTAS, however, we hit a bug in the Hive metastore: it does not accept column names containing commas and returns a confusing error message such as:

17/04/26 23:12:56 ERROR [hive.log(397) -- main]: error in initSerDe: org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 2 elements while columns.types has 1 elements!
org.apache.hadoop.hive.serde2.SerDeException: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 2 elements while columns.types has 1 elements!

Thus, this PR blocks users from creating a table in the Hive metastore when the table has a column whose name contains a comma.
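The kind of check described here can be sketched as follows. This is an illustrative Python sketch, not the Scala code in this patch: the function name `verify_column_names`, the error type, and the message text are all hypothetical. It checks only data columns, on the assumption (based on a later comment in this thread) that partition column names are not affected.

```python
def verify_column_names(table_name, data_column_names):
    """Reject data column names containing a comma before the table is created.

    Hypothetical helper: raises ValueError with a message that names the
    offending table and column, instead of letting the metastore fail later
    with a confusing element-count error.
    """
    for name in data_column_names:
        if "," in name:
            raise ValueError(
                f"Cannot create table {table_name}: column name {name!r} "
                "contains a comma, which the Hive metastore does not accept"
            )


# An auto-generated alias such as the one from the SELECT example is rejected.
generated_alias = "get_json_object(jstring, '$.f1')"
try:
    verify_column_names("t", ["key", generated_alias])
    rejected = False
except ValueError as err:
    rejected = True
    message = str(err)

# Names without commas pass silently.
verify_column_names("t", ["key", "value"])
```

Failing early with the column name in the message gives users something actionable, for example adding an explicit alias in the CTAS query.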

How was this patch tested?

Added a test case

@SparkQA

SparkQA commented Apr 27, 2017

Test build #76217 has started for PR 17781 at commit 9563de4.

@gatorsmile
Member Author

ok to test

@gatorsmile
Member Author

test this please

@SparkQA

SparkQA commented Apr 27, 2017

Test build #76238 has finished for PR 17781 at commit 9563de4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

Creating views in Hive does not have such an issue.

hive> create table tab2 (a string, b string);
OK
Time taken: 0.807 seconds
hive> create view view2 (`a,`, b) as SELECT a, b from tab2;
OK
Time taken: 0.727 seconds

@gatorsmile
Member Author

hive> create table partTab (a string, b string) PARTITIONED BY (`a,` string, `b,` string);
OK

Hive accepts commas in partition column names.

@SparkQA

SparkQA commented Apr 28, 2017

Test build #76252 has finished for PR 17781 at commit 7839a1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan @sameeragarwal

@cloud-fan
Contributor

LGTM, merging to master/2.2

asfgit pushed a commit that referenced this pull request Apr 28, 2017
…he column names

Author: Xiao Li <[email protected]>

Closes #17781 from gatorsmile/blockIllegalColumnNames.

(cherry picked from commit e3c8160)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in e3c8160 Apr 28, 2017