
Conversation

@umehrot2

What changes were proposed in this pull request?

Reading a Hive ORC table that contains char/varchar columns fails in Spark SQL. The cause is that Spark SQL internally replaces char/varchar columns with the String data type, so when reading a table created in Hive with varchar/char columns it ends up using the wrong reader and throws a ClassCastException.

This patch lets Spark SQL interpret varchar/char columns correctly and store them as varchar/char types instead of internally converting them to string columns.
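For context on why collapsing these types to a plain string loses information, the following is a minimal, hypothetical sketch (plain Python, not Spark or Hive code) of the differing CHAR/VARCHAR value semantics that the reader has to honor: CHAR(n) values are space-padded to the declared length, while VARCHAR(n) values are bounded by it.

```python
# Hypothetical illustration only: CHAR(n) vs VARCHAR(n) value semantics.
# A schema that maps both to a bare "string" type drops the length and
# padding information needed to read the column correctly.

def hive_char_value(s: str, length: int) -> str:
    """CHAR(n): values are right-padded with spaces to the declared length."""
    return s.ljust(length)

def hive_varchar_value(s: str, length: int) -> str:
    """VARCHAR(n): values are limited to at most the declared length."""
    return s[:length]

assert hive_char_value("A", 10) == "A" + " " * 9      # padded to 10 chars
assert hive_varchar_value("abc1", 10) == "abc1"        # stored as-is
```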

How was this patch tested?

- Added unit tests
- Manually tested on an AWS EMR cluster

Step 1:
Created a table using Hive (with varchar/char columns) and inserted some data:

CREATE EXTERNAL TABLE IF NOT EXISTS hive_orc_test (
a VARCHAR(10),
b CHAR(10),
c BIGINT)
STORED AS ORC
LOCATION 's3://xxxx';

INSERT INTO TABLE hive_orc_test VALUES ('abc', 'A', 101), ('abc1', 'B', 102), ('abc3', 'C', 103);

Step 2:
Created an external table in Spark SQL over the same source location, and ran a select query on it.

CREATE EXTERNAL TABLE IF NOT EXISTS spark_orc_test (
a VARCHAR(10),
b CHAR(10),
c BIGINT)
STORED AS ORC
LOCATION 's3://xxxx';

SELECT * FROM spark_orc_test;

Result:
17/02/24 23:22:57 INFO DAGScheduler: Job 1 finished: processCmd at CliDriver.java:376, took 2.673360 s
abc A 101
abc1 B 102
abc3 C 103
Time taken: 4.327 seconds, Fetched 3 row(s)
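As an additional sanity check (not part of the original test run; an illustrative sketch), the point of the patch can be verified by describing the table, which should now report the declared varchar/char types rather than plain string:

```sql
-- Hypothetical verification step: with this fix, Spark SQL should report
-- columns a and b as varchar(10) and char(10) instead of string.
DESCRIBE spark_orc_test;
```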

@umehrot2 umehrot2 changed the title Fix reading of HIVE ORC table with varchar/char columns in Spark SQL should not fail [SPARK-20515][SQL] Fix reading of HIVE ORC table with varchar/char columns in Spark SQL should not fail Apr 27, 2017
@AmplabJenkins

Can one of the admins verify this patch?

@mridulm
Contributor

mridulm commented Apr 27, 2017

+CC @dongjoon-hyun - since you were looking at ORC.

@hvanhovell
Contributor

hvanhovell commented Apr 27, 2017

This is very similar to #16804; however, that approach, like this one, is slightly broken (it does not support nested char/varchar columns). Can you instead backport #17030, which is an improved version?
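For context, the nested case mentioned above is one where a char/varchar column sits inside a complex type rather than at the top level of the schema, e.g. (an illustrative sketch, not taken from the PR):

```sql
-- Hypothetical example of nested char/varchar columns: the varchar/char
-- fields live inside a struct, which a top-level-only schema rewrite misses.
CREATE EXTERNAL TABLE IF NOT EXISTS nested_char_test (
  s STRUCT<a: VARCHAR(10), b: CHAR(10)>,
  c BIGINT)
STORED AS ORC
LOCATION 's3://xxxx';
```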

@dongjoon-hyun
Member

Thank you for pinging me, @mridulm . :)

@gatorsmile
Member

BTW, please add [BACKPORT-2.0] in your PR title.

@HyukjinKwon
Member

ping @umehrot2

@asfgit asfgit closed this in b771fed Jun 8, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
# What changes were proposed in this pull request?

This PR proposes to close stale PRs, mostly the same instances with apache#18017

Closes apache#11459
Closes apache#13833
Closes apache#13720
Closes apache#12506
Closes apache#12456
Closes apache#12252
Closes apache#17689
Closes apache#17791
Closes apache#18163
Closes apache#17640
Closes apache#17926
Closes apache#18163
Closes apache#12506
Closes apache#18044
Closes apache#14036
Closes apache#15831
Closes apache#14461
Closes apache#17638
Closes apache#18222

Added:
Closes apache#18045
Closes apache#18061
Closes apache#18010
Closes apache#18041
Closes apache#18124
Closes apache#18130
Closes apache#12217

Added:
Closes apache#16291
Closes apache#17480
Closes apache#14995

Added:
Closes apache#12835
Closes apache#17141

## How was this patch tested?

N/A

Author: hyukjinkwon <[email protected]>

Closes apache#18223 from HyukjinKwon/close-stale-prs.