

@rajeshbalamohan

What changes were proposed in this pull request?

This PR improves OrcFileFormat to handle cases where the schema stored in the ORC file does not match the schema stored in the metastore.

ORC data written by Hive-1.x has virtual column names (HIVE-4243). This is fixed in Hive-2.x, but Spark throws exceptions when reading data written by Hive-1.x. To mitigate this, "spark.sql.hive.convertMetastoreOrc" was disabled via SPARK-15705. However, that incurs a performance penalty, because queries then go through HiveTableScan and HadoopRDD. This PR fixes this issue.
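The mismatch can be illustrated without Spark. The following is a minimal Scala sketch, not the code in this PR: it assumes that when every physical field name in an ORC file matches the Hive-1.x pattern `_colN`, the i-th metastore column is stored in the i-th physical column, and otherwise it resolves columns by name. The object and method names (`Hive1xOrcColumnMapping`, `hasPositionalNames`, `requestedColumnIds`) are hypothetical.

```scala
// Hypothetical sketch (not Spark's OrcFileFormat): resolve requested metastore columns
// against the physical field names recorded in an ORC file.
object Hive1xOrcColumnMapping {

  // True when every physical name looks like a Hive-1.x positional name (_col0, _col1, ...).
  def hasPositionalNames(physicalNames: Seq[String]): Boolean =
    physicalNames.nonEmpty && physicalNames.forall(_.matches("_col\\d+"))

  // Returns, for each requested column, the index of the physical ORC column to read,
  // or -1 when the file does not contain that column.
  def requestedColumnIds(
      metastoreNames: Seq[String],
      physicalNames: Seq[String],
      requestedNames: Seq[String]): Seq[Int] =
    if (hasPositionalNames(physicalNames)) {
      // Assumption: Hive-1.x wrote the i-th metastore column as _col<i>, so map by position.
      requestedNames.map { name =>
        val idx = metastoreNames.indexOf(name)
        if (idx >= 0 && idx < physicalNames.length) idx else -1
      }
    } else {
      // Files written with real column names: resolve by name.
      requestedNames.map(name => physicalNames.indexOf(name))
    }
}
```

For a file whose physical names are `_col0`, `_col1` and whose metastore columns are `key`, `value`, requesting `value, key` yields `Seq(1, 0)`, whereas a purely name-based lookup would find neither column.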

Related tickets:
SPARK-15705 : Change the default value of spark.sql.hive.convertMetastoreOrc to false.
SPARK-15705 : Spark won't read ORC schema from metastore for partitioned tables
SPARK-16628 : OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files

How was this patch tested?

Manual testing by setting "spark.sql.hive.convertMetastoreOrc=true" and querying data stored via Hive-1.x in ORC format. Also ran the SQL-related unit tests.


@SparkQA

SparkQA commented Aug 3, 2016

Test build #63147 has finished for PR 14471 at commit dc943a4.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rajeshbalamohan
Author

Fixed scalastyle issues

@SparkQA

SparkQA commented Aug 3, 2016

Test build #63150 has finished for PR 14471 at commit 58162b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Aug 3, 2016

Can you add a test case?

@rxin
Contributor

rxin commented Aug 3, 2016

Also, can you update the title? The current title is very generic, and this ticket seems to solve a specific problem.

@rajeshbalamohan rajeshbalamohan changed the title [SPARK-14387][SQL] Exceptions thrown when querying ORC tables [SPARK-14387][SQL] Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc Aug 3, 2016
@rajeshbalamohan
Author

rajeshbalamohan commented Aug 3, 2016

Thanks @rxin.

Changes:

  1. Added a test case, along with a sample ORC file (392 bytes) written by Hive 1.x with the schema "Type: struct<_col0:int,_col1:string>". Without the change to OrcFileFormat in this PR, the same test case throws "java.lang.IllegalArgumentException: Field "key" does not exist."
  2. Fixed the title of the JIRA and the PR.
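The failure described in item 1 can be reproduced in plain Scala. This is a hypothetical sketch (`OrcTypeStringDemo`, `parseFlatStruct` and `fieldIndexByName` are not Spark or ORC APIs): it parses a flat ORC type string like the one in the sample file and performs a strict name-based lookup, which fails for the metastore column `key` because the file only records `_col0` and `_col1`. Nested types are deliberately out of scope.

```scala
// Hypothetical helpers (not Spark or ORC APIs) for a flat ORC type string such as
// "struct<_col0:int,_col1:string>".
object OrcTypeStringDemo {

  // Returns (name, type) pairs for a flat struct; nested types are not handled here.
  def parseFlatStruct(typeString: String): Seq[(String, String)] = {
    require(typeString.startsWith("struct<") && typeString.endsWith(">"),
      s"Not a struct type: $typeString")
    val body = typeString.stripPrefix("struct<").stripSuffix(">")
    if (body.isEmpty) Seq.empty
    else body.split(",").toSeq.map { field =>
      val parts = field.split(":", 2)
      require(parts.length == 2, s"Malformed field: $field")
      (parts(0), parts(1))
    }
  }

  // Strict name-based lookup; the message mirrors the exception quoted in this PR.
  def fieldIndexByName(fields: Seq[(String, String)], name: String): Int = {
    val idx = fields.indexWhere(_._1 == name)
    if (idx < 0) throw new IllegalArgumentException(s"""Field "$name" does not exist.""")
    idx
  }
}
```

Looking up `_col1` in the parsed sample schema returns `1`, while looking up `key` throws the `IllegalArgumentException` shown above.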

@SparkQA

SparkQA commented Aug 3, 2016

Test build #63169 has finished for PR 14471 at commit 046e0c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Sep 21, 2016

Test build #65687 has finished for PR 14471 at commit 046e0c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Hi, @rajeshbalamohan. I'll refer to your commit for SPARK-19459. You'll be the main author if it is merged.

@dongjoon-hyun
Member

Hi, @rajeshbalamohan.
SPARK-14387 is resolved now. Could you close this PR?

@asfgit asfgit closed this in ed1478c Nov 7, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#11494
Closes apache#14158
Closes apache#16803
Closes apache#16864
Closes apache#17455
Closes apache#17936
Closes apache#19377

Added:
Closes apache#19380
Closes apache#18642
Closes apache#18377
Closes apache#19632

Added:
Closes apache#14471
Closes apache#17402
Closes apache#17953
Closes apache#18607

Also cc srowen vanzin HyukjinKwon gatorsmile cloud-fan to see if you have other PRs to close.

Author: Xingbo Jiang <[email protected]>

Closes apache#19669 from jiangxb1987/stale-prs.