[SPARK-22712][SQL] Use buildReaderWithPartitionValues in native OrcFileFormat#19907

Closed
dongjoon-hyun wants to merge 2 commits into apache:master from dongjoon-hyun:SPARK-ORC-BUILD-READER

Conversation

@dongjoon-hyun
Member

What changes were proposed in this pull request?

To support vectorization in the native OrcFileFormat later, we need to use buildReaderWithPartitionValues instead of buildReader, as ParquetFileFormat does. This PR replaces buildReader with buildReaderWithPartitionValues.
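For context, the two hooks on the FileFormat trait share the same parameter list; the difference is the contract: buildReaderWithPartitionValues must return rows that already contain the partition columns. A sketch of the signature, written from memory of the Spark 2.x FileFormat trait rather than copied from this PR:

```scala
// Sketch, assuming the Spark 2.x FileFormat trait. The returned function maps
// a PartitionedFile to an iterator of rows that ALREADY include the partition
// values appended after the data columns (requiredSchema ++ partitionSchema),
// which is what lets the scan operator treat all sources uniformly.
def buildReaderWithPartitionValues(
    sparkSession: SparkSession,
    dataSchema: StructType,
    partitionSchema: StructType,
    requiredSchema: StructType,
    filters: Seq[Filter],
    options: Map[String, String],
    hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow]
```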

How was this patch tested?

Pass the Jenkins with the existing test cases.

```diff
-  override def buildReader(
+  override def buildReaderWithPartitionValues(
```
Member Author


Hi, @cloud-fan. We left this behind during the previous ORC PR.

@SparkQA

SparkQA commented Dec 6, 2017

Test build #84543 has finished for PR 19907 at commit 199c835.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```diff
-    val unsafeProjection = UnsafeProjection.create(requiredSchema)
     val deserializer = new OrcDeserializer(dataSchema, requiredSchema, requestedColIds)
+    val colIds = requestedColIds ++ List.fill(partitionSchema.length)(-1).toArray[Int]
+    val unsafeProjection = UnsafeProjection.create(resultSchema)
```
Contributor


Can we follow Parquet and just join the data row and the partition row, then do a final unsafe projection? It's much easier, and there is no performance difference.
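The join-then-project approach suggested here mirrors what ParquetFileFormat does on its non-vectorized path. A rough illustration of the pattern, using Spark's internal JoinedRow and GenerateUnsafeProjection; here iter and file are assumed to come from the surrounding reader closure, and this is a sketch, not the exact code merged in this PR:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.JoinedRow
import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection

// Combined output schema: data columns first, then partition columns.
val fullSchema = requiredSchema.toAttributes ++ partitionSchema.toAttributes

// JoinedRow views two rows as one without copying; the single unsafe
// projection at the end does the only copy per output row.
val joinedRow = new JoinedRow()
val appendPartitionColumns = GenerateUnsafeProjection.generate(fullSchema, fullSchema)

iter.map { dataRow: InternalRow =>
  appendPartitionColumns(joinedRow(dataRow, file.partitionValues))
}
```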

Member Author


Parquet vectorization works like the following:

```scala
      // UnsafeRowParquetRecordReader appends the columns internally to avoid another copy.
      if (parquetReader.isInstanceOf[VectorizedParquetRecordReader] &&
          enableVectorizedReader) {
        iter.asInstanceOf[Iterator[InternalRow]]
      }
```

Member Author


Oh, I see. You meant the non-vectorized path. Sorry, I was confused because I focused too much on the vectorized path. I'll do that.

@SparkQA

SparkQA commented Dec 6, 2017

Test build #84570 has finished for PR 19907 at commit f69fc4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Dec 7, 2017

Hi, @gatorsmile.
Could you review this, too?

@cloud-fan
Contributor

Thanks, merging to master!

@asfgit asfgit closed this in dd59a4b Dec 7, 2017
@dongjoon-hyun
Member Author

Thank you, @cloud-fan !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-ORC-BUILD-READER branch December 7, 2017 15:19