[SPARK-22712][SQL] Use buildReaderWithPartitionValues in native OrcFileFormat #19907
dongjoon-hyun wants to merge 2 commits into apache:master from dongjoon-hyun:SPARK-ORC-BUILD-READER
    }

  - override def buildReader(
  + override def buildReaderWithPartitionValues(
Hi, @cloud-fan . During the previous ORC PR, we left this behind.
Test build #84543 has finished for PR 19907 at commit
    val unsafeProjection = UnsafeProjection.create(requiredSchema)
    val deserializer = new OrcDeserializer(dataSchema, requiredSchema, requestedColIds)
    val colIds = requestedColIds ++ List.fill(partitionSchema.length)(-1).toArray[Int]
    val unsafeProjection = UnsafeProjection.create(resultSchema)
Can we follow Parquet and just join the data row and the partition row, then do a final unsafe projection? It's much easier and there is no performance difference.
Parquet vectorization works like the following.
    // UnsafeRowParquetRecordReader appends the columns internally to avoid another copy.
    if (parquetReader.isInstanceOf[VectorizedParquetRecordReader] &&
        enableVectorizedReader) {
      iter.asInstanceOf[Iterator[InternalRow]]
Oh, I see. You meant the non-vectorized path. Sorry, I was confused because I focused too much on the vectorized path. I'll do that.
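For readers following the thread, the approach suggested above (join the data row with the partition row, then apply one final unsafe projection) can be sketched roughly as below. This is an illustration, not the code merged in this PR: the object name `JoinPartitionValuesSketch`, the helper `appendPartitionValues`, and the example schemas and values are hypothetical. It relies only on `JoinedRow`, `UnsafeProjection.create(StructType)`, and `InternalRow` from Spark's catalyst module, and assumes that module is on the classpath.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{JoinedRow, UnsafeProjection}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.unsafe.types.UTF8String

object JoinPartitionValuesSketch {
  // Hypothetical schemas, for illustration only.
  val requiredSchema: StructType = StructType(Seq(
    StructField("id", IntegerType),
    StructField("name", StringType)))
  val partitionSchema: StructType = StructType(Seq(
    StructField("year", IntegerType)))

  /** Appends the partition values to every data row, then converts to UnsafeRow. */
  def appendPartitionValues(
      dataRows: Iterator[InternalRow],
      partitionValues: InternalRow): Iterator[InternalRow] = {
    // Output layout: required data columns first, partition columns last.
    val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
    val unsafeProjection = UnsafeProjection.create(resultSchema)
    // JoinedRow exposes two rows as one logical row without copying them.
    val joinedRow = new JoinedRow()
    // The projection reuses its output buffer, so copy each row that is retained.
    dataRows.map(row => unsafeProjection(joinedRow(row, partitionValues)).copy())
  }

  def main(args: Array[String]): Unit = {
    val dataRows = Iterator(
      InternalRow(1, UTF8String.fromString("a")),
      InternalRow(2, UTF8String.fromString("b")))
    val partitionValues = InternalRow(2017)
    appendPartitionValues(dataRows, partitionValues).foreach { row =>
      println(s"${row.getInt(0)}, ${row.getUTF8String(1)}, ${row.getInt(2)}")
    }
  }
}
```

The `.copy()` call is only there because the sketch lets callers collect the rows. A reader that hands each row downstream immediately could skip it.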
Test build #84570 has finished for PR 19907 at commit
Hi, @gatorsmile .
thanks, merging to master!
Thank you, @cloud-fan !
What changes were proposed in this pull request?
To support vectorization in native OrcFileFormat later, we need to use
buildReaderWithPartitionValues instead of buildReader, like ParquetFileFormat. This PR replaces buildReader with buildReaderWithPartitionValues.
How was this patch tested?
Pass the Jenkins with the existing test cases.