
Commit cde5f95

bersprockets authored and MatthewRBruce committed
[SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table
## What changes were proposed in this pull request?

TableReader would get disproportionately slower as the number of columns in the query increased.

I fixed the way TableReader was looking up metadata for each column in the row. Previously, it had been looking up this data in linked lists, accessing each linked list by an index (column number). Now it looks up this data in arrays, where indexing by column number works better.

## How was this patch tested?

Manual testing:
- All sbt unit tests
- python sql tests

Author: Bruce Robbins <[email protected]>

Closes apache#21043 from bersprockets/tabreadfix.
1 parent 36f747b commit cde5f95
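The reason the change matters: the column metadata is consulted once per column for every row, so a positional lookup into a linked sequence costs O(i) per access and roughly O(n²) per row, while an array lookup is O(1). The following standalone Scala sketch (illustrative only, not TableReader code; the object name, column count, and Int placeholder values are assumptions) shows that difference:

```scala
// Standalone microbenchmark sketch: positional access into a linked List
// walks i nodes, while an Array jumps directly to the element.
object IndexingCostSketch {
  def main(args: Array[String]): Unit = {
    val numColumns = 5000
    val listMeta: List[Int]   = (0 until numColumns).toList   // linked list: apply(i) is O(i)
    val arrayMeta: Array[Int] = (0 until numColumns).toArray  // array: apply(i) is O(1)

    def time(label: String)(body: => Unit): Unit = {
      val start = System.nanoTime()
      body
      println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
    }

    // Simulate decoding one wide "row": touch every column once by index.
    time("List  indexing (O(n) per access, ~O(n^2) per row)") {
      var i = 0; var sum = 0L
      while (i < numColumns) { sum += listMeta(i); i += 1 }
    }
    time("Array indexing (O(1) per access, ~O(n) per row)") {
      var i = 0; var sum = 0L
      while (i < numColumns) { sum += arrayMeta(i); i += 1 }
    }
  }
}
```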

File tree

1 file changed (+1, -1)


sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala

Lines changed: 1 addition & 1 deletion
@@ -381,7 +381,7 @@ private[hive] object HadoopTableReader extends HiveInspectors with Logging {
 
     val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map { case (attr, ordinal) =>
       soi.getStructFieldRef(attr.name) -> ordinal
-    }.unzip
+    }.toArray.unzip
 
     /**
      * Builds specific unwrappers ahead of time according to object inspector
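A minimal, self-contained sketch of what the one-line change does to the result types (placeholder types only: `String` stands in for the Hive `StructField` reference and `Int` for the column ordinal; this is not the surrounding TableReader code):

```scala
object UnzipSketch extends App {
  val nonPartitionKeyAttrs: Seq[(String, Int)] = Seq("col_a" -> 0, "col_b" -> 1, "col_c" -> 2)

  // Before: unzip on a Seq yields two Seqs; indexing a linked Seq by position is O(i).
  val (refsSeq, ordinalsSeq): (Seq[String], Seq[Int]) = nonPartitionKeyAttrs.unzip

  // After: converting to an Array first yields two Arrays, so refsArr(i) and
  // ordinalsArr(i) are constant-time in the hot per-row, per-column fill loop.
  val (refsArr, ordinalsArr): (Array[String], Array[Int]) = nonPartitionKeyAttrs.toArray.unzip

  println(refsArr.mkString(", ") + " / " + ordinalsArr.mkString(", "))
}
```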
