Commit 2ec7d7a
[SPARK-2443][SQL] Fix slow read from partitioned tables
This fix achieves a performance boost comparable to [PR #1390](#1390) by moving an array update and a deserializer initialization out of a potentially very long per-row loop. Suggested by yhuai. The results below have been updated for this fix.
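The idea behind the fix can be illustrated with a minimal sketch (hypothetical names; the real change lives in the Hive table-reading code, not shown here): work that is identical for every row in a partition, such as filling in the partition-key columns of the row buffer, should run once per partition rather than once per row.

```scala
// Hypothetical sketch of hoisting loop-invariant work out of a per-row loop.
// Not the actual Spark code; names and structure are illustrative only.
object HoistInvariantWork {

  // BAD: the partition-key columns are re-copied for every single row.
  def slowVersion(rows: Seq[String], partitionKeys: Array[String]): Seq[Array[String]] =
    rows.map { row =>
      val buffer = new Array[String](1 + partitionKeys.length)
      partitionKeys.copyToArray(buffer, 1) // repeated needlessly per row
      buffer(0) = row
      buffer
    }

  // GOOD: a template row is prepared once, outside the loop; the per-row
  // work is reduced to writing the single value that actually varies.
  def fastVersion(rows: Seq[String], partitionKeys: Array[String]): Seq[Array[String]] = {
    val template = new Array[String](1 + partitionKeys.length)
    partitionKeys.copyToArray(template, 1) // done once per partition
    rows.map { row =>
      val buffer = template.clone()
      buffer(0) = row
      buffer
    }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", "b")
    val keys = Array("2014", "07")
    // Both versions produce the same rows; only the per-row cost differs.
    assert(slowVersion(rows, keys).map(_.toSeq) == fastVersion(rows, keys).map(_.toSeq))
    println(fastVersion(rows, keys).map(_.mkString(",")).mkString(";"))
  }
}
```

With 10M rows, saving even a small constant amount of per-row work adds up, which is consistent with the partitioned-table timings dropping from tens of seconds to about a second in the benchmarks below.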
## Benchmarks
Generated a local text file with 10M rows of simple key-value pairs. The data is loaded as a table through Hive. Results are obtained on my local machine using hive/console.
Without the fix:
Type | Non-partitioned | Partitioned (1 part)
------------ | ------------ | -------------
First run | 9.52s end-to-end (1.64s Spark job) | 36.6s (28.3s)
Stabilized runs | 1.21s (1.18s) | 27.6s (27.5s)
With this fix:
Type | Non-partitioned | Partitioned (1 part)
------------ | ------------ | -------------
First run | 9.57s (1.46s) | 11.0s (1.69s)
Stabilized runs | 1.13s (1.10s) | 1.23s (1.19s)
Author: Zongheng Yang <[email protected]>
Closes #1408 from concretevitamin/slow-read-2 and squashes the following commits:
d86e437 [Zongheng Yang] Move update & initialization out of potentially long loop.
(cherry picked from commit d60b09b)
Signed-off-by: Michael Armbrust <[email protected]>
## Files changed
1 file changed, 7 additions and 3 deletions, under `sql/hive/src/main/scala/org/apache/spark/sql/hive`.
Diff summary (the code content was not captured in this extract): 7 lines added at new lines 167–173; 3 lines removed at original lines 169–170 and 173.