Commit b45c13e
committed
SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys
The current implementation reads one key with the next hash code as it finishes reading the keys with the current hash code, which may cause it to miss some matches of the next key. This can cause operations like join to give the wrong result when reduce tasks spill to disk and there are hash collisions, as values won't be matched together. This PR fixes it by not reading in that next key, using a peeking iterator instead.
Author: Matei Zaharia <[email protected]>
Closes apache#986 from mateiz/spark-2043 and squashes the following commits:
0959514 [Matei Zaharia] Added unit test for having many hash collisions
892debb [Matei Zaharia] SPARK-2043: don't read a key with the next hash code in ExternalAppendOnlyMap, instead use a buffered iterator to only read values with the current hash code.1 parent 9bad0b7 commit b45c13e
File tree
2 files changed
+44
-5
lines changed- core/src
- main/scala/org/apache/spark/util/collection
- test/scala/org/apache/spark/util/collection
2 files changed
+44
-5
lines changedLines changed: 6 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
| |||
231 | 232 | | |
232 | 233 | | |
233 | 234 | | |
234 | | - | |
| 235 | + | |
235 | 236 | | |
236 | 237 | | |
237 | 238 | | |
| |||
246 | 247 | | |
247 | 248 | | |
248 | 249 | | |
249 | | - | |
| 250 | + | |
250 | 251 | | |
251 | 252 | | |
252 | 253 | | |
253 | 254 | | |
254 | 255 | | |
255 | | - | |
| 256 | + | |
256 | 257 | | |
257 | 258 | | |
258 | 259 | | |
| |||
325 | 326 | | |
326 | 327 | | |
327 | 328 | | |
328 | | - | |
| 329 | + | |
| 330 | + | |
329 | 331 | | |
330 | 332 | | |
331 | 333 | | |
| |||
Lines changed: 38 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
277 | 277 | | |
278 | 278 | | |
279 | 279 | | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
280 | 285 | | |
281 | 286 | | |
282 | 287 | | |
| |||
296 | 301 | | |
297 | 302 | | |
298 | 303 | | |
299 | | - | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
| 329 | + | |
300 | 330 | | |
301 | 331 | | |
302 | 332 | | |
| |||
317 | 347 | | |
318 | 348 | | |
319 | 349 | | |
| 350 | + | |
| 351 | + | |
| 352 | + | |
| 353 | + | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
0 commit comments