Commit 3ae7ab8
[SPARK-21985][PYSPARK] PairDeserializer is broken for double-zipped RDDs
## What changes were proposed in this pull request?
(edited)
Fixes a bug introduced in #16121
In PairDeserializer convert each batch of keys and values to lists (if they do not have `__len__` already) so that we can check that they are the same size. Normally they already are lists so this should not have a performance impact, but this is needed when repeated `zip`'s are done.
## How was this patch tested?
Additional unit test
Author: Andrew Ray <[email protected]>
Closes #19226 from aray/SPARK-21985.
(cherry picked from commit 6adf67d)
Signed-off-by: hyukjinkwon <[email protected]>1 parent e49c997 commit 3ae7ab8
2 files changed
+17
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
97 | 97 | | |
98 | 98 | | |
99 | 99 | | |
100 | | - | |
| 100 | + | |
101 | 101 | | |
102 | 102 | | |
103 | 103 | | |
| |||
326 | 326 | | |
327 | 327 | | |
328 | 328 | | |
| 329 | + | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
329 | 333 | | |
330 | 334 | | |
331 | 335 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
579 | 579 | | |
580 | 580 | | |
581 | 581 | | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
582 | 594 | | |
583 | 595 | | |
584 | 596 | | |
| |||
0 commit comments