Commit 3158fc3
[SPARK-23243][SPARK-20715][CORE][2.2] Fix RDD.repartition() data correctness issue
## What changes were proposed in this pull request?
Back port of #22354 and #17955 to 2.2 (#22354 depends on methods introduced by #17955).
-------
An alternative fix for #21698
When Spark rerun tasks for an RDD, there are 3 different behaviors:
1. determinate. Always return the same result with same order when rerun.
2. unordered. Returns same data set in random order when rerun.
3. indeterminate. Returns different result when rerun.
Normally Spark doesn't need to care about it. Spark runs stages one by one, when a task is failed, just rerun it. Although the rerun task may return a different result, users will not be surprised.
However, Spark may rerun a finished stage when seeing fetch failures. When this happens, Spark needs to rerun all the tasks of all the succeeding stages if the RDD output is indeterminate, because the input of the succeeding stages has been changed.
If the RDD output is determinate, we only need to rerun the failed tasks of the succeeding stages, because the input doesn't change.
If the RDD output is unordered, it's same as determinate, because shuffle partitioner is always deterministic(round-robin partitioner is not a shuffle partitioner that extends `org.apache.spark.Partitioner`), so the reducers will still get the same input data set.
This PR fixed the failure handling for `repartition`, to avoid correctness issues.
For `repartition`, it applies a stateful map function to generate a round-robin id, which is order sensitive and makes the RDD's output indeterminate. When the stage contains `repartition` reruns, we must also rerun all the tasks of all the succeeding stages.
**future improvement:**
1. Currently we can't rollback and rerun a shuffle map stage, and just fail. We should fix it later. https://issues.apache.org/jira/browse/SPARK-25341
2. Currently we can't rollback and rerun a result stage, and just fail. We should fix it later. https://issues.apache.org/jira/browse/SPARK-25342
3. We should provide public API to allow users to tag the random level of the RDD's computing function.
## How was this patch tested?
a new test case
Closes #22382 from bersprockets/SPARK-23243-2.2.
Lead-authored-by: Bruce Robbins <[email protected]>
Co-authored-by: Josh Rosen <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>1 parent af41ded commit 3158fc3
File tree
13 files changed
+750
-406
lines changed- core/src
- main/scala/org/apache/spark
- executor
- rdd
- scheduler
- test/scala/org/apache/spark
- scheduler
- sql/core/src/main/scala/org/apache/spark/sql/execution/exchange
13 files changed
+750
-406
lines changedLines changed: 360 additions & 276 deletions
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
35 | 38 | | |
36 | 39 | | |
37 | 40 | | |
| |||
Lines changed: 8 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
325 | 325 | | |
326 | 326 | | |
327 | 327 | | |
328 | | - | |
329 | | - | |
| 328 | + | |
| 329 | + | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
330 | 336 | | |
331 | 337 | | |
332 | 338 | | |
| |||
Lines changed: 20 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
26 | 36 | | |
27 | 37 | | |
28 | 38 | | |
29 | 39 | | |
30 | | - | |
| 40 | + | |
| 41 | + | |
31 | 42 | | |
32 | 43 | | |
33 | 44 | | |
| |||
41 | 52 | | |
42 | 53 | | |
43 | 54 | | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
44 | 63 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
461 | 461 | | |
462 | 462 | | |
463 | 463 | | |
464 | | - | |
465 | | - | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
466 | 467 | | |
467 | 468 | | |
468 | 469 | | |
| |||
806 | 807 | | |
807 | 808 | | |
808 | 809 | | |
809 | | - | |
810 | | - | |
| 810 | + | |
| 811 | + | |
| 812 | + | |
| 813 | + | |
| 814 | + | |
811 | 815 | | |
812 | 816 | | |
813 | 817 | | |
814 | | - | |
| 818 | + | |
| 819 | + | |
815 | 820 | | |
816 | 821 | | |
817 | 822 | | |
818 | | - | |
| 823 | + | |
| 824 | + | |
819 | 825 | | |
820 | 826 | | |
821 | 827 | | |
| |||
1634 | 1640 | | |
1635 | 1641 | | |
1636 | 1642 | | |
| 1643 | + | |
| 1644 | + | |
| 1645 | + | |
| 1646 | + | |
| 1647 | + | |
| 1648 | + | |
| 1649 | + | |
| 1650 | + | |
| 1651 | + | |
| 1652 | + | |
1637 | 1653 | | |
1638 | 1654 | | |
1639 | 1655 | | |
| |||
1838 | 1854 | | |
1839 | 1855 | | |
1840 | 1856 | | |
| 1857 | + | |
| 1858 | + | |
| 1859 | + | |
| 1860 | + | |
| 1861 | + | |
| 1862 | + | |
| 1863 | + | |
| 1864 | + | |
| 1865 | + | |
| 1866 | + | |
| 1867 | + | |
| 1868 | + | |
| 1869 | + | |
| 1870 | + | |
| 1871 | + | |
| 1872 | + | |
| 1873 | + | |
| 1874 | + | |
| 1875 | + | |
| 1876 | + | |
| 1877 | + | |
| 1878 | + | |
| 1879 | + | |
| 1880 | + | |
| 1881 | + | |
| 1882 | + | |
| 1883 | + | |
| 1884 | + | |
| 1885 | + | |
| 1886 | + | |
| 1887 | + | |
| 1888 | + | |
| 1889 | + | |
| 1890 | + | |
| 1891 | + | |
| 1892 | + | |
| 1893 | + | |
| 1894 | + | |
| 1895 | + | |
| 1896 | + | |
| 1897 | + | |
| 1898 | + | |
| 1899 | + | |
| 1900 | + | |
| 1901 | + | |
| 1902 | + | |
| 1903 | + | |
| 1904 | + | |
| 1905 | + | |
| 1906 | + | |
| 1907 | + | |
| 1908 | + | |
| 1909 | + | |
| 1910 | + | |
| 1911 | + | |
| 1912 | + | |
| 1913 | + | |
1841 | 1914 | | |
1842 | 1915 | | |
1843 | 1916 | | |
| |||
1891 | 1964 | | |
1892 | 1965 | | |
1893 | 1966 | | |
| 1967 | + | |
| 1968 | + | |
| 1969 | + | |
| 1970 | + | |
| 1971 | + | |
| 1972 | + | |
| 1973 | + | |
| 1974 | + | |
| 1975 | + | |
| 1976 | + | |
| 1977 | + | |
| 1978 | + | |
| 1979 | + | |
| 1980 | + | |
| 1981 | + | |
Lines changed: 72 additions & 38 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
41 | | - | |
| 41 | + | |
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
| |||
328 | 328 | | |
329 | 329 | | |
330 | 330 | | |
331 | | - | |
| 331 | + | |
| 332 | + | |
332 | 333 | | |
333 | 334 | | |
334 | 335 | | |
335 | 336 | | |
336 | 337 | | |
337 | | - | |
338 | | - | |
339 | | - | |
340 | | - | |
341 | | - | |
342 | | - | |
343 | | - | |
344 | | - | |
345 | | - | |
346 | | - | |
347 | | - | |
348 | | - | |
349 | | - | |
| 338 | + | |
350 | 339 | | |
351 | 340 | | |
352 | 341 | | |
| |||
1240 | 1229 | | |
1241 | 1230 | | |
1242 | 1231 | | |
1243 | | - | |
| 1232 | + | |
| 1233 | + | |
1244 | 1234 | | |
1245 | 1235 | | |
1246 | 1236 | | |
| |||
1257 | 1247 | | |
1258 | 1248 | | |
1259 | 1249 | | |
1260 | | - | |
1261 | | - | |
1262 | | - | |
1263 | | - | |
1264 | | - | |
1265 | | - | |
1266 | | - | |
1267 | | - | |
1268 | | - | |
1269 | | - | |
| 1250 | + | |
| 1251 | + | |
| 1252 | + | |
| 1253 | + | |
| 1254 | + | |
| 1255 | + | |
| 1256 | + | |
| 1257 | + | |
1270 | 1258 | | |
1271 | 1259 | | |
1272 | 1260 | | |
| |||
1344 | 1332 | | |
1345 | 1333 | | |
1346 | 1334 | | |
| 1335 | + | |
| 1336 | + | |
| 1337 | + | |
| 1338 | + | |
| 1339 | + | |
| 1340 | + | |
| 1341 | + | |
| 1342 | + | |
| 1343 | + | |
| 1344 | + | |
| 1345 | + | |
| 1346 | + | |
| 1347 | + | |
| 1348 | + | |
| 1349 | + | |
| 1350 | + | |
| 1351 | + | |
| 1352 | + | |
| 1353 | + | |
| 1354 | + | |
| 1355 | + | |
| 1356 | + | |
| 1357 | + | |
| 1358 | + | |
| 1359 | + | |
| 1360 | + | |
| 1361 | + | |
| 1362 | + | |
| 1363 | + | |
| 1364 | + | |
| 1365 | + | |
| 1366 | + | |
| 1367 | + | |
| 1368 | + | |
| 1369 | + | |
| 1370 | + | |
| 1371 | + | |
| 1372 | + | |
| 1373 | + | |
| 1374 | + | |
| 1375 | + | |
| 1376 | + | |
| 1377 | + | |
| 1378 | + | |
| 1379 | + | |
| 1380 | + | |
| 1381 | + | |
| 1382 | + | |
| 1383 | + | |
| 1384 | + | |
| 1385 | + | |
| 1386 | + | |
| 1387 | + | |
| 1388 | + | |
| 1389 | + | |
| 1390 | + | |
| 1391 | + | |
1347 | 1392 | | |
1348 | 1393 | | |
1349 | 1394 | | |
| |||
1367 | 1412 | | |
1368 | 1413 | | |
1369 | 1414 | | |
1370 | | - | |
1371 | 1415 | | |
1372 | 1416 | | |
1373 | 1417 | | |
| |||
1416 | 1460 | | |
1417 | 1461 | | |
1418 | 1462 | | |
1419 | | - | |
1420 | | - | |
1421 | | - | |
1422 | | - | |
1423 | | - | |
1424 | | - | |
1425 | | - | |
1426 | | - | |
1427 | | - | |
1428 | | - | |
1429 | | - | |
| 1463 | + | |
1430 | 1464 | | |
1431 | 1465 | | |
1432 | 1466 | | |
| |||
0 commit comments