Commit e17c7ea
committed
[SPARK-25299] Shuffle locations api (apache#517)
Implements the shuffle locations API as part of SPARK-25299.
This adds an additional field to all `MapStatus` objects: a `MapShuffleLocations` that indicates where a task's map output is stored. This module is optional and implementations of the pluggable shuffle writers and readers can ignore it accordingly.
This API is designed with the use case in mind of future plugin implementations desiring to have the driver store metadata about where shuffle blocks are stored.
There are a few caveats to this design:
- We originally wanted to remove the `BlockManagerId` from `MapStatus` entirely and replace it with this object. However, doing this proves to be very difficult, as many places use the block manager ID for other kinds of shuffle data bookkeeping. As a result, we concede to storing the block manager ID redundantly here. However, the overhead should be minimal: because we cache block manager ids and default map shuffle locations, the two fields in `MapStatus` should point to the same object on the heap. Thus we add `O(M)` storage overhead on the driver, where for each map status we're storing an additional pointer to the same on-heap object. We will run benchmarks against the TPC-DS workload to see if there are significant performance repercussions for this implementation.
- `KryoSerializer` expects `CompressedMapStatus` and `HighlyCompressedMapStatus` to be serialized via reflection, so originally all fields of these classes needed to be registered with Kryo. However, the `MapShuffleLocations` is now pluggable. We think however that previously Kryo was defaulting to Java serialization anyways, so we now just explicitly tell Kryo to use `ExternalizableSerializer` to deal with these objects. There's a small hack in the serialization protocol that attempts to avoid serializing the same `BlockManagerId` twice in the case that the map shuffle locations is a `DefaultMapShuffleLocations`.1 parent 8f5fb60 commit e17c7ea
File tree
25 files changed
+450
-120
lines changed- core/src
- main
- java/org/apache/spark
- api/shuffle
- shuffle/sort
- io
- scala/org/apache/spark
- scheduler
- serializer
- shuffle
- sort
- storage
- test
- java/org/apache/spark/shuffle/sort
- scala/org/apache/spark
- scheduler
- serializer
- shuffle
- sort
- io
25 files changed
+450
-120
lines changedLines changed: 39 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
Lines changed: 25 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
Lines changed: 2 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
| |||
31 | 32 | | |
32 | 33 | | |
33 | 34 | | |
34 | | - | |
| 35 | + | |
35 | 36 | | |
36 | 37 | | |
37 | 38 | | |
Lines changed: 16 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| 28 | + | |
| 29 | + | |
28 | 30 | | |
29 | 31 | | |
30 | 32 | | |
| |||
134 | 136 | | |
135 | 137 | | |
136 | 138 | | |
137 | | - | |
138 | | - | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
139 | 144 | | |
140 | 145 | | |
141 | 146 | | |
| |||
168 | 173 | | |
169 | 174 | | |
170 | 175 | | |
171 | | - | |
172 | | - | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
173 | 181 | | |
174 | 182 | | |
175 | 183 | | |
| |||
178 | 186 | | |
179 | 187 | | |
180 | 188 | | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
181 | 193 | | |
182 | 194 | | |
183 | 195 | | |
| |||
Lines changed: 76 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
Lines changed: 8 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| 26 | + | |
| 27 | + | |
26 | 28 | | |
27 | 29 | | |
28 | 30 | | |
| |||
221 | 223 | | |
222 | 224 | | |
223 | 225 | | |
| 226 | + | |
224 | 227 | | |
225 | 228 | | |
226 | 229 | | |
| |||
231 | 234 | | |
232 | 235 | | |
233 | 236 | | |
234 | | - | |
| 237 | + | |
235 | 238 | | |
236 | 239 | | |
237 | 240 | | |
| |||
240 | 243 | | |
241 | 244 | | |
242 | 245 | | |
243 | | - | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
244 | 250 | | |
245 | 251 | | |
246 | 252 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
49 | | - | |
| 49 | + | |
50 | 50 | | |
51 | 51 | | |
Lines changed: 9 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
27 | 31 | | |
28 | 32 | | |
29 | 33 | | |
| |||
49 | 53 | | |
50 | 54 | | |
51 | 55 | | |
| 56 | + | |
52 | 57 | | |
53 | 58 | | |
54 | 59 | | |
| |||
61 | 66 | | |
62 | 67 | | |
63 | 68 | | |
| 69 | + | |
64 | 70 | | |
65 | 71 | | |
66 | 72 | | |
67 | 73 | | |
68 | 74 | | |
| 75 | + | |
69 | 76 | | |
70 | 77 | | |
71 | 78 | | |
| |||
90 | 97 | | |
91 | 98 | | |
92 | 99 | | |
93 | | - | |
| 100 | + | |
94 | 101 | | |
95 | 102 | | |
96 | 103 | | |
| 104 | + | |
97 | 105 | | |
98 | 106 | | |
99 | 107 | | |
| |||
Lines changed: 6 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| 25 | + | |
25 | 26 | | |
26 | 27 | | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| 31 | + | |
30 | 32 | | |
31 | 33 | | |
32 | 34 | | |
33 | | - | |
| 35 | + | |
| 36 | + | |
34 | 37 | | |
35 | 38 | | |
| 39 | + | |
36 | 40 | | |
37 | 41 | | |
38 | 42 | | |
| |||
41 | 45 | | |
42 | 46 | | |
43 | 47 | | |
44 | | - | |
| 48 | + | |
45 | 49 | | |
46 | 50 | | |
47 | 51 | | |
0 commit comments