Commit ee22be0
Merge pull request alteryx#189 from tgravescs/sparkYarnErrorHandling
Improve Spark on YARN error handling
Improve CLI error handling, and allow only a certain number of worker failures before failing the application. This keeps misconfigured jobs from running forever. For instance, using a 32-bit JVM while trying to allocate 8 GB containers loops forever without this change; now the application errors out after a set number of retries. The number of retries is configurable.

Also increase the frequency at which we ping the ResourceManager (RM), so that we get replacement containers faster when containers die. The YARN MapReduce application defaults to pinging the RM every 1 second, so the default of 5 seconds here is fine. That interval is configurable as well, in case people want to change it.
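The two mechanisms described above (a cap on worker failures and a configurable RM heartbeat interval) can be illustrated with a minimal Scala sketch. Everything here is hypothetical: the names `WorkerFailurePolicy`, `runAllocationLoop` and the `pollRm` callback, the property keys, and the "twice the workers, at least 3" default are assumptions for illustration, not confirmed by this commit page. The RM is simulated by a callback rather than the YARN client API.

```scala
// Minimal sketch with hypothetical names: cap worker (container) failures
// instead of retrying forever, and read the RM heartbeat interval from config.
object WorkerFailurePolicy {
  // Assumed property keys; illustrative only, not authoritative.
  val MaxFailuresKey = "spark.yarn.max.worker.failures"
  val HeartbeatKey = "spark.yarn.scheduler.heartbeat.interval-ms"

  // Assumed default: twice the requested workers, but never fewer than 3.
  def maxFailures(conf: Map[String, String], numWorkers: Int): Int =
    conf.get(MaxFailuresKey).map(_.toInt).getOrElse(math.max(numWorkers * 2, 3))

  // Default of 5 seconds, matching the commit message.
  def heartbeatIntervalMs(conf: Map[String, String]): Long =
    conf.get(HeartbeatKey).map(_.toLong).getOrElse(5000L)

  final case class Outcome(succeeded: Boolean, failures: Int, heartbeats: Int)

  // Simulated allocation loop. `pollRm` stands in for one heartbeat to the RM
  // and returns (workers that started, containers that failed) since the last
  // poll. The loop only ends once polls report enough started or failed containers.
  def runAllocationLoop(numWorkers: Int, failureCap: Int)(pollRm: Int => (Int, Int)): Outcome = {
    var running = 0
    var failed = 0
    var heartbeats = 0
    // Without the `failed < failureCap` guard, a container that always dies
    // (e.g. an oversized request on a 32-bit JVM) would keep this loop spinning forever.
    while (running < numWorkers && failed < failureCap) {
      heartbeats += 1
      val (started, died) = pollRm(heartbeats)
      running += started
      failed += died
      // A real application master would sleep for heartbeatIntervalMs here.
    }
    Outcome(failed < failureCap, failed, heartbeats)
  }
}
```

With a callback that reports one failed container per heartbeat, `WorkerFailurePolicy.runAllocationLoop(4, 8)(_ => (0, 1))` returns `Outcome(false, 8, 8)`: it gives up after eight heartbeats instead of looping indefinitely. Keeping the cap and the interval as configuration values, rather than constants, mirrors the commit's point that both are tunable.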
I do want to make sure there aren't any cases where calling stopExecutors in CoarseGrainedSchedulerBackend would cause problems. I couldn't think of any, and I tested on a standalone cluster as well as on YARN.
(cherry picked from commit aa638ed)
Signed-off-by: Patrick Wendell <[email protected]>
1 parent d77c337
6 files changed (+61, -30 lines) in:
- core/src/main/scala/org/apache/spark/scheduler/cluster
- docs
- yarn/src/main/scala/org/apache/spark/deploy/yarn

Lines changed: 1 addition & 0 deletions
| Original lines | New lines | Change |
|---|---|---|
| 199-204 | 199-205 | +1 line (new line 202) |
Lines changed: 0 additions & 1 deletion
| Original lines | New lines | Change |
|---|---|---|
| 62-68 | 62-67 | -1 line (original line 65) |
Lines changed: 2 additions & 0 deletions
| Original lines | New lines | Change |
|---|---|---|
| 37-42 | 37-44 | +2 lines (new lines 40-41) |
Lines changed: 26 additions & 13 deletions
| Original lines | New lines | Change |
|---|---|---|
| 54-60 | 54-62 | -1 line (original 57); +3 lines (new 57-59) |
| 227-238 | 229-241 | -2 lines (original 230-231); +2 lines (new 232-233); -2 lines (original 234-235); +3 lines (new 236-238) |
| 251-258 | 254-264 | -2 lines (original 254-255); +5 lines (new 257-261) |
| 268-288 | 274-300 | -4 lines (original 271-274); +7 lines (new 277-283); -1 line (original 279); +4 lines (new 294-297) |
| 321-327 | 333-339 | -1 line (original 324); +1 line (new 336) |
| 335-340 | 347-353 | +1 line (new 350) |
Lines changed: 20 additions & 12 deletions
| Original lines | New lines | Change |
|---|---|---|
| 60-65 | 60-67 | +2 lines (new 63-64) |
| 84-89 | 86-108 | +17 lines (new 89-105) |
| 97-103 | 116-121 | -1 line (original 100) |
| 215-225 | 233-238 | -5 lines (original 218-222) |
| 334-340 | 347-352 | -1 line (original 337) |
| 360-370 | 372-377 | -5 lines (original 363-367) |
| 442-447 | 449-455 | +1 line (new 452) |
Lines changed: 12 additions & 4 deletions
| Original lines | New lines | Change |
|---|---|---|
| 72-80 | 72-82 | +1 line (new 75); +1 line (new 79) |
| 253-260 | 255-270 | -2 lines (original 256-257); +10 lines (new 258-267) |
| 378-385 | 388-393 | -2 lines (original 381-382) |