Commit a1f8779
SPARK-1104: kill Process in workerThread of ExecutorRunner
As reported in https://spark-project.atlassian.net/browse/SPARK-1104
By @pwendell: "Sometimes due to large shuffles executors will take a long time shutting down. In particular this can happen if large numbers of shuffle files are around (this will be alleviated by SPARK-1103, but nonetheless...).
The symptom is you have DEAD workers sitting around in the UI and the existing workers keep trying to re-register but can't because they've been assumed dead."
In this patch, I add the kill logic to the InterruptedException handler in the workerThread of ExecutorRunner, so that process.destroy() and process.waitFor() block only the workerThread instead of the Worker actor.
---------
analysis: killing the executor blocks the caller. process.destroy() followed by process.waitFor() only returns once the executor JVM has exited, i.e. after all of its shutdown hook threads have returned, so calling them in the Worker thread makes the Worker block for a long while.
For what happens to the shutdown hooks when the JVM process is killed, see: http://www.tutorialspoint.com/java/lang/runtime_addshutdownhook.htm
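The idea can be sketched as follows. This is a minimal illustration, not the code from this patch: RunnerSketch, its fields, and the `sleep 30` child command are hypothetical stand-ins for ExecutorRunner and the executor process, and the example assumes a Unix-like system where `sleep` is on the PATH. The caller only interrupts the worker thread; the potentially slow destroy()/waitFor() pair runs inside that thread's InterruptedException handler.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical stand-in for ExecutorRunner; names and structure are illustrative only.
class RunnerSketch(command: Seq[String]) {
  @volatile private var process: Process = null
  @volatile var exitCode: Option[Int] = None
  private val started = new CountDownLatch(1)

  private val workerThread = new Thread("runner-sketch-worker") {
    override def run(): Unit = {
      try {
        process = new ProcessBuilder(command: _*).start()
        started.countDown()
        // Normal path: wait for the child to exit on its own.
        exitCode = Some(process.waitFor())
      } catch {
        case _: InterruptedException =>
          // Kill path: the slow part (waiting for the child to finish
          // shutting down) happens here, on the worker thread only.
          if (process != null) {
            process.destroy()
            exitCode = Some(process.waitFor())
          }
      } finally {
        // Release waiters even if the child failed to start.
        started.countDown()
      }
    }
  }

  def start(): Unit = workerThread.start()

  // Waits until the child has been launched (or the launch has failed).
  def awaitStarted(): Boolean = started.await(5, TimeUnit.SECONDS)

  // Called from the actor side: only signals the worker thread and returns at once.
  def kill(): Unit = workerThread.interrupt()

  def join(timeoutMs: Long): Unit = workerThread.join(timeoutMs)
}

object RunnerSketchDemo {
  def main(args: Array[String]): Unit = {
    val runner = new RunnerSketch(Seq("sleep", "30")) // assumes a Unix-like `sleep`
    runner.start()
    runner.awaitStarted()
    runner.kill()     // does not wait for the child to exit
    runner.join(5000) // the demo waits here only so it can print the result
    println(s"exit code: ${runner.exitCode}")
  }
}
```

With this shape, a caller such as the Worker actor stays responsive while the executor shuts down, because kill() never waits on the child process itself.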
Author: CodingCat <[email protected]>
Closes #35 from CodingCat/SPARK-1104 and squashes the following commits:
85767da [CodingCat] add null checking and remove unnecessary killProce
3107aeb [CodingCat] address Aaron's comments
eb615ba [CodingCat] kill the process when the error happens
0accf2f [CodingCat] set process to null after killed it
1d511c8 [CodingCat] kill Process in workerThread
(cherry picked from commit f99af85)
Signed-off-by: Aaron Davidson <[email protected]>
1 parent 2250c7a commit a1f8779
1 file changed, with 14 additions and 17 deletions, in core/src/main/scala/org/apache/spark/deploy/worker