[SPARK-30059][CORE]Stop AsyncEventQueue when interrupted in dispatch #26674
Conversation
I applied patch #21356 in my cluster. Found that the … Here is the log: … Stopping the entire queue when interrupted in dispatch may not be the best choice. If it's an important queue (e.g. dynamic resource allocation), I think it's better to stop the SparkContext. Do you have any advice? cc @squito @cloud-fan :)
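For context on why the dispatch case matters: the dispatch thread is wrapped so that any uncaught exception stops the whole application. A rough sketch of that wrapper, paraphrased from memory of the Spark 2.x-era AsyncEventQueue (names approximate, not the exact source):

private val dispatchThread = new Thread(s"spark-listener-group-$name") {
  setDaemon(true)
  override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
    // An InterruptedException thrown inside dispatch() -- e.g. while blocked
    // in eventQueue.take() -- escapes postToAll's per-listener handling and
    // reaches tryOrStopSparkContext, which stops the SparkContext.
    dispatch()
  }
}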
Force-pushed from affa555 to 8df4c88
Can you please open another JIRA for this, since SPARK-24309 is already in shipped releases? I haven't thought about this a lot, but I don't know if I really like this idea. You would be able to stop the running job if there were only one job, but what about concurrent jobs? I wonder if we should just have some special-case handling in the EventLoggingListener to retry once after an interrupt?
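The retry-once idea might look roughly like the following in the listener's write path. This is a hypothetical helper, not actual Spark code, and in practice an interrupted HDFS write may surface as InterruptedIOException rather than InterruptedException:

object RetryOnceSketch {
  // writeEvent stands in for whatever EventLoggingListener calls to flush an
  // event to the event log; retry exactly once after an interrupt.
  def logWithOneRetry(writeEvent: () => Unit): Unit = {
    try {
      writeEvent()
    } catch {
      case _: InterruptedException =>
        Thread.interrupted() // clear a possibly leaked interrupt flag
        writeEvent()         // a second failure propagates normally
    }
  }
}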
Thanks for your reply, @squito.
I didn't get the point. What would happen if we just stopped the event log queue while concurrent jobs are running? Could you explain this in detail?
I agree with this.
Sorry, please ignore that -- I misread your earlier comments; I had thought you were discussing stopping running jobs.
Yes, good point. I'd need to walk through this very carefully, but that sounds reasonable to me.
Do you know what version of Hadoop you are on? I am trying to compare with the code -- it's clearly not trunk (the …). Still, even looking at trunk, I have a guess at what is happening. The first part of your log shows the interrupt is coming from the DataStreamer, though that is running in a separate thread and isn't directly interrupting the event log queue thread. But my guess is that calls to …, so it's possible that we're actually leaving the previous call to … That would probably be an HDFS bug, but it seems to at least fit the pattern of what we see here, and it's something we could at least check for. @steveloughran, do you have any idea how this interrupt from the DataStreamer is getting back to the Spark event log writer?
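The leaked-interrupt guess is easy to demonstrate in isolation: if any call returns with the thread's interrupt status still set, the next blocking call fails immediately. A self-contained Scala demo of just that mechanism (nothing here is HDFS or Spark code):

object LeakedInterruptDemo {
  def main(args: Array[String]): Unit = {
    Thread.currentThread().interrupt() // simulate a flag left set by an earlier call
    try {
      Thread.sleep(10) // any blocking call now throws right away
    } catch {
      case _: InterruptedException => println("interrupt flag was leaked")
    }
  }
}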
My Hadoop version is 2.7.1.
Errors terminating the DataStreamer thread are caught and then escalated to whichever API call next uses the instance. Usually they are IO problems considered non-recoverable. I don't know the HDFS internals, but a look at the code hints this happened due to failures to talk to any datanode. Or at least, that's what the code is assuming -- that any interrupt is a timeout in connections. If you can show there's a problem happening on the 3.2.x libraries, you should be able to persuade someone (else!) to have a look at this.
val listenerThread = Thread.currentThread()
// Helper thread for the test: poll every 10 ms while `sleep` is true, then
// interrupt the listener (dispatch) thread captured above.
new Thread(new Runnable {
  override def run(): Unit = {
    while (sleep) {
      Thread.sleep(10)
    }
    listenerThread.interrupt()
  }
}).start()
Does it mean that EventLoggingListener does something similar to this internally?
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
PR #21356 stops the AsyncEventQueue when it is interrupted in postToAll. However, if it is interrupted in AsyncEventQueue#dispatch, the SparkContext is stopped instead. This PR proposes to also stop the AsyncEventQueue when it is interrupted in dispatch, rather than stopping the SparkContext.

Why are the changes needed?
To avoid stopping the SparkContext when interrupted in AsyncEventQueue#dispatch.

Does this PR introduce any user-facing change?
No.

How was this patch tested?
New UT.
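Since the diff itself is not reproduced on this page, here is a self-contained toy analogue of the proposed behavior; all names are hypothetical (the real change lives in AsyncEventQueue#dispatch). The dispatch thread catches the interrupt and stops only its own queue, so nothing escapes to stop the SparkContext:

import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicBoolean

class ToyAsyncQueue(name: String) {
  private val events = new LinkedBlockingQueue[String]()
  val stopped = new AtomicBoolean(false)

  private val dispatchThread = new Thread(() => {
    try {
      while (true) println(s"[$name] dispatched ${events.take()}")
    } catch {
      // Stop this queue only, instead of letting the exception escape.
      case _: InterruptedException => stopped.set(true)
    }
  }, s"toy-listener-group-$name")
  dispatchThread.setDaemon(true)
  dispatchThread.start()

  def post(event: String): Unit = events.put(event)
  def interrupt(): Unit = dispatchThread.interrupt()
  def join(): Unit = dispatchThread.join()
}

object ProposedFixDemo {
  def main(args: Array[String]): Unit = {
    val q = new ToyAsyncQueue("eventLog")
    q.post("SparkListenerJobStart")
    Thread.sleep(100)
    q.interrupt() // the rest of the application keeps running
    q.join()
    println(s"queue stopped = ${q.stopped.get}")
  }
}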