[SPARK-24415][Core] Fixed the aggregated stage metrics by retaining stage objects in liveStages until all tasks are complete (#22209)
Conversation
PR title should describe the fix, not the problem. The problem is already described on the bug.

ok to test

Just a note: the JIRA I filed for this was actually affecting running jobs as well (https://issues.apache.org/jira/browse/SPARK-24415). I think we have seen a similar issue on the history server too, so perhaps it's related. Haven't looked at the code here yet.

Thanks for your comment @tgravescs. This fix is indeed for the SparkUI (running jobs) issue. It does not fix it for the history server yet.

Just checked, this fixes the Spark History Server issue (SPARK-24539) as well; I just needed to restart the SHS in order for my changes to take effect.
removal from iterator should always happen
For already completed stages, we will leave the removal of stage to happen in either onTaskEnd or onStageCompleted event. This ensures that stage metrics are updated even when onJobEnd event is received before onTaskEnd event.
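The retention pattern described here can be sketched roughly as follows. This is a simplified model with hypothetical names, not Spark's actual AppStatusListener code: the stage entry stays alive until the stage has completed AND no tasks are still running, so a late onTaskEnd can still update the aggregated metrics.

```scala
import scala.collection.mutable

object LiveStagesSketch {
  case class LiveStage(var activeTasks: Int = 0,
                       var completed: Boolean = false,
                       var metricValue: Long = 0L)

  val liveStages = mutable.Map[Int, LiveStage]()

  def onStageCompleted(stageId: Int): Unit =
    liveStages.get(stageId).foreach { stage =>
      stage.completed = true
      maybeRemove(stageId, stage)   // removed only if no tasks remain
    }

  def onTaskEnd(stageId: Int, metricDelta: Long): Unit =
    liveStages.get(stageId).foreach { stage =>
      stage.activeTasks -= 1
      stage.metricValue += metricDelta  // metrics still aggregated after stage end
      maybeRemove(stageId, stage)
    }

  // The key idea: removal requires both conditions, instead of happening
  // unconditionally in onStageCompleted.
  private def maybeRemove(stageId: Int, stage: LiveStage): Unit =
    if (stage.completed && stage.activeTasks <= 0) liveStages.remove(stageId)
}
```

With this shape, a task end event that arrives after the stage completion event still finds the stage in the live map and updates its metrics before the entry is dropped.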
I am not sure I follow - if that is the case, why are we doing this for active stages here ? onStageCompleted/onTaskEnd would be fired for active stages as well.
Btw, there is an existing bug: we are not updating the pool, etc., which we do in onStageCompleted ...
I think the assumption here is that we will always receive the onStageCompleted event before onJobEnd. If that does not occur for some reason, then any active stages are marked as skipped.
I don't know of a scenario in which the onStageCompleted event is not received before the onJobEnd event (or received at all). Let me look further into it. Additionally, I will also fix the bug for updating the pool.
This can happen when events get dropped ...
Spark makes a best-effort attempt to deliver events in order, but when events get dropped, the UI becomes inconsistent. I assume this might be an effort to recover in that case? Any thoughts @tgravescs, @vanzin?
In your question, this == this PR?
If so, no, that's not what it's fixing. Task end events can "naturally" arrive after the stage end event in the case of a stage failure, and this code was missing that case.
When event drops occur, a lot of things get out of sync, and this change wouldn't fix that. It perhaps could make it a little worse: if a task end event does not arrive, then maybe with this change the stage will never be actually removed from the live stages map. Not sure how easy it would be to recover from that though, since dropped events could probably cause other sorts of leaks in this class too, but I also feel that's a separate issue.
(Also, hopefully, dropped events for this listener should be less common in 2.3 after the listener bus changes.)
To clarify, I was referring to 'this' being the job end event received before the stage end (for a stage which is part of a job).
I was not referring to task end events (those can come in after stage or job ends).
Thanks for clarifying @vanzin ... given the snippet is not trying to recover from event drops, I'm wondering why "non"-skipped stages would even be in the list: I would expect all of them to be skipped?
They would be in the list if the task end event arrives late, right? (Haven't really re-read the code to be sure.)
Unless it's guaranteed that the task end event will arrive before the job end event, unlike the case for the stage end one.
This code now only handles the scenario when onStageCompleted event is dropped (not received). If we don't want to handle that scenario, then we can remove this part of the code altogether.
Test build #95183 has finished for PR 22209 at commit
cd1e856 to 55637c3 (force-pushed)
Sorry, but this is my biggest pet peeve. Your PR title still only explains the problem. "Fix problem X" does not describe the fix.
vanzin left a comment
Looks good, just some minor things.
"update metrics for tasks that finish late". Same thing, much shorter. Could also add the bug number to the test name.
This is the same comment as above; either remove this one or update each comment to say where the check is occurring. (I prefer the former.)
55637c3 to 69497a2 (force-pushed)
Test build #95218 has finished for PR 22209 at commit
69497a2 to 70678dc (force-pushed)
Test build #95223 has finished for PR 22209 at commit

retest this please

Test build #95229 has finished for PR 22209 at commit

Test build #95233 has finished for PR 22209 at commit
70678dc to 0552af0 (force-pushed)
So I went back and took a closer look and I think this isn't entirely correct (and wasn't entirely correct before either).
If I remember the semantics correctly, the stage should be skipped if it is part of the job's stages, and is in the pending state when the job finishes.
If it's in the active state, it should not be marked as skipped. If you do that, the update to the skipped tasks (in L358) will most certainly be wrong.
So if the state is still active here, it means some event was missed. The best we can do in that case, I think, is remove it from the live stages list and update the pool data, and that's it.
On a related note, if the "onStageSubmitted" event is missed, the stage will remain in the "pending" state even if tasks start on it. Perhaps that could also be added to the "onTaskStart" handler, just to be sure the stage is marked as active.
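That defensive onTaskStart handling might look roughly like this. The state model and names are hypothetical (this is not Spark's actual listener API): if the onStageSubmitted event was dropped, the first onTaskStart promotes the stage from Pending to Active.

```scala
object TaskStartSketch {
  sealed trait StageStatus
  case object Pending extends StageStatus
  case object Active extends StageStatus

  class LiveStage(var status: StageStatus = Pending, var activeTasks: Int = 0)

  val liveStages = scala.collection.mutable.Map[Int, LiveStage]()

  def onTaskStart(stageId: Int): Unit =
    liveStages.get(stageId).foreach { stage =>
      stage.activeTasks += 1
      // A task started, so the stage must be running even if the
      // onStageSubmitted event never arrived.
      if (stage.status == Pending) stage.status = Active
    }
}
```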
I changed the commit to mark a stage as skipped only if it is in the Pending state.
I did not update onTaskStart to handle a missed onStageSubmitted event, as that can be part of a different review, if required.
I think that there are still two things here:
- according to the code in DAGScheduler#handleTaskCompletion, it seems like it's possible for a job to be marked finished before all task end events arrive. The job is marked finished as soon as all partitions are computed, so if you have e.g. speculative tasks you may miss things by removing the stage here.
- re-reading the code again, the pool only really needs to be updated in the pending case, since other cases are handled in onStageCompleted already.
This will cause that leak that I mentioned before, though, where missing events make things remain in the live lists forever. But that's already a problem with this code, and we should look at all of those as a separate task.
So I think it's more correct to only do anything for pending stages here.
Done. The stage is now updated in the onJobEnd event only if it is in the Pending state.
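The Pending-only handling agreed on above can be sketched as follows (hypothetical names, simplified model): onJobEnd touches only stages still in the Pending state, marking them skipped and dropping them, while Active stages are left in liveStages for onStageCompleted/onTaskEnd to finish up.

```scala
object JobEndSketch {
  sealed trait StageStatus
  case object Pending extends StageStatus
  case object Active extends StageStatus
  case object Skipped extends StageStatus

  class LiveStage(val stageId: Int, var status: StageStatus = Pending)

  val liveStages = scala.collection.mutable.Map[Int, LiveStage]()
  var skippedStages = 0

  def onJobEnd(jobStageIds: Seq[Int]): Unit =
    jobStageIds.foreach { id =>
      liveStages.get(id).foreach { stage =>
        if (stage.status == Pending) {
          // Never submitted before the job finished: it was skipped.
          stage.status = Skipped
          skippedStages += 1
          liveStages.remove(id)
        }
        // Active stages are intentionally left alone here; their events
        // arrive later and will do the cleanup.
      }
    }
}
```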
Since you're touching this: .foreach { _ =>
Test build #95305 has finished for PR 22209 at commit
…tage objects in liveStages until all tasks are complete

The problem occurs because the stage object is removed from liveStages in AppStatusListener onStageCompletion. Because of this, any onTaskEnd event received after the onStageCompletion event does not update stage metrics. The fix is to retain stage objects in liveStages until all tasks are complete.

Testing Done:
1. Fixed the reproducible example posted in the JIRA
2. Added unit test
3. Fixed the flaky test in UISeleniumSuite
0552af0 to 76f1801 (force-pushed)
Test build #95358 has finished for PR 22209 at commit

Looks like test failed due to https://issues.apache.org/jira/browse/SPARK-23622

retest this please

Test build #95389 has finished for PR 22209 at commit

Test build #95382 has finished for PR 22209 at commit

retest this please
vanzin left a comment
Found a small thing, plus a couple of nits.
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala (outdated, resolved)
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala (outdated, resolved)
Test build #95431 has finished for PR 22209 at commit

Test build #95444 has finished for PR 22209 at commit
core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala (resolved)
streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala (resolved)
Merging to master.

Any reason not to merge to 2.3? It's a bug in 2.3.

I guess it should be ok. Just trying to be conservative and not inadvertently make the branch less stable in the middle of an rc cycle...

(Feel free to do it if you think it should be there, btw.)

Would be nice to put into 2.3 as well. I realize we are close to the rc, but I don't think we should stop backporting fixes, since I don't expect 2.4 to be stable for a while. If we do stop for a bit for this rc, we should have some way to track the fixes to pull back in after the rc. Thoughts?

pulled into 2.3
…tage objects in liveStages until all tasks are complete

The problem occurs because the stage object is removed from liveStages in AppStatusListener onStageCompletion. Because of this, any onTaskEnd event received after the onStageCompletion event does not update stage metrics. The fix is to retain stage objects in liveStages until all tasks are complete.

1. Fixed the reproducible example posted in the JIRA
2. Added unit test

Closes #22209 from ankuriitg/ankurgupta/SPARK-24415.

Authored-by: ankurgupta <ankur.gupta@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(cherry picked from commit 39a02d8)
Signed-off-by: Thomas Graves <tgraves@apache.org>
What changes were proposed in this pull request?
The problem occurs because the stage object is removed from liveStages in
AppStatusListener onStageCompletion. Because of this, any onTaskEnd event
received after the onStageCompletion event does not update stage metrics.
The fix is to retain stage objects in liveStages until all tasks are complete.
How was this patch tested?
1. Fixed the reproducible example posted in the JIRA
2. Added unit test
3. Fixed the flaky test in UISeleniumSuite