-
Notifications
You must be signed in to change notification settings - Fork 29.2k
[SPARK-24415][Core] Fixed the aggregated stage metrics by retaining stage objects in liveStages until all tasks are complete #22209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -350,11 +350,21 @@ private[spark] class AppStatusListener( | |
| val e = it.next() | ||
| if (job.stageIds.contains(e.getKey()._1)) { | ||
| val stage = e.getValue() | ||
| stage.status = v1.StageStatus.SKIPPED | ||
| job.skippedStages += stage.info.stageId | ||
| job.skippedTasks += stage.info.numTasks | ||
| it.remove() | ||
| update(stage, now) | ||
| // Mark the stage as skipped if in Pending status | ||
| if (v1.StageStatus.PENDING.equals(stage.status)) { | ||
|
juliuszsompolski marked this conversation as resolved.
|
||
| stage.status = v1.StageStatus.SKIPPED | ||
| job.skippedStages += stage.info.stageId | ||
| job.skippedTasks += stage.info.numTasks | ||
| job.activeStages -= 1 | ||
|
|
||
| pools.get(stage.schedulingPool).foreach { pool => | ||
| pool.stageIds = pool.stageIds - stage.info.stageId | ||
| update(pool, now) | ||
| } | ||
|
|
||
| it.remove() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. removal from iterator should always happen
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For already completed stages, we will leave the removal of stage to happen in either onTaskEnd or onStageCompleted event. This ensures that stage metrics are updated even when onJobEnd event is received before onTaskEnd event.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure I follow - if that is the case, why are we doing this for active stages here ? onStageCompleted/onTaskEnd would be fired for active stages as well.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Btw, there is an existing bug that we are not updating pool, etc which we do in onStageCompleted ...
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the assumption here is that we will always receive onStageCompleted event before onJobEvent. If that does not occur for some reason, then any active stages are marked as skipped.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This can happen when events get dropped ...
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In your question, this == this PR? If so, no, that's not what it's fixing. Task end events can "naturally" arrive after the stage end event in the case of a stage failure, and this code was missing that case. When event drops occur, a lot of things get out of sync, and this change wouldn't fix that. It perhaps could make it a little worse: if a task end event does not arrive, then maybe with this change the stage will never be actually removed from the live stages map. Not sure how easy it would be to recover from that though, since dropped events could probably cause other sorts of leaks in this class too, but I also feel that's a separate issue. (Also, hopefully, dropped events for this listener should be less common in 2.3 after the listener bus changes.)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To clarify, I was referring to 'this' being job end event received before stage end (for a stage which is part of a job). I was not referring to task end event's (those can come in after stage or job end's). Thanks for clarifying @vanzin ... given the snippet is not trying to recover from events drop, wondering why "non"-skipped stages would even be in the list : I would expect all of them to be skipped ?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They would be in the list if the task end event arrives late, right? (Haven't really re-read the code to be sure.) Unless it's guaranteed that the task end event will arrive before the job end event, unlike the case for the stage end one.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This code now only handles the scenario when onStageCompleted event is dropped (not received). If we don't want to handle that scenario, then we can remove this part of the code altogether. |
||
| update(stage, now, last = true) | ||
| } | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -506,7 +516,16 @@ private[spark] class AppStatusListener( | |
| if (killedDelta > 0) { | ||
| stage.killedSummary = killedTasksSummary(event.reason, stage.killedSummary) | ||
| } | ||
| maybeUpdate(stage, now) | ||
| // Remove stage if there are no active tasks left and stage is already finished | ||
|
ankuriitg marked this conversation as resolved.
Outdated
|
||
| val removeStage = | ||
| stage.activeTasks == 0 && | ||
| (v1.StageStatus.COMPLETE.equals(stage.status) || | ||
| v1.StageStatus.FAILED.equals(stage.status)) | ||
| if (removeStage) { | ||
| update(stage, now, last = true) | ||
|
ankuriitg marked this conversation as resolved.
|
||
| } else { | ||
| maybeUpdate(stage, now) | ||
| } | ||
|
|
||
| // Store both stage ID and task index in a single long variable for tracking at job level. | ||
| val taskIndex = (event.stageId.toLong << Integer.SIZE) | event.taskInfo.index | ||
|
|
@@ -521,7 +540,7 @@ private[spark] class AppStatusListener( | |
| if (killedDelta > 0) { | ||
| job.killedSummary = killedTasksSummary(event.reason, job.killedSummary) | ||
| } | ||
| maybeUpdate(job, now) | ||
| conditionalLiveUpdate(job, now, removeStage) | ||
| } | ||
|
|
||
| val esummary = stage.executorSummary(event.taskInfo.executorId) | ||
|
|
@@ -532,14 +551,17 @@ private[spark] class AppStatusListener( | |
| if (metricsDelta != null) { | ||
| esummary.metrics = LiveEntityHelpers.addMetrics(esummary.metrics, metricsDelta) | ||
| } | ||
| maybeUpdate(esummary, now) | ||
| conditionalLiveUpdate(esummary, now, removeStage) | ||
|
|
||
| if (!stage.cleaning && stage.savedTasks.get() > maxTasksPerStage) { | ||
| stage.cleaning = true | ||
| kvstore.doAsync { | ||
| cleanupTasks(stage) | ||
| } | ||
| } | ||
| if (removeStage) { | ||
| liveStages.remove((event.stageId, event.stageAttemptId)) | ||
| } | ||
| } | ||
|
|
||
| liveExecutors.get(event.taskInfo.executorId).foreach { exec => | ||
|
|
@@ -564,17 +586,13 @@ private[spark] class AppStatusListener( | |
|
|
||
| // Force an update on live applications when the number of active tasks reaches 0. This is | ||
| // checked in some tests (e.g. SQLTestUtilsBase) so it needs to be reliably up to date. | ||
| if (exec.activeTasks == 0) { | ||
| liveUpdate(exec, now) | ||
| } else { | ||
| maybeUpdate(exec, now) | ||
| } | ||
| conditionalLiveUpdate(exec, now, exec.activeTasks == 0) | ||
| } | ||
| } | ||
|
|
||
| override def onStageCompleted(event: SparkListenerStageCompleted): Unit = { | ||
| val maybeStage = | ||
| Option(liveStages.remove((event.stageInfo.stageId, event.stageInfo.attemptNumber))) | ||
| Option(liveStages.get((event.stageInfo.stageId, event.stageInfo.attemptNumber))) | ||
| maybeStage.foreach { stage => | ||
| val now = System.nanoTime() | ||
| stage.info = event.stageInfo | ||
|
|
@@ -608,14 +626,20 @@ private[spark] class AppStatusListener( | |
| } | ||
|
|
||
| stage.executorSummaries.values.foreach(update(_, now)) | ||
| update(stage, now, last = true) | ||
|
|
||
| val executorIdsForStage = stage.blackListedExecutors | ||
| executorIdsForStage.foreach { executorId => | ||
| liveExecutors.get(executorId).foreach { exec => | ||
| removeBlackListedStageFrom(exec, event.stageInfo.stageId, now) | ||
| } | ||
| } | ||
|
|
||
| // Remove stage only if there are no active tasks remaining | ||
| val removeStage = stage.activeTasks == 0 | ||
| update(stage, now, last = removeStage) | ||
| if (removeStage) { | ||
| liveStages.remove((event.stageInfo.stageId, event.stageInfo.attemptNumber)) | ||
| } | ||
| } | ||
|
|
||
| appSummary = new AppSummary(appSummary.numCompletedJobs, appSummary.numCompletedStages + 1) | ||
|
|
@@ -882,6 +906,14 @@ private[spark] class AppStatusListener( | |
| } | ||
| } | ||
|
|
||
| private def conditionalLiveUpdate(entity: LiveEntity, now: Long, condition: Boolean): Unit = { | ||
| if (condition) { | ||
| liveUpdate(entity, now) | ||
| } else { | ||
| maybeUpdate(entity, now) | ||
| } | ||
| } | ||
|
|
||
| private def cleanupExecutors(count: Long): Unit = { | ||
| // Because the limit is on the number of *dead* executors, we need to calculate whether | ||
| // there are actually enough dead executors to be deleted. | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.