
Conversation

pgandhi999 commented Jul 21, 2017

The Executors tab on the Spark UI shows a task as completed when the executor process running that task is killed with the kill command.
Added a case for ExecutorLostFailure, which was previously missing, so the default case was executed and the task was marked as completed. The new case covers all situations where the executor's connection to the Spark driver is lost, for example because the executor process was killed or the network connection dropped.
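For context, a minimal sketch of the change to ExecutorsListener.onTaskEnd, reconstructed from the diff excerpts quoted later in this review (variable names such as taskEnd and taskSummary follow those excerpts; this is illustrative, not the exact merged code):

```scala
taskEnd.reason match {
  case Resubmitted =>
    // Metrics from the first attempt may already have been recorded; skip.
    return
  case _: ExceptionFailure =>
    taskSummary.tasksFailed += 1
  case _: ExecutorLostFailure =>
    // Newly added case: executor process killed, connection to the driver lost, etc.
    taskSummary.tasksFailed += 1
  case _ =>
    taskSummary.tasksComplete += 1
}
```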

How was this patch tested?

Manually tested the fix by observing the UI before and after.
Before: [screenshot: before]
After: [screenshot: after]

Please review http://spark.apache.org/contributing.html before opening a pull request.

Added a case for ExecutorLostFailure, which was previously missing, so the default case was executed and the task was marked as completed.
@tgravescs
Contributor

ok to test

@pgandhi999 changed the title from "[SPARK-21503][UI]: Fixed the issue" to "[SPARK-21503][UI]: Spark UI shows incorrect task status for a killed Executor Process" Jul 21, 2017
        return
      case _: ExceptionFailure =>
        taskSummary.tasksFailed += 1
      case _: ExecutorLostFailure =>
Member
Looks like we can use info.successful?

Contributor
+1

Author
Hi,
I have replaced most of the task-failure cases with info.successful and tested the change. I have also written a unit test to simulate the issue. Thank you.
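Concretely, the revised onTaskEnd logic after this change looks roughly like the following (a sketch assembled from the excerpts quoted later in this review, not a verbatim copy):

```scala
if (taskEnd.reason == Resubmitted) {
  // A resubmitted first attempt may already be counted; see the code comment quoted below.
  return
}
if (info.successful) {
  taskSummary.tasksComplete += 1
} else {
  taskSummary.tasksFailed += 1
}
```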

@SparkQA

SparkQA commented Jul 22, 2017

Test build #79850 has finished for PR 18707 at commit 172fc20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 25, 2017

Test build #79937 has finished for PR 18707 at commit 81422e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

pgandhi999 and others added 2 commits July 26, 2017 16:26
Apache Spark Pull Request - July 26, 2017
…es not create SparkContext

Added a flag to check whether the user has initialized the SparkContext. If true, the ApplicationMaster unregisters with the ResourceManager; otherwise it does not.
@SparkQA

SparkQA commented Jul 26, 2017

Test build #79977 has finished for PR 18707 at commit f454c89.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

… that does not create SparkContext"

This reverts commit f454c89.

"Merged another issue to this one by mistake"
@SparkQA

SparkQA commented Jul 27, 2017

Test build #79976 has finished for PR 18707 at commit 55c6c37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • throw new AnalysisException(s"UDF class $className doesn't implement any UDF interface")
  • throw new AnalysisException(s"It is invalid to implement multiple UDF interfaces, UDF class $className")
  • throw new AnalysisException(s"UDF class with $n type arguments is not supported.")
  • throw new AnalysisException(s"Can not instantiate class $className, please make sure it has public non argument constructor")
  • case e: ClassNotFoundException => throw new AnalysisException(s"Can not load class $className, please make sure it is on the classpath")

@SparkQA

SparkQA commented Jul 27, 2017

Test build #79978 has finished for PR 18707 at commit 6b7d5c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

        // first attempt of this task. This may not be 100% accurate because the first attempt
        // could have failed half-way through. The correct fix would be to keep track of the
        // metrics added by each attempt, but this is much more complicated.
        return
Contributor
nit:

if (taskEnd.reason == Resubmitted) {
  return
}

Author
I have made the required changes and verified them. Thank you.

    if (info.successful) {
      taskSummary.tasksComplete += 1
    } else {
      taskSummary.tasksFailed += 1
Contributor
I'm not sure this keeps the previous behavior, cc @zsxwing

Author
I went through the list of possible task failure reasons and found that info.successful is updated for all of them except the Resubmitted case. If you think I have overlooked any other reason, do let me know.
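For reference, a sketch of the failure reasons in question (the TaskFailedReason subtypes in org.apache.spark.TaskEndReason around this version of Spark; the list is from memory and worth double-checking against the source):

```scala
// Subtypes of TaskFailedReason:
//   Resubmitted, FetchFailed, ExceptionFailure, TaskCommitDenied,
//   TaskKilled, ExecutorLostFailure, TaskResultLost, UnknownReason
// Per the comment above, info.successful is kept up to date for all of
// these except Resubmitted, which is why Resubmitted is checked first.
```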

pgandhi999 and others added 2 commits July 28, 2017 10:24
[SPARK-21503] - Making changes as per comments: removed the match/case statement and replaced it with an if clause.
@SparkQA

SparkQA commented Jul 28, 2017

Test build #80026 has finished for PR 18707 at commit bc41664.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class OneVsRestParams(HasFeaturesCol, HasLabelCol, HasWeightCol, HasPredictionCol):
  • class ArrowWriter(

@SparkQA

SparkQA commented Jul 28, 2017

Test build #80027 has finished for PR 18707 at commit e46126f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

      jobListener.onTaskEnd(
        SparkListenerTaskEnd(0, 0, "result", Success, taskInfo, taskMetrics))
    }
    (3 to 4).foreach {


I don't see a test case that uses this method to check that the task is correctly displayed as failed. In that case, is this test-suite modification needed?

Author

You are right. I did try to simulate a similar unit test for the Executors tab UI, but it requires creating a SparkUI object, which cannot be accessed from StagePageSuite. Hence, I am reverting my unit test for the time being.
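For the record, the reverted test idea would have looked something like the sketch below, following the suite excerpt above (the ExecutorLostFailure arguments and the assertion are assumptions, since the actual test could not be wired up from StagePageSuite):

```scala
// Hypothetical: end a task with ExecutorLostFailure and expect it to be
// counted as failed rather than completed.
jobListener.onTaskEnd(
  SparkListenerTaskEnd(0, 0, "result",
    ExecutorLostFailure("1", exitCausedByApp = true, reason = Some("executor killed")),
    taskInfo, taskMetrics))
// Expected: the executor's tasksFailed counter increments; tasksComplete does not.
```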

pgandhi999 and others added 2 commits August 1, 2017 08:58
Spark - August 1, 2017
[SPARK-21503]: Reverting Unit Test Code - Not needed.
@SparkQA

SparkQA commented Aug 1, 2017

Test build #80119 has finished for PR 18707 at commit 9b3cebc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedSubqueryColumnAliases(
  • sealed trait FrameType
  • sealed trait SpecialFrameBoundary extends Expression with Unevaluable
  • sealed trait WindowFrame extends Expression with Unevaluable

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80122 has finished for PR 18707 at commit 7f03341.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor

LGTM, cc @zsxwing @cloud-fan

    if (taskEnd.reason == Resubmitted) {
      return
    }
    if (info.successful) {
Contributor

what's the relationship between info.successful and taskEnd.reason?

Author

def successful: Boolean = finished && !failed && !killed

So if the task state shows finished and the task has neither failed nor been killed, info.successful is true. There can be multiple reasons for a failed or killed task, and taskEnd.reason lists them; when the task state is SUCCESS, the reason is Success as well. Thus info.successful is true exactly in the SUCCESS case and false for all the others.
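Spelled out as a small truth table (illustrative; only the Success end reason leaves all failure flags unset):

```scala
// def successful: Boolean = finished && !failed && !killed
//
// end reason         finished  failed  killed   successful
// Success            true      false   false    true
// ExceptionFailure   true      true    false    false
// TaskKilled         true      false   true     false
```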

        // could have failed half-way through. The correct fix would be to keep track of the
        // metrics added by each attempt, but this is much more complicated.
        return
      case _: ExceptionFailure =>
Contributor

can we fix the problem by changing this line to case _: TaskFailedReason?

Author

So, I added the info.successful check at the suggestion of @zsxwing and @jiangxb1987. I feel it covers all the failed and killed cases that a TaskFailedReason match might overlook. However, if you feel otherwise, please let me know.
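For comparison, @cloud-fan's alternative would look roughly like this sketch (ordering matters, since Resubmitted is itself a TaskFailedReason and must be matched first):

```scala
taskEnd.reason match {
  case Resubmitted =>
    return // must precede the TaskFailedReason case
  case _: TaskFailedReason =>
    taskSummary.tasksFailed += 1
  case _ =>
    taskSummary.tasksComplete += 1
}
```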

@tgravescs
Contributor

change LGTM

@cloud-fan I don't see any issues with the current approach; it was suggested earlier, but I think your suggestion would work as well. Functionally I'm not seeing any difference. Did I miss something, or are you OK with this way?

asfgit pushed a commit that referenced this pull request Aug 9, 2017
…xecutor Process

The Executors tab on the Spark UI shows a task as completed when the executor process running that task is killed with the kill command.
Added a case for ExecutorLostFailure, which was previously missing, so the default case was executed and the task was marked as completed. The new case covers all situations where the executor's connection to the Spark driver is lost, for example because the executor process was killed or the network connection dropped.

## How was this patch tested?
Manually tested the fix by observing the UI before and after.
Before:
<img width="1398" alt="screen shot-before" src="https://user-images.githubusercontent.com/22228190/28482929-571c9cea-6e30-11e7-93dd-728de5cdea95.png">
After:
<img width="1385" alt="screen shot-after" src="https://user-images.githubusercontent.com/22228190/28482964-8649f5ee-6e30-11e7-91bd-2eb2089c61cc.png">

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: pgandhi <[email protected]>
Author: pgandhi999 <[email protected]>

Closes #18707 from pgandhi999/master.

(cherry picked from commit f016f5c)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Contributor

I was just asking about different fixes, I'm ok with the current fix.

thanks, merging to master/2.2!

@asfgit asfgit closed this in f016f5c Aug 9, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…xecutor Process

Author: pgandhi <[email protected]>
Author: pgandhi999 <[email protected]>

Closes apache#18707 from pgandhi999/master.

(cherry picked from commit f016f5c)
Signed-off-by: Wenchen Fan <[email protected]>