Skip to content

Conversation

@jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

This patch is fixing the race condition in ApplicationMaster when receiving a signal. In the current implementation, if signal is received and with no any exception, this application will be finished with successful state in Yarn, and there's no another attempt. Actually the application is killed by signal in the runtime, so another attempt is expected.

This patch adds a signal handler to handle the signal things, if signal is received, marking this application finished with failure, rather than success.

How was this patch tested?

This patch is tested with following situations:

Application is finished normally.
Application is finished by calling System.exit(n).
Application is killed by yarn command.
ApplicationMaster is killed by "SIGTERM" send by kill pid command.
ApplicationMaster is killed by NM with "SIGTERM" in case of NM failure.

@SparkQA
Copy link

SparkQA commented Mar 14, 2016

Test build #53052 has finished for PR 11690 at commit 57d37d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume his is merge issue? remove this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my bad, I will remove this.

@SparkQA
Copy link

SparkQA commented Mar 15, 2016

Test build #53149 has finished for PR 11690 at commit b4932c8.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Mar 15, 2016

Test build #53152 has finished for PR 11690 at commit b4932c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here modifier is not allowed.

[error] /Users/sshao/projects/apache-spark/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:126: illegal start of statement (no modifiers allowed here)
[error]     private class AMSignalHandler(name: String) extends SignalHandler {
[error]     ^
[error] one error found

@tgravescs
Copy link
Contributor

sorry for the delay on this, one minor comment otherwise looks good.

@jerryshao jerryshao force-pushed the SPARK-13642-1.6-backport branch 2 times, most recently from ac7f73d to b75927e Compare March 23, 2016 05:13
@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53897 has finished for PR 11690 at commit ac7f73d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53898 has finished for PR 11690 at commit b75927e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

+1

asfgit pushed a commit that referenced this pull request Mar 23, 2016
…icationMaster

## What changes were proposed in this pull request?

This patch is fixing the race condition in ApplicationMaster when receiving a signal. In the current implementation, if signal is received and with no any exception, this application will be finished with successful state in Yarn, and there's no another attempt. Actually the application is killed by signal in the runtime, so another attempt is expected.

This patch adds a signal handler to handle the signal things, if signal is received, marking this application finished with failure, rather than success.

## How was this patch tested?

This patch is tested with following situations:

Application is finished normally.
Application is finished by calling System.exit(n).
Application is killed by yarn command.
ApplicationMaster is killed by "SIGTERM" send by kill pid command.
ApplicationMaster is killed by NM with "SIGTERM" in case of NM failure.

Author: jerryshao <[email protected]>

Closes #11690 from jerryshao/SPARK-13642-1.6-backport.
@jerryshao jerryshao closed this Mar 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants