[SPARK-13642][Yarn][1.6-backport] Properly handle signal kill in ApplicationMaster #11690
Test build #53052 has finished for PR 11690 at commit
I assume this is a merge issue? Remove this function.
Sorry, my bad, I will remove this.
Test build #53149 has finished for PR 11690 at commit
Jenkins, retest this please.
Test build #53152 has finished for PR 11690 at commit
make this private
I think a modifier is not allowed here.
[error] /Users/sshao/projects/apache-spark/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:126: illegal start of statement (no modifiers allowed here)
[error] private class AMSignalHandler(name: String) extends SignalHandler {
[error] ^
[error] one error found
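For context on the error above: the message suggests the class is declared in statement position (for example inside a method body or block), where Scala 2 rejects access modifiers such as `private`. The sketch below illustrates the distinction under that assumption; `ModifierSketch`, `MemberHandler`, and `LocalHandler` are hypothetical names and are not taken from the patch.

```scala
object ModifierSketch {
  // Member position: access modifiers such as `private` are allowed.
  private class MemberHandler(name: String) {
    def describe: String = s"member handler for $name"
  }

  def run(): String = {
    // Statement position (inside a method body): writing
    // `private class LocalHandler ...` here fails to compile with
    // "illegal start of statement (no modifiers allowed here)".
    // A local class is only visible inside the enclosing block anyway.
    class LocalHandler(name: String) {
      def describe: String = s"local handler for $name"
    }
    new LocalHandler("TERM").describe
  }

  // Uses the private member class from inside the same object.
  def member(): String = new MemberHandler("TERM").describe
}
```

Since a local class cannot be referenced from outside its block, leaving the modifier off does not widen its visibility.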
Sorry for the delay on this. One minor comment, otherwise looks good.
ac7f73d to b75927e
Test build #53897 has finished for PR 11690 at commit
Test build #53898 has finished for PR 11690 at commit
+1
…icationMaster
Author: jerryshao <[email protected]>
Closes #11690 from jerryshao/SPARK-13642-1.6-backport.
What changes were proposed in this pull request?
This patch fixes a race condition in ApplicationMaster when it receives a signal. In the current implementation, if a signal is received and no exception is thrown, the application finishes with a successful state in YARN and no further attempt is made. In fact the application was killed by a signal at runtime, so another attempt is expected.
This patch adds a signal handler: if a signal is received, the application is marked as finished with failure rather than success.
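The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `AmSignalSketch`, `FinalStatus`, `finish`, and `install` are hypothetical stand-ins for the ApplicationMaster's own bookkeeping, and the "first caller wins" rule in `finish` is an assumption made for the example. Only `sun.misc.Signal` and `sun.misc.SignalHandler` are real JDK classes.

```scala
import sun.misc.{Signal, SignalHandler}

object AmSignalSketch {
  // Hypothetical stand-ins for the final application status (not Spark's or YARN's types).
  sealed trait FinalStatus
  case object Undefined extends FinalStatus
  case object Succeeded extends FinalStatus
  case object Failed extends FinalStatus

  @volatile private var finished = false
  @volatile private var status: FinalStatus = Undefined

  // Assumption for this sketch: the first caller wins, so a later normal
  // shutdown path cannot overwrite a failure recorded by the signal handler.
  def finish(s: FinalStatus): Unit = synchronized {
    if (!finished) {
      finished = true
      status = s
    }
  }

  def finalStatus: FinalStatus = status

  // On a signal, record failure instead of letting the run end as a success.
  class AMSignalHandler extends SignalHandler {
    override def handle(sig: Signal): Unit = finish(Failed)
  }

  // Registers the handler for a signal name such as "TERM".
  def install(name: String): Unit = {
    Signal.handle(new Signal(name), new AMSignalHandler)
    ()
  }
}
```

Calling `AmSignalSketch.install("TERM")` replaces the JVM's default SIGTERM handling, so a real handler would also need to trigger shutdown itself; that part is left out here.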
How was this patch tested?
This patch was tested in the following situations:
The application finishes normally.
The application finishes by calling System.exit(n).
The application is killed by the yarn command.
ApplicationMaster is killed by a "SIGTERM" sent by the kill pid command.
ApplicationMaster is killed by the NM with "SIGTERM" in case of NM failure.