-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25174][YARN]Limit the size of diagnostic message for am to unregister itself from rm #22180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #95073 has finished for PR 22180 at commit
|
| finalMsg = if (msg == null || msg.length <= finalMsgLimitSize) { | ||
| msg | ||
| } else { | ||
| msg.substring(0, finalMsgLimitSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the message in last finalMsgLimitSize is more useful.
| @volatile private var finished = false | ||
| @volatile private var finalStatus = getDefaultFinalStatus | ||
| @volatile private var finalMsg: String = "" | ||
| private val finalMsgLimitSize = sparkConf.get(AM_FINAL_MSG_LIMIT).toInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move this to L165? just for code clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to just read the value at the point where it's needed, given there's only one use.
| @volatile private var finished = false | ||
| @volatile private var finalStatus = getDefaultFinalStatus | ||
| @volatile private var finalMsg: String = "" | ||
| private val finalMsgLimitSize = sparkConf.get(AM_FINAL_MSG_LIMIT).toInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to just read the value at the point where it's needed, given there's only one use.
| logInfo(s"Final app status: $finalStatus, exitCode: $exitCode" + | ||
| Option(msg).map(msg => s", (reason: $msg)").getOrElse("")) | ||
| finalMsg = msg | ||
| finalMsg = if (msg == null || msg.length <= finalMsgLimitSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringUtils.abbreviate seems simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's better, thanks
|
Test build #95132 has finished for PR 22180 at commit
|
|
Looks good, merging to master. |
| .doc("The limit size of final diagnostic message for our ApplicationMaster to unregister from" + | ||
| " the ResourceManager.") | ||
| .bytesConf(ByteUnit.BYTE) | ||
| .createWithDefaultString("1m") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for leaving a comment late, and nitpicking but shouldn't we better leave this unlimited by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in https://issues.apache.org/jira/browse/YARN-6125 yarn.app.attempt.diagnostics.limit.kc 's default value is 64K. I guess 1m here is enough
…register itself from rm ## What changes were proposed in this pull request? When using older versions of spark releases, a use case generated a huge code-gen file which hit the limitation `Constant pool has grown past JVM limit of 0xFFFF`. In this situation, it should fail immediately. But the diagnosis message sent to RM is too large, the ApplicationMaster suspended and RM's ZKStateStore was crashed. For 2.3 or later spark releases the limitation of code-gen has been removed, but maybe there are still some uncaught exceptions that contain oversized error message will cause such a problem. This PR is aim to cut down the diagnosis message size. ## How was this patch tested? Please review http://spark.apache.org/contributing.html before opening a pull request. Closes apache#22180 from yaooqinn/SPARK-25174. Authored-by: Kent Yao <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]>
What changes were proposed in this pull request?
When using older versions of spark releases, a use case generated a huge code-gen file which hit the limitation
Constant pool has grown past JVM limit of 0xFFFF. In this situation, it should fail immediately. But the diagnosis message sent to RM is too large, the ApplicationMaster suspended and RM's ZKStateStore was crashed. For 2.3 or later spark releases the limitation of code-gen has been removed, but maybe there are still some uncaught exceptions that contain oversized error message will cause such a problem.This PR is aim to cut down the diagnosis message size.
How was this patch tested?
Please review http://spark.apache.org/contributing.html before opening a pull request.