Skip to content

Conversation

@yaooqinn
Copy link
Member

What changes were proposed in this pull request?

When using older versions of spark releases, a use case generated a huge code-gen file which hit the limitation Constant pool has grown past JVM limit of 0xFFFF. In this situation, it should fail immediately. But the diagnosis message sent to RM is too large, the ApplicationMaster suspended and RM's ZKStateStore was crashed. For 2.3 or later spark releases the limitation of code-gen has been removed, but maybe there are still some uncaught exceptions that contain oversized error message will cause such a problem.

This PR is aim to cut down the diagnosis message size.

How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a pull request.

@SparkQA
Copy link

SparkQA commented Aug 22, 2018

Test build #95073 has finished for PR 22180 at commit 8f5b67a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

cc @gatorsmile @vanzin

finalMsg = if (msg == null || msg.length <= finalMsgLimitSize) {
msg
} else {
msg.substring(0, finalMsgLimitSize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the message in last finalMsgLimitSize is more useful.

@volatile private var finished = false
@volatile private var finalStatus = getDefaultFinalStatus
@volatile private var finalMsg: String = ""
private val finalMsgLimitSize = sparkConf.get(AM_FINAL_MSG_LIMIT).toInt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this to L165? just for code clean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to just read the value at the point where it's needed, given there's only one use.

@volatile private var finished = false
@volatile private var finalStatus = getDefaultFinalStatus
@volatile private var finalMsg: String = ""
private val finalMsgLimitSize = sparkConf.get(AM_FINAL_MSG_LIMIT).toInt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to just read the value at the point where it's needed, given there's only one use.

logInfo(s"Final app status: $finalStatus, exitCode: $exitCode" +
Option(msg).map(msg => s", (reason: $msg)").getOrElse(""))
finalMsg = msg
finalMsg = if (msg == null || msg.length <= finalMsgLimitSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringUtils.abbreviate seems simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's better, thanks

@SparkQA
Copy link

SparkQA commented Aug 23, 2018

Test build #95132 has finished for PR 22180 at commit 3271c3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Aug 24, 2018

Looks good, merging to master.

@asfgit asfgit closed this in f8346d2 Aug 24, 2018
.doc("The limit size of final diagnostic message for our ApplicationMaster to unregister from" +
" the ResourceManager.")
.bytesConf(ByteUnit.BYTE)
.createWithDefaultString("1m")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for leaving a comment late, and nitpicking but shouldn't we better leave this unlimited by default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in https://issues.apache.org/jira/browse/YARN-6125 yarn.app.attempt.diagnostics.limit.kc 's default value is 64K. I guess 1m here is enough

bogdanrdc pushed a commit to bogdanrdc/spark that referenced this pull request Aug 28, 2018
…register itself from rm

## What changes were proposed in this pull request?

When using older versions of spark releases,  a use case generated a huge code-gen file which hit the limitation `Constant pool has grown past JVM limit of 0xFFFF`.  In this situation, it should fail immediately. But the diagnosis message sent to RM is too large,  the ApplicationMaster suspended and RM's ZKStateStore was crashed. For 2.3 or later spark releases the limitation of code-gen has been removed, but maybe there are still some uncaught exceptions that contain oversized error message will cause such a problem.

This PR is aim to cut down the diagnosis message size.

## How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#22180 from yaooqinn/SPARK-25174.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants