Commit f8346d2

yaooqinn authored and Marcelo Vanzin committed
[SPARK-25174][YARN] Limit the size of diagnostic message for am to unregister itself from rm
## What changes were proposed in this pull request?

With older Spark releases, one use case generated a huge code-gen file that hit the limit `Constant pool has grown past JVM limit of 0xFFFF`. In this situation the application should fail immediately. However, the diagnostic message sent to the RM was too large: the ApplicationMaster was suspended and the RM's ZKStateStore crashed. In Spark 2.3 and later the code-gen limitation has been removed, but other uncaught exceptions carrying oversized error messages may still cause the same problem. This PR aims to cut down the size of the diagnostic message.

## How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22180 from yaooqinn/SPARK-25174.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
1 parent 8bb9414 commit f8346d2

2 files changed: +9 -2 lines changed

resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala

Lines changed: 3 additions & 2 deletions
@@ -19,7 +19,7 @@ package org.apache.spark.deploy.yarn
 
 import java.io.{File, IOException}
 import java.lang.reflect.{InvocationTargetException, Modifier}
-import java.net.{Socket, URI, URL}
+import java.net.{URI, URL}
 import java.security.PrivilegedExceptionAction
 import java.util.concurrent.{TimeoutException, TimeUnit}
 
@@ -28,6 +28,7 @@ import scala.concurrent.Promise
 import scala.concurrent.duration.Duration
 import scala.util.control.NonFatal
 
+import org.apache.commons.lang3.{StringUtils => ComStrUtils}
 import org.apache.hadoop.fs.{FileSystem, Path}
 import org.apache.hadoop.util.StringUtils
 import org.apache.hadoop.yarn.api._
@@ -368,7 +369,7 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         }
         logInfo(s"Final app status: $finalStatus, exitCode: $exitCode" +
           Option(msg).map(msg => s", (reason: $msg)").getOrElse(""))
-        finalMsg = msg
+        finalMsg = ComStrUtils.abbreviate(msg, sparkConf.get(AM_FINAL_MSG_LIMIT).toInt)
         finished = true
         if (!inShutdown && Thread.currentThread() != reporterThread && reporterThread != null) {
           logDebug("shutting down reporter thread")
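
The change to `finalMsg` relies on the truncation behaviour of `abbreviate` from Apache Commons Lang. The sketch below is a minimal, standard-library-only illustration of that behaviour as documented for `StringUtils.abbreviate(str, maxWidth)`: a string that fits within `maxWidth` is returned unchanged, a longer one is cut and ends in `...`, and `null` passes through. The object name `DiagnosticsLimit` and its local `abbreviate` are hypothetical stand-ins, not code from this commit.

```scala
// Hypothetical stand-in for org.apache.commons.lang3.StringUtils.abbreviate(str, maxWidth),
// written against the standard library only so that the sketch is self-contained.
object DiagnosticsLimit {
  private val Ellipsis = "..."

  def abbreviate(msg: String, maxWidth: Int): String = {
    // The width must leave room for at least one character plus the ellipsis.
    require(maxWidth >= 4, "maxWidth must be at least 4")
    if (msg == null || msg.length <= maxWidth) msg
    else msg.substring(0, maxWidth - Ellipsis.length) + Ellipsis
  }

  def main(args: Array[String]): Unit = {
    val huge = "x" * 5000000      // stands in for an oversized exception message
    val limit = 1024 * 1024       // assumes the default "1m" is read as 1 MiB
    val trimmed = abbreviate(huge, limit)
    println(trimmed.length)       // 1048576: the result never exceeds the limit
    println(trimmed.endsWith(Ellipsis))
    println(abbreviate("short message", limit)) // returned unchanged
  }
}
```

Note that the width passed to `abbreviate` counts characters, whereas the new configuration entry is declared as a byte size, so for non-ASCII messages the encoded result may be somewhat larger than the configured number of bytes.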

resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala

Lines changed: 6 additions & 0 deletions
@@ -192,6 +192,12 @@ package object config {
     .toSequence
     .createWithDefault(Nil)
 
+  private[spark] val AM_FINAL_MSG_LIMIT = ConfigBuilder("spark.yarn.am.finalMessageLimit")
+    .doc("The limit size of final diagnostic message for our ApplicationMaster to unregister from" +
+      " the ResourceManager.")
+    .bytesConf(ByteUnit.BYTE)
+    .createWithDefaultString("1m")
+
   /* Client-mode AM configuration. */
 
   private[spark] val AM_CORES = ConfigBuilder("spark.yarn.am.cores")
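
To make the default of `"1m"` concrete, the following sketch shows how a size string maps to a byte count when suffixes are treated as binary multiples, which is how Spark's size properties are documented (`1m` is 1 MiB, that is 1048576 bytes). `FinalMessageLimitSketch`, `parseBytes` and `toAbbreviateWidth` are hypothetical helpers written for illustration; they are a simplified stand-in for Spark's own parsing and are not part of this commit. Because the commit narrows the configured value with `.toInt`, the sketch also shows, as an assumption about how one might guard that step, a clamp that avoids wrap-around for very large settings.

```scala
object FinalMessageLimitSketch {
  // Hypothetical, simplified parser: only the b/k/m/g suffixes, binary (1024-based) multiples.
  private val SizePattern = "([0-9]+)([bkmg])?".r

  def parseBytes(s: String): Long = s.trim.toLowerCase match {
    case SizePattern(num, suffix) =>
      // An absent suffix is matched as null, which Option turns into None.
      val multiplier = Option(suffix) match {
        case Some("k") => 1024L
        case Some("m") => 1024L * 1024
        case Some("g") => 1024L * 1024 * 1024
        case _         => 1L // "b" or no suffix: plain bytes
      }
      num.toLong * multiplier
    case other =>
      throw new NumberFormatException(s"Unrecognized size string: $other")
  }

  // Narrow a byte count to an Int width without wrapping around (not done in the commit).
  def toAbbreviateWidth(bytes: Long): Int =
    math.min(bytes, Int.MaxValue.toLong).toInt

  def main(args: Array[String]): Unit = {
    println(parseBytes("1m"))                    // 1048576
    println(parseBytes("64k"))                   // 65536
    println(parseBytes("4g").toInt)              // 0: a plain toInt wraps around
    println(toAbbreviateWidth(parseBytes("4g"))) // 2147483647
  }
}
```

In a real deployment the limit would be set through the usual Spark configuration mechanisms under the key `spark.yarn.am.finalMessageLimit`, for example with a value such as `64k`.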
