```diff
@@ -245,11 +245,11 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
       error("Must specify a primary resource (JAR or Python or R file)")
     }
     if (driverMemory != null
-        && Try(JavaUtils.byteStringAsBytes(driverMemory)).getOrElse(-1L) <= 0) {
+        && Try(JavaUtils.byteStringAsMb(driverMemory)).getOrElse(-1L) <= 0) {
       error("Driver memory must be a positive number")
     }
     if (executorMemory != null
-        && Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
+        && Try(JavaUtils.byteStringAsMb(executorMemory)).getOrElse(-1L) <= 0) {
       error("Executor memory must be a positive number")
     }
     if (executorCores != null && Try(executorCores.toInt).getOrElse(-1) <= 0) {
```
Member: Since this changes executorMemory, do we need to update line 248 for driverMemory together, maybe?

Member: BTW, do we need to change this line? This seems to ensure that the value is non-negative. Just for my understanding, could you give me an example which gives a different result before and after this PR?

Contributor Author: You are right, there is no difference in behaviour, but reading the code and seeing byteStringAsBytes called with these configs gives the false impression that they are in bytes. I think it is worth changing them to byteStringAsMb.

Contributor: In theory, the only difference would be if the user set the memory to < 1 MB. This is ridiculous enough to ignore as a valid use case :-)
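A minimal sketch of that corner case (self-contained, assuming only spark-network-common's JavaUtils on the classpath): for values below 1 MiB the two parsers disagree, because byteStringAsMb truncates to whole mebibytes, so the `<= 0` validation now rejects such values while the bytes-based check accepted them.

```java
import org.apache.spark.network.util.JavaUtils;

public class MemoryValidationSketch {
  public static void main(String[] args) {
    // Parsed as bytes, "512k" is 524288, which passes a "> 0" check.
    System.out.println(JavaUtils.byteStringAsBytes("512k")); // 524288
    // Parsed as MiB, "512k" truncates to 0, so the new check rejects it.
    System.out.println(JavaUtils.byteStringAsMb("512k"));    // 0
    // Any realistic setting behaves identically before and after:
    System.out.println(JavaUtils.byteStringAsMb("2g"));      // 2048
  }
}
```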
```diff
@@ -222,7 +222,7 @@ object UnifiedMemoryManager {
     }
     // SPARK-12759 Check executor memory to fail fast if memory is insufficient
     if (conf.contains(config.EXECUTOR_MEMORY)) {
-      val executorMemory = conf.getSizeAsBytes(config.EXECUTOR_MEMORY.key)
+      val executorMemory = conf.getSizeAsMb(config.EXECUTOR_MEMORY.key)
       if (executorMemory < minSystemMemory) {
         throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
           s"$minSystemMemory. Please increase executor memory using the " +
```
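For context, SparkConf exposes both accessors, and they interpret a unit-less value differently; that difference is what the change above corrects for bare numbers. A sketch (hypothetical standalone snippet, assuming a locally constructed SparkConf):

```java
import org.apache.spark.SparkConf;

public class ExecutorMemorySketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().set("spark.executor.memory", "4200"); // no unit given
    // getSizeAsMb reads a bare number as MiB, the documented default unit.
    System.out.println(conf.getSizeAsMb("spark.executor.memory"));    // 4200 (MiB)
    // getSizeAsBytes reads the same bare number as bytes, which made the
    // old fail-fast comparison misleading for unit-less settings.
    System.out.println(conf.getSizeAsBytes("spark.executor.memory")); // 4200 (bytes)
  }
}
```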
docs/configuration.md (5 changes: 3 additions & 2 deletions)

```diff
@@ -172,7 +172,7 @@ of the most common options to set are:
   <td>
     Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in the
     same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t")
-    (e.g. <code>512m</code>, <code>2g</code>).
+    (e.g. <code>512m</code>, <code>2g</code>) using "m" as the default unit.
     <br />
     <em>Note:</em> In client mode, this config must not be set through the <code>SparkConf</code>
     directly in your application, because the driver JVM has already started at that point.
@@ -249,7 +249,8 @@ of the most common options to set are:
   <td>1g</td>
   <td>
     Amount of memory to use per executor process, in the same format as JVM memory strings with
-    a size unit suffix ("k", "m", "g" or "t") (e.g. <code>512m</code>, <code>2g</code>).
+    a size unit suffix ("k", "m", "g" or "t") (e.g. <code>512m</code>, <code>2g</code>) using
+    "m" as the default unit.
   </td>
   <td>0.7.0</td>
 </tr>
```
Contributor: It seems we are a bit inconsistent across the documentation as well (pyspark.memory, memoryOverhead). Other memory settings just say MiB unless otherwise specified but don't mention the suffix options. I wonder if we should make them all consistent. Note one of the YARN configs says: "Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively." but again doesn't say m is the default.
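For concreteness, here is how the documented suffixes parse in terms of mebibytes (a sketch using JavaUtils, which backs these size configs):

```java
import org.apache.spark.network.util.JavaUtils;

public class SuffixParsingSketch {
  public static void main(String[] args) {
    System.out.println(JavaUtils.byteStringAsMb("512m")); // 512
    System.out.println(JavaUtils.byteStringAsMb("2g"));   // 2048
    System.out.println(JavaUtils.byteStringAsMb("1t"));   // 1048576
    // A bare number falls back to the default unit, "m":
    System.out.println(JavaUtils.byteStringAsMb("4200")); // 4200
  }
}
```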
docs/monitoring.md (6 changes: 5 additions & 1 deletion)

```diff
@@ -70,7 +70,11 @@ The history server can be configured as follows:
 <tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr>
 <tr>
   <td><code>SPARK_DAEMON_MEMORY</code></td>
-  <td>Memory to allocate to the history server (default: 1g).</td>
+  <td>
+    Memory to allocate to the history server (default: 1g). This can be configured in the same
+    format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") using "m" as
+    the default unit.
+  </td>
 </tr>
 <tr>
   <td><code>SPARK_DAEMON_JAVA_OPTS</code></td>
```
docs/spark-standalone.md (10 changes: 8 additions & 2 deletions)

```diff
@@ -144,7 +144,10 @@ You can optionally configure the cluster further by setting environment variable
 </tr>
 <tr>
   <td><code>SPARK_WORKER_MEMORY</code></td>
-  <td>Total amount of memory to allow Spark applications to use on the machine, e.g. <code>1000m</code>, <code>2g</code> (default: total memory minus 1 GiB); note that each application's <i>individual</i> memory is configured using its <code>spark.executor.memory</code> property.</td>
+  <td>
+    Total amount of memory to allow Spark applications to use on the machine, e.g. <code>1000m</code>, <code>2g</code> (default: total memory minus 1 GiB); note that each application's <i>individual</i> memory is configured using its <code>spark.executor.memory</code> property.
+    This can be configured in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") using "m" as the default unit.
+  </td>
 </tr>
 <tr>
   <td><code>SPARK_WORKER_PORT</code></td>
@@ -164,7 +167,10 @@ You can optionally configure the cluster further by setting environment variable
 </tr>
 <tr>
   <td><code>SPARK_DAEMON_MEMORY</code></td>
-  <td>Memory to allocate to the Spark master and worker daemons themselves (default: 1g).</td>
+  <td>
+    Memory to allocate to the Spark master and worker daemons themselves (default: 1g). This can be configured in the same
+    format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") using "m" as the default unit.
+  </td>
 </tr>
 <tr>
   <td><code>SPARK_DAEMON_JAVA_OPTS</code></td>
```
```diff
@@ -328,4 +328,17 @@ static String findJarsDir(String sparkHome, String scalaVersion, boolean failIfN
     return libdir.getAbsolutePath();
   }
 
+  /**
+   * Add "m" as the default suffix unit when no explicit unit is given.
+   */
+  static String addDefaultMSuffixIfNeeded(String memoryString) {
+    if (memoryString.chars().allMatch(Character::isDigit)) {
+      System.err.println("Memory setting without explicit unit (" +
+        memoryString + ") is taken to be in MB by default! For details check SPARK-32293.");
+      return memoryString + "m";
+    } else {
+      return memoryString;
+    }
+  }
+
 }
```

Contributor: Given that we are documenting that 'm' is the suffix we use if not specified, do we need this message to stderr?
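The helper exists because Spark and the JVM disagree about unit-less values: Spark's parsers read a bare number as MiB, while -Xmx/-Xms read it as bytes, so passing the raw string through would request an absurdly small heap. A sketch of the mismatch (the digits-only check is inlined so the example is self-contained; JavaUtils is assumed on the classpath):

```java
import org.apache.spark.network.util.JavaUtils;

public class DefaultUnitSketch {
  // Same digits-only check as the new launcher helper, inlined here.
  static String addDefaultMSuffixIfNeeded(String memoryString) {
    return memoryString.chars().allMatch(Character::isDigit)
        ? memoryString + "m"
        : memoryString;
  }

  public static void main(String[] args) {
    // Spark-side parsing: a bare "4200" means 4200 MiB.
    System.out.println(JavaUtils.byteStringAsMb("4200")); // 4200
    // JVM-side parsing: "-Xmx4200" would mean 4200 bytes, which the JVM
    // rejects as too small a heap. Appending the default unit avoids that:
    System.out.println("-Xmx" + addDefaultMSuffixIfNeeded("4200")); // -Xmx4200m
  }
}
```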
```diff
@@ -108,7 +108,7 @@ public List<String> buildCommand(Map<String, String> env)
     }
 
     String mem = firstNonEmpty(memKey != null ? System.getenv(memKey) : null, DEFAULT_MEM);
-    cmd.add("-Xmx" + mem);
+    cmd.add("-Xmx" + addDefaultMSuffixIfNeeded(mem));
     cmd.add(className);
     cmd.addAll(classArgs);
     return cmd;
```
Contributor: We should update the standalone docs as well for --memory (SPARK_WORKER_MEMORY and SPARK_DAEMON_MEMORY) to say they default to "m", and ideally make them consistent with the docs above.

Contributor: Note we should test those as well if you haven't already.

Contributor Author: Thanks, I really focused on the -Xmx and -Xms settings, but now I see there is another error at `val executorMemory = conf.getSizeAsBytes(config.EXECUTOR_MEMORY.key)`.
```diff
@@ -285,7 +285,7 @@ private List<String> buildSparkSubmitCommand(Map<String, String> env)
       isThriftServer(mainClass) ? System.getenv("SPARK_DAEMON_MEMORY") : null;
     String memory = firstNonEmpty(tsMemory, config.get(SparkLauncher.DRIVER_MEMORY),
       System.getenv("SPARK_DRIVER_MEMORY"), System.getenv("SPARK_MEM"), DEFAULT_MEM);
-    cmd.add("-Xmx" + memory);
+    cmd.add("-Xmx" + addDefaultMSuffixIfNeeded(memory));
     addOptionString(cmd, driverDefaultJavaOptions);
     addOptionString(cmd, driverExtraJavaOptions);
     mergeEnvPathList(env, getLibPathEnvName(),
```
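The memory value above is resolved in priority order before the suffix helper runs. A hypothetical standalone rendering of the firstNonEmpty chain (names mirror the launcher code; the values are made up):

```java
public class DriverMemoryResolutionSketch {
  // Mirrors the launcher's firstNonEmpty helper: return the first
  // candidate that is neither null nor empty, else null.
  static String firstNonEmpty(String... candidates) {
    for (String c : candidates) {
      if (c != null && !c.isEmpty()) {
        return c;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Order: SPARK_DAEMON_MEMORY (Thrift server only), spark.driver.memory,
    // SPARK_DRIVER_MEMORY, SPARK_MEM, then the built-in default.
    String memory = firstNonEmpty(null, "4200", null, null, "1g");
    System.out.println(memory); // 4200, later emitted as -Xmx4200m
  }
}
```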
```diff
@@ -115,6 +115,22 @@ public void testCliParser() throws Exception {
       Collections.indexOfSubList(cmd, Arrays.asList(parser.CONF, "spark.randomOption=foo")) > 0);
   }
 
+  @Test
+  public void testParserWithDefaultUnit() throws Exception {
+    List<String> sparkSubmitArgs = Arrays.asList(
+      parser.MASTER,
+      "local",
+      parser.DRIVER_MEMORY,
+      "4200",
+      parser.DRIVER_CLASS_PATH,
+      "/driverCp",
+      SparkLauncher.NO_RESOURCE);
+    Map<String, String> env = new HashMap<>();
+    List<String> cmd = buildCommand(sparkSubmitArgs, env);
+
+    assertTrue("Driver -Xmx should be configured in MB by default.", cmd.contains("-Xmx4200m"));
+  }
+
   @Test
   public void testShellCliParser() throws Exception {
     List<String> sparkSubmitArgs = Arrays.asList(
```
```diff
@@ -78,11 +78,16 @@ case "$1" in
     ;;
   executor)
     shift 1
+    MEMORY_WITH_UNIT=$SPARK_EXECUTOR_MEMORY
+    if [[ $MEMORY_WITH_UNIT =~ ^[0-9]+$ ]]
+    then
+      MEMORY_WITH_UNIT="${MEMORY_WITH_UNIT}m"
+    fi
     CMD=(
       ${JAVA_HOME}/bin/java
       "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
-      -Xms$SPARK_EXECUTOR_MEMORY
-      -Xmx$SPARK_EXECUTOR_MEMORY
+      -Xms$MEMORY_WITH_UNIT
+      -Xmx$MEMORY_WITH_UNIT
       -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
       org.apache.spark.executor.CoarseGrainedExecutorBackend
       --driver-url $SPARK_DRIVER_URL
```