-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32293] Fix inconsistency between Spark memory configs and JVM option #29090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
dfbce91
2ddbe0c
cc495c1
76a297d
8dfe643
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -172,7 +172,7 @@ of the most common options to set are: | |
| <td> | ||
| Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in the | ||
| same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") | ||
| (e.g. <code>512m</code>, <code>2g</code>). | ||
| (e.g. <code>512m</code>, <code>2g</code>) using "m" as the default unit. | ||
| <br /> | ||
| <em>Note:</em> In client mode, this config must not be set through the <code>SparkConf</code> | ||
| directly in your application, because the driver JVM has already started at that point. | ||
|
|
@@ -249,7 +249,8 @@ of the most common options to set are: | |
| <td>1g</td> | ||
| <td> | ||
| Amount of memory to use per executor process, in the same format as JVM memory strings with | ||
| a size unit suffix ("k", "m", "g" or "t") (e.g. <code>512m</code>, <code>2g</code>). | ||
| a size unit suffix ("k", "m", "g" or "t") (e.g. <code>512m</code>, <code>2g</code>) using | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it seems we are a bit inconsistent across the documentation as wel (pyspark.memory, memoryOverhead)l. other memory settings just say MiB unless otherwise specified but don't mention the suffix options. I wonder if we make them all consistent. Note one of the yarn configs says: Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively. but again doesn't say m is the default. |
||
| "m" as the default unit. | ||
| </td> | ||
| <td>0.7.0</td> | ||
| </tr> | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -328,4 +328,17 @@ static String findJarsDir(String sparkHome, String scalaVersion, boolean failIfN | |
| return libdir.getAbsolutePath(); | ||
| } | ||
|
|
||
| /** | ||
| * Add "m" as the default suffix unit when no explicit unit is given. | ||
| */ | ||
| static String addDefaultMSuffixIfNeeded(String memoryString) { | ||
| if (memoryString.chars().allMatch(Character::isDigit)) { | ||
| System.err.println("Memory setting without explicit unit (" + | ||
| memoryString + ") is taken to be in MB by default! For details check SPARK-32293."); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given that we are documenting 'm' is the suffix we use if not specified, do we need this message to stderr ? |
||
| return memoryString + "m"; | ||
| } else { | ||
| return memoryString; | ||
| } | ||
| } | ||
|
|
||
| } | ||
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -108,7 +108,7 @@ public List<String> buildCommand(Map<String, String> env) | |||
| } | ||||
|
|
||||
| String mem = firstNonEmpty(memKey != null ? System.getenv(memKey) : null, DEFAULT_MEM); | ||||
| cmd.add("-Xmx" + mem); | ||||
| cmd.add("-Xmx" + addDefaultMSuffixIfNeeded(mem)); | ||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should update the standalone docs as well for --memory (SPARK_WORKER_MEMORY and SPARK_DAEMON_MEMORY) to say default to m and ideally make consistent with above docs.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note we should test those as well if you haven't already
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks I really focused on XMX and XMS settings but now I see there another error at
|
||||
| cmd.add(className); | ||||
| cmd.addAll(classArgs); | ||||
| return cmd; | ||||
|
|
||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this change
executorMemory, do we need to update line 248 fordriverMemorytogether maybe?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, do we need to change this line? This seems to ensure that the value is non-negative. Just for my understanding, could you give me some example which gives different result before and after this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, there is no difference regarding the behaviour but reading the code and seeing
byteStringAsBytescalled with these configs gives the false impression they are in bytes. I think it is worth to change them tobyteStringAsMb.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, the only difference would be if user set the memory to < 1 mb.
This is ridiculous enough to ignore as a valid usecase :-)