[SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable #35504
Changes from all commits
```diff
@@ -1137,15 +1137,6 @@ See the [configuration page](configuration.html) for information on Spark config
   </td>
   <td>3.0.0</td>
 </tr>
-<tr>
-  <td><code>spark.kubernetes.memoryOverheadFactor</code></td>
-  <td><code>0.1</code></td>
-  <td>
-    This sets the Memory Overhead Factor that will allocate memory to non-JVM memory, which includes off-heap memory allocations, non-JVM tasks, various systems processes, and <code>tmpfs</code>-based local directories when <code>spark.kubernetes.local.dirs.tmpfs</code> is <code>true</code>. For JVM-based jobs this value will default to 0.10 and 0.40 for non-JVM jobs.
-    This is done as non-JVM tasks need more non-JVM heap space and such tasks commonly fail with "Memory Overhead Exceeded" errors. This preempts this error with a higher default.
-  </td>
-  <td>2.4.0</td>
-</tr>
```
Member

This is (3). We should not remove documentation during the deprecation stage.
```diff
 <tr>
   <td><code>spark.kubernetes.pyspark.pythonVersion</code></td>
   <td><code>"3"</code></td>
```
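For context on the entry being removed, here is a minimal sketch of what the documented factors amount to in practice, assuming the commonly cited rule that the overhead request is max(factor × executor memory, 384 MiB); `overheadMiB` is a hypothetical helper for illustration, not a Spark API.

```scala
// Sketch only: effect of the documented defaults (0.1 for JVM jobs, 0.4 for
// non-JVM jobs), assuming the max(factor * executorMemory, 384 MiB) rule.
object OverheadExample {
  private val MinOverheadMiB = 384L

  // Hypothetical helper, not part of Spark's API.
  def overheadMiB(executorMemoryMiB: Long, factor: Double): Long =
    math.max((executorMemoryMiB * factor).toLong, MinOverheadMiB)

  def main(args: Array[String]): Unit = {
    val execMem = 4096L // a 4g executor
    println(s"JVM job (0.1):     ${overheadMiB(execMem, 0.1)} MiB") // 409 MiB
    println(s"non-JVM job (0.4): ${overheadMiB(execMem, 0.4)} MiB") // 1638 MiB
    println(s"small executor:    ${overheadMiB(1024L, 0.1)} MiB")   // floor of 384 MiB
  }
}
```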
```diff
@@ -59,11 +59,16 @@ private[spark] class BasicExecutorFeatureStep(
   private val isDefaultProfile = resourceProfile.id == ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID
   private val isPythonApp = kubernetesConf.get(APP_RESOURCE_TYPE) == Some(APP_RESOURCE_TYPE_PYTHON)
   private val disableConfigMap = kubernetesConf.get(KUBERNETES_EXECUTOR_DISABLE_CONFIGMAP)
+  private val memoryOverheadFactor = if (kubernetesConf.contains(EXECUTOR_MEMORY_OVERHEAD_FACTOR)) {
+    kubernetesConf.get(EXECUTOR_MEMORY_OVERHEAD_FACTOR)
+  } else {
+    kubernetesConf.get(MEMORY_OVERHEAD_FACTOR)
+  }
```
Comment on lines +62 to +66
Member

The reason should be in here: before this change, `kubernetesConf.get(MEMORY_OVERHEAD_FACTOR)` was used as the default factor. Now `EXECUTOR_MEMORY_OVERHEAD_FACTOR` has higher priority, so `MEMORY_OVERHEAD_FACTOR` gets overridden (so 0.1 by default). That means the default behavior changed, but I haven't found why.
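To make the concern above concrete, here is a toy sketch of the resolution order the diff introduces, written against a plain `SparkConf` using the public key names (`spark.executor.memoryOverheadFactor` for `EXECUTOR_MEMORY_OVERHEAD_FACTOR`, `spark.kubernetes.memoryOverheadFactor` for the K8s `MEMORY_OVERHEAD_FACTOR`); `resolveFactor` is a stand-in for the snippet under review, not Spark code.

```scala
import org.apache.spark.SparkConf

object FactorPrecedenceSketch {
  // Toy resolution mirroring the diff: the new executor-specific key wins when
  // present, otherwise fall back to the legacy K8s key, otherwise 0.1.
  def resolveFactor(conf: SparkConf): Double =
    conf.getOption("spark.executor.memoryOverheadFactor")
      .orElse(conf.getOption("spark.kubernetes.memoryOverheadFactor"))
      .map(_.toDouble)
      .getOrElse(0.1)

  def main(args: Array[String]): Unit = {
    // Only the legacy K8s key is set, e.g. 0.4 propagated for a PySpark app.
    val legacyOnly = new SparkConf(loadDefaults = false)
      .set("spark.kubernetes.memoryOverheadFactor", "0.4")
    println(resolveFactor(legacyOnly)) // 0.4 -- legacy value still applies when the new key is unset

    // The concern in the thread: once the new key is resolved as well (even to
    // its 0.1 default), it shadows the propagated legacy value.
    val bothSet = legacyOnly.clone().set("spark.executor.memoryOverheadFactor", "0.1")
    println(resolveFactor(bothSet)) // 0.1 -- the new key overrides 0.4
  }
}
```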
Member

I found it (lines 62 to 63 in cd86df8): it is propagated to the executors from the driver.
Contributor

Yeah, thanks, this is what I was also just looking at, but I'm not sure how it was propagated to the executors. I was looking at whether it goes through the KubernetesDriverConf somehow, or possibly through the pod system properties. If you find it, let me know; still investigating.
Contributor

OK, I think I see how this is happening: we build the driver spec, which includes the added system properties. The driver feature steps add the memory overhead setting there, and then KubernetesClientUtils.buildSparkConfDirFilesMap is called, which propagates it to the executors (I think).
Member

Yep, I think so (line 72 in cd86df8).
Member

Thank you. Then it's simpler, because this PR was backported manually after feature freeze. :)
Member

Please unblock the Apache Spark 3.3 K8s module QA period by reverting this. We can land it back after having a healthy commit.
Contributor (Author)

Okay, I think I get it: those "system properties" end up as default Spark configs on the executor. Clear as mud.
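For readers following the thread, here is a toy sketch of the propagation mechanism being described, under the assumption that driver-resolved settings are written into a `spark.properties`-style file (in Spark on K8s this lands in a mounted ConfigMap) and then loaded as default conf values by the executor process; none of the helper code below is Spark's own.

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.Properties

import org.apache.spark.SparkConf

object ConfPropagationSketch {
  def main(args: Array[String]): Unit = {
    val key = "spark.kubernetes.memoryOverheadFactor"

    // "Driver side": a feature step decided on a memory overhead factor and it
    // is written out with the rest of the resolved configuration.
    val propsFile = File.createTempFile("spark", ".properties")
    val props = new Properties()
    props.setProperty(key, "0.4")
    val out = new FileOutputStream(propsFile)
    try props.store(out, "written by the driver") finally out.close()

    // "Executor side": load the file and treat its entries as conf defaults.
    val loaded = new Properties()
    val in = new FileInputStream(propsFile)
    try loaded.load(in) finally in.close()

    val conf = new SparkConf(loadDefaults = false)
    Option(loaded.getProperty(key)).foreach(v => conf.setIfMissing(key, v))
    println(conf.get(key)) // 0.4, inherited from the driver's conf
  }
}
```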
Contributor

Revert PR: #35900
Member

Thank you for your decision, @tgravescs.
```diff
   val execResources = ResourceProfile.getResourcesForClusterManager(
     resourceProfile.id,
     resourceProfile.executorResources,
-    kubernetesConf.get(MEMORY_OVERHEAD_FACTOR),
+    memoryOverheadFactor,
     kubernetesConf.sparkConf,
     isPythonApp,
     Map.empty)
```