@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jun 2, 2022

What changes were proposed in this pull request?

This PR aims to avoid the deprecation of spark.kubernetes.memoryOverheadFactor in Apache Spark 3.3. In addition, it restores the documentation that was mistakenly removed when the configuration was deprecated. Deprecation is not removal.

Why are the changes needed?

  • Apache Spark 3.3.0 RC always complains about spark.kubernetes.memoryOverheadFactor because the configuration has a default value (which is not given by the users). There is no way to remove the warnings, which means the directional message is not helpful and confuses users. In other words, we still get warnings even when we use only the new configurations or no configuration at all.
22/06/01 23:53:49 WARN SparkConf: The configuration key 'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 and may be removed in the future. Please use spark.driver.memoryOverheadFactor and spark.executor.memoryOverheadFactor
22/06/01 23:53:49 WARN SparkConf: The configuration key 'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 and may be removed in the future. Please use spark.driver.memoryOverheadFactor and spark.executor.memoryOverheadFactor
22/06/01 23:53:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/06/01 23:53:50 WARN SparkConf: The configuration key 'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 and may be removed in the future. Please use spark.driver.memoryOverheadFactor and spark.executor.memoryOverheadFactor
  • The minimum constraint is slightly different: spark.kubernetes.memoryOverheadFactor has allowed 0 since Apache Spark 2.4, while the new configurations disallow 0.

  • The documentation removal might be too early because deprecation is not the removal of a configuration. This PR restores the removed doc and adds the following.

This will be overridden by the value set by
<code>spark.driver.memoryOverheadFactor</code> and
<code>spark.executor.memoryOverheadFactor</code> explicitly.
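
To illustrate the first bullet, here is a minimal Python sketch of how a deprecation check can fire even when the user never sets the legacy key. It is a toy model, not Spark's SparkConf implementation: `ToyConf`, `submit_with_defaults`, and the assumption that some internal step always fills in the legacy key with its default are all hypothetical and exist only for illustration.

```python
import warnings

# Hypothetical toy model; this is NOT Spark's SparkConf implementation.
DEPRECATED_KEYS = {
    "spark.kubernetes.memoryOverheadFactor":
        "Please use spark.driver.memoryOverheadFactor and "
        "spark.executor.memoryOverheadFactor",
}


class ToyConf:
    """Minimal config store that warns whenever a deprecated key is set."""

    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        # The check cannot tell a user-supplied value from a filled-in default.
        if key in DEPRECATED_KEYS:
            warnings.warn(
                f"The configuration key '{key}' has been deprecated. "
                f"{DEPRECATED_KEYS[key]}"
            )
        self._settings[key] = value
        return self


def submit_with_defaults(user_settings):
    """Copy the user's settings, then fill in the legacy key if it is absent."""
    conf = ToyConf()
    for key, value in user_settings.items():
        conf.set(key, value)
    # Assumption for illustration: an internal step always populates the
    # legacy key with its default value.
    if "spark.kubernetes.memoryOverheadFactor" not in user_settings:
        conf.set("spark.kubernetes.memoryOverheadFactor", "0.1")
    return conf
```

In this sketch, a user who sets only `spark.driver.memoryOverheadFactor` still triggers one warning, because the warning is tied to the key being present rather than to who set it. That is the situation the bullet describes: the message tells users to migrate, but migrating does not silence it.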

Does this PR introduce any user-facing change?

No. This is consistent with the existing behavior.

How was this patch tested?

Pass the CIs.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-39360][K8S] Remove deprecation of spark.kubernetes.memoryOverheadFactor and recover doc [SPARK-39360][K8S] Remove deprecation of spark.kubernetes.memoryOverheadFactor and recover doc Jun 2, 2022
<td>
This sets the Memory Overhead Factor that will allocate memory to non-JVM memory, which includes off-heap memory allocations, non-JVM tasks, various systems processes, and <code>tmpfs</code>-based local directories when <code>spark.kubernetes.local.dirs.tmpfs</code> is <code>true</code>. For JVM-based jobs this value will default to 0.10 and 0.40 for non-JVM jobs.
This is done as non-JVM tasks need more non-JVM heap space and such tasks commonly fail with "Memory Overhead Exceeded" errors. This preempts this error with a higher default.
This will be overridden by the value set by <code>spark.driver.memoryOverheadFactor</code> and <code>spark.executor.memoryOverheadFactor</code> explicitly.
Member Author

This sentence is newly added.
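
The override rule in the documentation sentence above, together with the documented defaults (0.10 for JVM jobs, 0.40 for non-JVM jobs) and the minimum-value difference noted in the PR description, can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the code path Spark uses: `effective_overhead_factor` and its `settings` dictionary are hypothetical names.

```python
JVM_DEFAULT_FACTOR = 0.10      # documented default for JVM-based jobs
NON_JVM_DEFAULT_FACTOR = 0.40  # documented default for non-JVM jobs


def effective_overhead_factor(settings, role, is_jvm_job=True):
    """Resolve the overhead factor for 'driver' or 'executor'.

    Hypothetical helper: `settings` is a plain dict of string keys to
    string values, standing in for a real configuration object.
    """
    # 1. An explicitly set role-specific key wins.
    explicit = settings.get(f"spark.{role}.memoryOverheadFactor")
    if explicit is not None:
        factor = float(explicit)
        if factor <= 0:  # the new configurations disallow 0
            raise ValueError(f"spark.{role}.memoryOverheadFactor must be > 0")
        return factor

    # 2. Otherwise fall back to the Kubernetes-specific key.
    legacy = settings.get("spark.kubernetes.memoryOverheadFactor")
    if legacy is not None:
        factor = float(legacy)
        if factor < 0:  # the legacy key has allowed 0 since Spark 2.4
            raise ValueError("spark.kubernetes.memoryOverheadFactor must be >= 0")
        return factor

    # 3. Otherwise use the documented default for the job type.
    return JVM_DEFAULT_FACTOR if is_jvm_job else NON_JVM_DEFAULT_FACTOR
```

For example, with both `spark.kubernetes.memoryOverheadFactor=0.3` and `spark.driver.memoryOverheadFactor=0.2` set, the sketch resolves the driver factor to 0.2 and the executor factor to 0.3. A value of 0 is accepted for the legacy key but rejected for the new keys, which is one reason the two are not drop-in replacements for each other.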

@tgravescs
Contributor

This looks fine to me. If the configuration is set by default, we need a follow-up issue. I can look more tomorrow.

@dongjoon-hyun
Member Author

Thank you for the review, @tgravescs. Yes, I guess we can do a clean deprecation during the Apache Spark 3.4 timeframe. For Spark 3.3.0, it will be enough to deliver the new generalized configurations first.

@dongjoon-hyun
Member Author

Thank you, @tgravescs and @huaxingao. Merged to master/3.3.

dongjoon-hyun added a commit that referenced this pull request Jun 2, 2022
…headFactor` and recover doc

Closes #36744 from dongjoon-hyun/SPARK-39360.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6d43556)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-39360 branch June 2, 2022 03:10
@tgravescs
Contributor

Filed https://issues.apache.org/jira/browse/SPARK-39363 as a follow-up.
