Conversation

@beliefer
Contributor

What changes were proposed in this pull request?

I found the documentation for spark.driver.memoryOverhead and spark.executor.memoryOverhead to be a little ambiguous.
For example, the original description of spark.driver.memoryOverhead starts with "The amount of off-heap memory to be allocated per driver in cluster mode."
But MemoryManager also manages a memory region named off-heap, which is used for allocations in Tungsten mode.
So the current description of spark.driver.memoryOverhead is easy to confuse with that setting.

spark.executor.memoryOverhead suffers from the same ambiguity as spark.driver.memoryOverhead.

How was this patch tested?

Existing unit tests.

@beliefer beliefer changed the title [MINOR][DOCS]Improve docs about overhead. [MINOR][DOCS]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. May 22, 2019
@SparkQA

SparkQA commented May 22, 2019

Test build #105660 has finished for PR 24671 at commit ef6c505.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@JoshRosen JoshRosen left a comment


I think the proposed new description is inaccurate because interned strings and native overheads (e.g. Netty direct buffers) aren't allocated outside the executor process, per se (since off-heap memory is still in the JVM's own process address space).

Rather, I think the distinction is that these non-heap allocations don't count towards the JVM's total heap size limit, a factor which needs to be taken into account for container memory sizing: if you request a Spark executor with, say, a 4 gigabyte heap, then the actual peak memory usage of the process (from the OS's perspective) is going to be more than 4 gigabytes due to these types of off-heap allocations.

If we want to avoid container OOM kills (by YARN or Kubernetes) then we need to account for this extra overhead somewhere, hence these *memoryOverhead "fudge-factors": setting the memory overhead causes us to request a container whose total memory size is greater than the heap size.
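The container-sizing arithmetic above can be illustrated with a small sketch. This is a hedged illustration (function and constant names here are made up for this example), assuming YARN-style sizing where the overhead defaults to 10% of the executor memory with a 384 MiB floor, which matches Spark's documented YARN defaults at the time of this PR:

```python
# Sketch: how the resource-manager-facing container request is sized.
# The JVM heap (spark.executor.memory) is only part of what the OS sees;
# the *memoryOverhead "fudge factor" pads the request for off-heap
# allocations (interned strings, Netty direct buffers, thread stacks).

MIN_OVERHEAD_MIB = 384   # Spark's documented minimum overhead
OVERHEAD_FACTOR = 0.10   # documented default overhead factor

def container_memory_mib(executor_memory_mib, explicit_overhead_mib=None):
    """Total memory requested from YARN/Kubernetes for one executor."""
    if explicit_overhead_mib is None:
        overhead = max(int(executor_memory_mib * OVERHEAD_FACTOR),
                       MIN_OVERHEAD_MIB)
    else:
        overhead = explicit_overhead_mib
    return executor_memory_mib + overhead

# A 4 GiB heap actually costs the cluster about 4.4 GiB:
print(container_memory_mib(4096))  # 4096 + max(409, 384) = 4505
```

So a request for a "4 gigabyte" executor translates into a container noticeably larger than 4 GiB, which is exactly the headroom that prevents the container OOM kills described above.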

That said, increasing the memory overhead does also result in additional memory headroom that can be used by non-driver/executor processes (like overhead from other processes in the container). Maybe we could state this explicitly, e.g. something like "... non-heap memory, including off-heap memory (e.g. ...) and memory used by other non-driver/executor processes running in the same container".

You're correct that the relationship between these configurations and the Tungsten off-heap configuration is a bit confusing. To address that confusion, I'd prefer to expand the documentation to explicitly mention these *memoryOverhead configurations in the documentation for the Tungsten off-heap setting: I think that doc should recommend raising the memoryOverhead when setting the Tungsten config.
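The interaction between the Tungsten off-heap setting and memoryOverhead can be sketched as follows. This is a hedged illustration, not Spark's actual sizing code: at the time of this PR, spark.memory.offHeap.size was not automatically counted toward the container request, so the recommendation is to fold it into memoryOverhead by hand:

```python
def recommended_overhead_mib(default_overhead_mib, offheap_size_mib):
    # When spark.memory.offHeap.enabled=true, Tungsten allocates
    # offheap_size_mib outside the heap but inside the same JVM process,
    # so the container request should grow by at least that amount to
    # avoid OOM kills by YARN or Kubernetes.
    return default_overhead_mib + offheap_size_mib

# e.g. 409 MiB default overhead plus a 1 GiB Tungsten off-heap region:
print(recommended_overhead_mib(409, 1024))  # 1433
```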

It might also help to more explicitly clarify that these settings only make sense in a containerized / resource limited deployment mode (e.g. not in standalone mode).

In summary, I think there's definitely room for confusion with the existing fix, but I think the right solution is an expansion of the docs to much more explicitly clarify the relationship between both sets of configurations, not a minor re-word.

@SparkQA

SparkQA commented May 22, 2019

Test build #105674 has finished for PR 24671 at commit d018b35.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

I think the proposed new description is inaccurate because interned strings and native overheads (e.g. Netty direct buffers) aren't allocated outside the executor process, per se (since off-heap memory is still in the JVM's own process address space).

Thanks for your review. Yes, I overlooked the detail that off-heap memory is still in the JVM's own process address space.

@SparkQA

SparkQA commented May 22, 2019

Test build #105676 has finished for PR 24671 at commit 554caa2.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 22, 2019

Test build #105677 has finished for PR 24671 at commit 2fdc94f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 22, 2019

Test build #105679 has finished for PR 24671 at commit 2c89b42.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented May 22, 2019

Test build #105682 has finished for PR 24671 at commit 3f79e89.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 22, 2019

Test build #105686 has finished for PR 24671 at commit 3f79e89.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

Retest this please.

@beliefer beliefer changed the title [MINOR][DOCS]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. May 23, 2019
@SparkQA

SparkQA commented May 23, 2019

Test build #105710 has finished for PR 24671 at commit 3f79e89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

@beliefer beliefer left a comment


Please review again!

@beliefer
Contributor Author

Retest this please.

@SparkQA

SparkQA commented May 27, 2019

Test build #105812 has finished for PR 24671 at commit c4ab8a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 27, 2019

Test build #105814 has finished for PR 24671 at commit f96f936.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 28, 2019

Test build #105854 has finished for PR 24671 at commit 1f31fc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 29, 2019

Test build #105886 has finished for PR 24671 at commit f23c1b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 30, 2019

Test build #105939 has finished for PR 24671 at commit 5dbbf4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 30, 2019

Test build #105950 has finished for PR 24671 at commit 7980652.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

@JoshRosen Could you review this PR again and let me know if there are any other issues?

@srowen
Member

srowen commented Jun 1, 2019

Merged to master

@srowen srowen closed this in 8feb80a Jun 1, 2019
@beliefer
Contributor Author

beliefer commented Jun 1, 2019

@srowen Thanks for the merge. I had thought both "requested changes" reviews needed to be resolved with each reviewer's agreement before merging.
@JoshRosen Thanks for your detailed review.
