[SPARK-27811][Core][Docs] Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. #24671
Conversation
I think the proposed new description is inaccurate because interned strings and native overheads (e.g. Netty direct buffers) aren't allocated outside the executor process, per se (since off-heap memory is still in the JVM's own process address space).
Rather, I think the distinction is that these non-heap allocations don't count towards the JVM's total heap size limit, a factor which needs to be taken into account for container memory sizing: if you request a Spark executor with, say, a 4 gigabyte heap, then the actual peak memory usage of the process (from the OS's perspective) is going to be more than 4 gigabytes due to these types of off-heap allocations.
If we want to avoid container OOM kills (by YARN or Kubernetes) then we need to account for this extra overhead somewhere, hence these *memoryOverhead "fudge-factors": setting the memory overhead causes us to request a container whose total memory size is greater than the heap size.
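As a concrete illustration of that sizing, here is a minimal sketch with arbitrary example values (not recommended settings); the keys are the standard spark.executor.* configs:

```scala
import org.apache.spark.SparkConf

// Illustrative values only: a 4 GiB executor heap plus 1 GiB of overhead.
// On YARN or Kubernetes the container/pod request is sized from the sum of
// these two settings, so the resource manager reserves roughly 5 GiB per
// executor, leaving headroom for allocations that don't count against the heap.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")          // executor JVM heap (-Xmx)
  .set("spark.executor.memoryOverhead", "1g")  // extra non-heap headroom per container
```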
That said, increasing the memory overhead does also result in additional memory headroom that can be used by non-driver/executor processes (like overhead from other processes in the container). Maybe we could state this explicitly, e.g. something like "... non-heap memory, including off-heap memory (e.g. ...) and memory used by other non-driver/executor processes running in the same container".
You're correct that the relationship between these configurations and the Tungsten off-heap configuration is a bit confusing. To address that confusion, I'd prefer to expand the documentation to explicitly mention these *memoryOverhead configurations in the documentation for the Tungsten off-heap setting: I think that doc should recommend raising the memoryOverhead when setting the Tungsten config.
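One way to follow that recommendation, as a sketch rather than official guidance (the sizes are arbitrary, and whether a given Spark version already adds spark.memory.offHeap.size to the container request should be verified):

```scala
import org.apache.spark.SparkConf

// When enabling Tungsten's off-heap allocation, also raise memoryOverhead so
// the container request leaves room for that region (example sizes only).
val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "2g")      // Tungsten/MemoryManager off-heap region
  .set("spark.executor.memory", "4g")          // executor JVM heap
  .set("spark.executor.memoryOverhead", "3g")  // bumped to cover the off-heap region, per the advice above
```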
It might also help to more explicitly clarify that these settings only make sense in a containerized / resource limited deployment mode (e.g. not in standalone mode).
In summary, I think there's definitely room for confusion with the existing fix, but I think the right solution is an expansion of the docs to much more explicitly clarify the relationship between both sets of configurations, not a minor re-word.
Thanks for your review. Yes, I overlooked the detail that off-heap memory is still in the JVM's own process address space.
beliefer left a comment:
Please review again!
core/src/main/scala/org/apache/spark/internal/config/package.scala
@JoshRosen Could you review this PR again and check whether there are any other issues?
Merged to master |
@srowen Thanks for merging. I thought the two "requested changes" reviews needed everyone's agreement before the merge.
What changes were proposed in this pull request?
I found the docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead a little ambiguous. For example, the original doc of spark.driver.memoryOverhead starts with "The amount of off-heap memory to be allocated per driver in cluster mode." But MemoryManager also manages a memory region named off-heap, which is used for allocations in Tungsten's off-heap mode, so the description of spark.driver.memoryOverhead is easy to confuse with that setting. spark.executor.memoryOverhead has the same problem.
How was this patch tested?
Existing unit tests.
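For illustration, a minimal sketch (example values only) contrasting the two settings whose descriptions were easy to confuse:

```scala
import org.apache.spark.SparkConf

// The two "off-heap"-related settings the original wording conflated:
val conf = new SparkConf()
  // Container headroom outside the JVM heap (interned strings, thread stacks,
  // Netty direct buffers, ...). If unset, cluster managers typically default
  // it to max(10% of executor memory, 384 MiB).
  .set("spark.executor.memoryOverhead", "1g")
  // The Tungsten/MemoryManager off-heap region, a separate setting that only
  // takes effect when spark.memory.offHeap.enabled is true.
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "2g")
```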