-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[ZEPPELIN-4799] Use spark resource configuration #3761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Leemoonsoo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| memory: "{{zeppelin.k8s.interpreter.memory}}" | ||
| cpu: "{{zeppelin.k8s.interpreter.cores}}" | ||
| limits: | ||
| cpu: "{{zeppelin.k8s.interpreter.cores}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - How about adding some short comments here about why limits.memory is not configured? maybe a link to this pull request description? So later, other people can see this as a intentional when read the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The spark documentation mentions also YARN clusters for
I don't have a YARN cluster to execute spark jobs, and I don't even know how YARN really works, but I think that this calculation is also suitable for YARN. If that is not the case, forget my question. |
|
@Reamer For yarn, this calculation is done by spark. e.g. if user specify spark.driver.memory as 1g, actually spark will ask for one container of (1g + 384m). |
For technical reasons, Spark can only limit resources in K8s, if you are working in |
|
I assume that no higher memory overhead is needed, because in the past we did not need that for YARN either. |
|
Thank @Reamer for the improvement! I'm merging this to master and branch-0.9. |
### What is this PR for? With this PR, we use spark configuration values for K8s Pod resources. A memory limit is not set because of a potential OOM-Killer. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits > If you set a memory limit of 4GiB for that Container, the kubelet (and container runtime ) enforce the limit. The runtime prevents the container from using more than the configured resource limit. For example: when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error. zjffdu Are using a YARN cluster to schedule your Interpreters? Maybe we should change the location of the calculation class. ### What type of PR is it? - Improvement ### What is the Jira issue? * https://issues.apache.org/jira/browse/ZEPPELIN-4799 ### How should this be tested? * **Travis-CI**: https://travis-ci.org/github/Reamer/zeppelin/builds/683269227 ### Questions: * **Maybe a higher default memory overhead?** Edit: No, because we doesn't need it in the past. * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? No Author: Philipp Dallig <[email protected]> Closes #3761 from Reamer/spark_k8s_resources and squashes the following commits: 64bd912 [Philipp Dallig] Add short comment for limits.memory 45e565f [Philipp Dallig] Use Spark config values for K8s Interpreter Pod resources 3218309 [Philipp Dallig] Some cleanup (cherry picked from commit 3b7df78) Signed-off-by: Lee moon soo <[email protected]>
What is this PR for?
With this PR, we use spark configuration values for K8s Pod resources. A memory limit is not set because of a potential OOM-Killer.
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits
@zjffdu Are using a YARN cluster to schedule your Interpreters? Maybe we should change the location of the calculation class.
What type of PR is it?
What is the Jira issue?
How should this be tested?
Questions: