Conversation

@Reamer (Contributor) commented May 5, 2020

What is this PR for?

With this PR, we use Spark configuration values for the K8s pod resources. A memory limit is not set, because a hard limit could trigger the OOM killer.

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits

> If you set a memory limit of 4GiB for that Container, the kubelet (and container runtime) enforce the limit. The runtime prevents the container from using more than the configured resource limit. For example: when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error.
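Concretely, the interpreter pod template then requests both memory and CPU but sets only a CPU limit. A sketch of the resulting resources section (the placeholder names mirror the template fragment reviewed in this PR; the surrounding `resources:`/`requests:` keys are assumed from standard pod-spec structure):

```yaml
resources:
  requests:
    memory: "{{zeppelin.k8s.interpreter.memory}}"
    cpu: "{{zeppelin.k8s.interpreter.cores}}"
  limits:
    # limits.memory is intentionally omitted to avoid OOM kills (see above)
    cpu: "{{zeppelin.k8s.interpreter.cores}}"
```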

@zjffdu Are you using a YARN cluster to schedule your interpreters? Maybe we should change the location of the calculation class.

What type of PR is it?

  • Improvement

What is the Jira issue?

How should this be tested?

Questions:

  • Maybe a higher default memory overhead? Edit: No, because we didn't need it in the past.
  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

@Leemoonsoo (Member) left a comment:


LGTM

```yaml
memory: "{{zeppelin.k8s.interpreter.memory}}"
cpu: "{{zeppelin.k8s.interpreter.cores}}"
limits:
  cpu: "{{zeppelin.k8s.interpreter.cores}}"
```
Review comment (Member):

nit - How about adding a short comment here about why limits.memory is not configured? Maybe a link to this pull request description, so that later readers can see this is intentional when reading the code.

@Reamer (Contributor, Author) replied:

Done.

@zjffdu (Contributor) commented May 7, 2020

> @zjffdu Are you using a YARN cluster to schedule your interpreters? Maybe we should change the location of the calculation class.

@Reamer I don't get it. What do you mean?

@Reamer (Contributor, Author) commented May 7, 2020

> @zjffdu Are you using a YARN cluster to schedule your interpreters? Maybe we should change the location of the calculation class.

> @Reamer I don't get it. What do you mean?

The Spark documentation also mentions YARN clusters for spark.driver.memoryOverhead:

> This option is currently supported on YARN and Kubernetes.

I don't have a YARN cluster to execute Spark jobs, and I don't even know how YARN really works, but I think this calculation is also suitable for YARN. If that is not the case, forget my question.

@zjffdu (Contributor) commented May 7, 2020

@Reamer For YARN, this calculation is done by Spark, e.g. if the user specifies spark.driver.memory as 1g, Spark will actually ask for one container of (1g + 384m).
For K8s, I suspect Spark may already do this kind of calculation.
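The (1g + 384m) example above follows Spark's documented default overhead rule: the container gets the requested memory plus the larger of 10% of it and a 384 MiB floor. A minimal sketch of that rule (the function name and constant names are illustrative, not Zeppelin or Spark identifiers):

```python
MIN_OVERHEAD_MIB = 384   # Spark's documented minimum memory overhead
OVERHEAD_FACTOR = 0.10   # Spark's default overhead factor (10%)

def container_memory_mib(driver_memory_mib: int) -> int:
    """Total container memory requested: driver heap plus overhead."""
    overhead = max(int(driver_memory_mib * OVERHEAD_FACTOR), MIN_OVERHEAD_MIB)
    return driver_memory_mib + overhead

# A 1g (1024 MiB) driver: 10% is only 102 MiB, so the 384 MiB floor applies.
print(container_memory_mib(1024))   # 1408 MiB, i.e. "1g + 384m"
```

For larger drivers the 10% factor dominates, e.g. a 10g driver gets a 1024 MiB overhead instead of the floor.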

@Reamer (Contributor, Author) commented May 7, 2020

> For K8s, I suspect Spark may already do this kind of calculation.

For technical reasons, Spark can only limit resources in K8s if you are running in cluster mode. If you run Spark in client mode, as Zeppelin does, you should set --driver-memory (already implemented).
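In client mode the driver JVM runs outside any Spark-managed pod, so its heap must be sized explicitly rather than derived by Spark. A hypothetical client-mode submission illustrating this (the master URL and job file are placeholders; the flags are standard spark-submit options):

```shell
spark-submit \
  --master k8s://https://kubernetes.default.svc \
  --deploy-mode client \
  --driver-memory 1g \
  my_job.py
```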

@Reamer (Contributor, Author) commented May 11, 2020

I assume that no higher memory overhead is needed, because in the past we did not need it for YARN either.

@Leemoonsoo (Member) commented:

Thanks @Reamer for the improvement! I'm merging this to master and branch-0.9.

@asfgit asfgit closed this in 3b7df78 May 11, 2020
asfgit pushed a commit that referenced this pull request May 11, 2020
### What is this PR for?
With this PR, we use Spark configuration values for the K8s pod resources. A memory limit is not set, because a hard limit could trigger the OOM killer.

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits
> If you set a memory limit of 4GiB for that Container, the kubelet (and container runtime) enforce the limit. The runtime prevents the container from using more than the configured resource limit. For example: when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error.

@zjffdu Are you using a YARN cluster to schedule your interpreters? Maybe we should change the location of the calculation class.

### What type of PR is it?
 - Improvement

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-4799

### How should this be tested?
* **Travis-CI**: https://travis-ci.org/github/Reamer/zeppelin/builds/683269227

### Questions:
* **Maybe a higher default memory overhead?** Edit: No, because we didn't need it in the past.
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: Philipp Dallig <[email protected]>

Closes #3761 from Reamer/spark_k8s_resources and squashes the following commits:

64bd912 [Philipp Dallig] Add short comment for limits.memory
45e565f [Philipp Dallig] Use Spark config values for K8s Interpreter Pod resources
3218309 [Philipp Dallig] Some cleanup

(cherry picked from commit 3b7df78)
Signed-off-by: Lee moon soo <[email protected]>
Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request May 11, 2020
@Reamer Reamer deleted the spark_k8s_resources branch May 12, 2020 06:16
Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request Jun 2, 2020
Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request Jun 30, 2020
Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request Aug 13, 2020
Leemoonsoo added a commit to open-datastudio/zeppelin that referenced this pull request Aug 13, 2020
Leemoonsoo added a commit to open-datastudio/zeppelin that referenced this pull request Aug 19, 2020
Leemoonsoo added a commit to open-datastudio/zeppelin that referenced this pull request Nov 20, 2020