[ZEPPELIN-4799] Use spark resource configuration #3761

Reamer · 2020-05-05T09:52:18Z

What is this PR for?

With this PR, we use Spark configuration values for the K8s Pod resources. A memory limit is deliberately not set, because exceeding it could trigger the OOM killer.

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits

If you set a memory limit of 4GiB for that Container, the kubelet (and container runtime) enforce the limit. The runtime prevents the container from using more than the configured resource limit. For example: when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error.

@zjffdu Are you using a YARN cluster to schedule your interpreters? Maybe we should change the location of the calculation class.

What type of PR is it?

Improvement

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-4799

How should this be tested?

Travis-CI: https://travis-ci.org/github/Reamer/zeppelin/builds/683269227

Questions:

Maybe a higher default memory overhead? Edit: No, because we didn't need it in the past.
Do the license files need an update? No
Are there breaking changes for older versions? No
Does this need documentation? No

Leemoonsoo

LGTM

Leemoonsoo · 2020-05-05T14:52:15Z

k8s/interpreter/100-interpreter-spec.yaml

+        memory: "{{zeppelin.k8s.interpreter.memory}}"
+        cpu: "{{zeppelin.k8s.interpreter.cores}}"
+      limits:
+        cpu: "{{zeppelin.k8s.interpreter.cores}}"


nit: How about adding a short comment here explaining why limits.memory is not configured, maybe with a link to this pull request description? That way, people who read the code later can see that this is intentional.
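To illustrate the shape under discussion, here is a minimal sketch that builds the same resources block as a plain dictionary. The helper name `interpreter_resources` and its parameters are hypothetical and are not taken from the Zeppelin code base; only the structure (memory and CPU requests, a CPU limit, no memory limit) comes from the diff above.

```python
def interpreter_resources(memory: str, cores: str) -> dict:
    """Build a Kubernetes-style resources mapping (hypothetical helper)."""
    return {
        "requests": {"memory": memory, "cpu": cores},
        # limits.memory is intentionally left out: a process that exceeds a
        # memory limit can be terminated by the kernel's OOM killer.
        "limits": {"cpu": cores},
    }


# Example values only; the real values would come from the interpreter settings.
resources = interpreter_resources("1408Mi", "1")
print(resources)
```

Keeping the memory value only under `requests` still lets the scheduler reserve memory for the Pod, while no hard cap is enforced at runtime.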

zjffdu · 2020-05-07T07:21:13Z

@zjffdu Are you using a YARN cluster to schedule your interpreters? Maybe we should change the location of the calculation class.

@Reamer I don't get it, what do you mean?

Reamer · 2020-05-07T08:07:09Z

@zjffdu Are you using a YARN cluster to schedule your interpreters? Maybe we should change the location of the calculation class.

@Reamer I don't get it, what do you mean?

The Spark documentation also mentions YARN clusters for spark.driver.memoryOverhead:

This option is currently supported on YARN and Kubernetes.

I don't have a YARN cluster to run Spark jobs, and I don't really know how YARN works, but I think this calculation is also suitable for YARN. If that is not the case, forget my question.

zjffdu · 2020-05-07T08:18:40Z

@Reamer For YARN, this calculation is done by Spark. For example, if the user specifies spark.driver.memory as 1g, Spark will actually ask for one container of (1g + 384m).
For K8s, I suspect Spark already does this kind of calculation.
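The arithmetic in that example can be sketched as follows. This assumes Spark's documented default for spark.driver.memoryOverhead on JVM workloads (10% of the driver memory, with a minimum of 384 MiB). The function names are hypothetical, and this is not the calculation class from this PR.

```python
MIN_OVERHEAD_MIB = 384   # documented minimum overhead
OVERHEAD_FACTOR = 0.10   # assumed default factor for JVM drivers


def default_overhead_mib(driver_memory_mib: int) -> int:
    """Default overhead: 10% of driver memory, but never below 384 MiB."""
    return max(int(driver_memory_mib * OVERHEAD_FACTOR), MIN_OVERHEAD_MIB)


def container_request_mib(driver_memory_mib: int) -> int:
    """Total memory to request for the driver container, in MiB."""
    return driver_memory_mib + default_overhead_mib(driver_memory_mib)


print(container_request_mib(1024))  # 1g driver: 1024 + 384 = 1408
print(container_request_mib(8192))  # 8g driver: 8192 + 819 = 9011
```

For small drivers the 384 MiB floor dominates, which is why a 1g driver ends up as (1g + 384m) rather than (1g + 10%).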

Reamer · 2020-05-07T09:11:55Z

For K8s, I suspect Spark already does this kind of calculation.

For technical reasons, Spark can only limit resources in K8s if you are working in cluster mode. If you run Spark in client mode, as Zeppelin does, you should set --driver-memory (already implemented).

Reamer · 2020-05-11T06:03:11Z

I assume that no higher memory overhead is needed, because in the past we did not need that for YARN either.

Leemoonsoo · 2020-05-11T14:46:56Z

Thanks @Reamer for the improvement! I'm merging this to master and branch-0.9.

### What is this PR for?
With this PR, we use Spark configuration values for K8s Pod resources. A memory limit is not set because of a potential OOM-Killer.

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits

> If you set a memory limit of 4GiB for that Container, the kubelet (and container runtime) enforce the limit. The runtime prevents the container from using more than the configured resource limit. For example: when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error.

zjffdu Are you using a YARN cluster to schedule your interpreters? Maybe we should change the location of the calculation class.

### What type of PR is it?
- Improvement

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-4799

### How should this be tested?
* **Travis-CI**: https://travis-ci.org/github/Reamer/zeppelin/builds/683269227

### Questions:
* **Maybe a higher default memory overhead?** Edit: No, because we didn't need it in the past.
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: Philipp Dallig

Closes #3761 from Reamer/spark_k8s_resources and squashes the following commits:

64bd912 [Philipp Dallig] Add short comment for limits.memory
45e565f [Philipp Dallig] Use Spark config values for K8s Interpreter Pod resources
3218309 [Philipp Dallig] Some cleanup

(cherry picked from commit 3b7df78)
Signed-off-by: Lee moon soo

Reamer added 2 commits May 5, 2020 08:53

Some cleanup

3218309

Use Spark config values for K8s Interpreter Pod resources

45e565f

Leemoonsoo approved these changes May 6, 2020

Add short comment for limits.memory

64bd912

asfgit closed this in 3b7df78 May 11, 2020

Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request May 11, 2020

spark driver pod resource configuration based on spark conf. apache#3761

8e47c21

Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request May 11, 2020

spark driver pod resource configuration based on spark conf. apache#3761

fcbe2d7

Reamer deleted the spark_k8s_resources branch May 12, 2020 06:16

Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request Jun 2, 2020

spark driver pod resource configuration based on spark conf. apache#3761

6566714

Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request Jun 30, 2020

spark driver pod resource configuration based on spark conf. apache#3761

d2a7646

Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request Aug 13, 2020

spark driver pod resource configuration based on spark conf. apache#3761

eda8d4d

Leemoonsoo added a commit to open-datastudio/zeppelin that referenced this pull request Aug 13, 2020

spark driver pod resource configuration based on spark conf. apache#3761

f291513

Leemoonsoo added a commit to open-datastudio/zeppelin that referenced this pull request Aug 19, 2020

spark driver pod resource configuration based on spark conf. apache#3761

3b0a212

Leemoonsoo added a commit to open-datastudio/zeppelin that referenced this pull request Nov 20, 2020

spark driver pod resource configuration based on spark conf. apache#3761

0efa565
