---
layout: page
title: "Apache Spark Interpreter for Apache Zeppelin on Kubernetes"
description: "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. This interpreter runs on the https://github.com/apache-spark-on-k8s/spark version of Spark."
group: interpreter
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{% include JB/setup %}

# How to run Zeppelin Spark notebooks on a Kubernetes cluster

<div id="toc"></div>

## Prerequisites

The following tools are required:

 - Kubernetes cluster & kubectl

   For local testing, Minikube can be used to create a single-node cluster: https://kubernetes.io/docs/tasks/tools/install-minikube/

 - Docker: https://docs.docker.com/

   This documentation uses pre-built Spark 2.2 Docker images; however, you may also build these images yourself as described here: https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/README.md

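A quick way to confirm the prerequisite tools are installed is a small PATH check. This is just a convenience sketch, not part of the setup; it only reports what it finds and never fails:

```shell
# Report which prerequisite CLIs are available on this machine.
# Purely informational: prints "found" or "MISSING" per tool.
summary=""
for tool in kubectl docker minikube; do
  if command -v "$tool" >/dev/null 2>&1; then
    status="found"
  else
    status="MISSING"
  fi
  echo "$tool: $status"
  summary="$summary $tool=$status"
done
```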
## Check out Zeppelin source code

Check out the latest source code from https://github.com/apache/zeppelin, then apply the changes from the [Add support to run Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637) pull request.

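One way to apply a pull request locally is the `pull/<id>/head` ref that GitHub exposes for every PR (the PR number here comes from the link above). The sketch below only builds and prints the commands, since actually fetching requires network access; drop the `echo`s to run them for real:

```shell
# Build the git commands for fetching PR #2637 as a local branch.
# Echoed rather than executed so the snippet runs anywhere.
PR=2637
FETCH_CMD="git fetch origin pull/${PR}/head:pr-${PR}"
CHECKOUT_CMD="git checkout pr-${PR}"
echo "$FETCH_CMD"
echo "$CHECKOUT_CMD"
```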
## Build Zeppelin

- `./dev/change_scala_version.sh 2.11`
- `mvn clean install -DskipTests -Pspark-2.2 -Phadoop-2.4 -Pyarn -Ppyspark -Pscala-2.11`

## Create distribution

- `cd zeppelin-distribution`
- `mvn org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single -P apache-release`

## Create Zeppelin Dockerfile in the Zeppelin distribution target folder

```
cd {zeppelin_source}/zeppelin-distribution/target/zeppelin-0.8.0-SNAPSHOT
cat > Dockerfile <<EOF
FROM kubespark/spark-base:v2.2.0-kubernetes-0.5.0
COPY zeppelin-0.8.0-SNAPSHOT /opt/zeppelin
ADD https://storage.googleapis.com/kubernetes-release/release/v1.7.4/bin/linux/amd64/kubectl /usr/local/bin
# Files fetched with ADD from a URL are not executable by default
RUN chmod +x /usr/local/bin/kubectl
WORKDIR /opt/zeppelin
ENTRYPOINT bin/zeppelin.sh
EOF
```

## Create / Start a Kubernetes cluster

In case of using Minikube on Linux with KVM:

`minikube start --vm-driver=kvm --cpus={nr_of_cpus} --memory={mem}`

You can open the Kubernetes dashboard by running `minikube dashboard`.

Point the Docker CLI at Minikube's Docker daemon: `eval $(minikube docker-env)`

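What `eval $(minikube docker-env)` effectively does is export a few environment variables so that subsequent `docker` commands talk to the Docker daemon inside the Minikube VM rather than the local one. The values below are illustrative, not from a real cluster:

```shell
# Roughly what `minikube docker-env` emits (values are illustrative only):
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"   # example Minikube VM address
export DOCKER_CERT_PATH="$HOME/.minikube/certs"
echo "$DOCKER_HOST"
```

Because the image is then built against Minikube's daemon, the cluster can use it directly without pushing it to a registry.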
## Build & tag Docker image

```
docker build -t zeppelin-server:v2.2.0-kubernetes -f Dockerfile .
```

You can retrieve the image ID by running `docker images`.

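If you need the image ID in a script (for example to re-tag or clean up the image), it is the third column of `docker images` output. The sketch below parses a hard-coded sample line so it runs without Docker installed; in practice, pipe the real command's output instead:

```shell
# Extract the image ID (third column) from a `docker images` line.
# `sample` is a fabricated example line, not real Docker output.
sample='zeppelin-server   v2.2.0-kubernetes   3f2a1b4c5d6e   2 minutes ago   1.2GB'
image_id=$(echo "$sample" | awk '{print $3}')
echo "$image_id"
```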
## Start ResourceStagingServer for spark-submit

Spark-submit will use the ResourceStagingServer to distribute resources (in our case the Zeppelin Spark interpreter JAR) across the Spark driver and executors.

```
wget https://raw.githubusercontent.com/apache-spark-on-k8s/spark/branch-2.2-kubernetes/conf/kubernetes-resource-staging-server.yaml
kubectl create -f kubernetes-resource-staging-server.yaml
```

## Create a Kubernetes service to reach Zeppelin server from outside the cluster

```
cat > zeppelin-service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: zeppelin-k8-service
  labels:
    app: zeppelin-server
spec:
  ports:
  - port: 8080
    targetPort: 8080
  selector:
    app: zeppelin-server
  type: NodePort
EOF

kubectl create -f zeppelin-service.yaml
```

## Create Zeppelin server pod definition

```
cat > zeppelin-pod-local.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: zeppelin-server
  labels:
    app: zeppelin-server
spec:
  containers:
  - name: zeppelin-server
    image: zeppelin-server:v2.2.0-kubernetes
    env:
    - name: SPARK_SUBMIT_OPTIONS
      value: >-
        --kubernetes-namespace default
        --conf spark.executor.instances=1
        --conf spark.kubernetes.resourceStagingServer.uri=http://{RESOURCE_STAGING_SERVER_ADDRESS}:10000
        --conf spark.kubernetes.resourceStagingServer.internal.uri=http://{RESOURCE_STAGING_SERVER_ADDRESS}:10000
        --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0
        --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0
        --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0
    ports:
    - containerPort: 8080
EOF
```

## Edit SPARK_SUBMIT_OPTIONS

- Set {RESOURCE_STAGING_SERVER_ADDRESS} to the ResourceStagingServer address, retrieved either from the Kubernetes dashboard or by running:

  `kubectl get svc spark-resource-staging-service -o jsonpath='{.spec.clusterIP}'`

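Assuming the pod definition keeps the `{RESOURCE_STAGING_SERVER_ADDRESS}` placeholder, the substitution can be scripted with `sed`. The IP below is hard-coded for illustration; in practice, capture it from the `kubectl get svc` command above:

```shell
# Substitute the staging server address into a manifest line.
# STAGING_IP is illustrative; normally: STAGING_IP=$(kubectl get svc ... )
STAGING_IP="10.0.0.42"
line="--conf spark.kubernetes.resourceStagingServer.uri=http://{RESOURCE_STAGING_SERVER_ADDRESS}:10000"
resolved=$(echo "$line" | sed "s/{RESOURCE_STAGING_SERVER_ADDRESS}/$STAGING_IP/")
echo "$resolved"
```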
## Start Zeppelin server

`kubectl create -f zeppelin-pod-local.yaml`

You can retrieve the Zeppelin server address either from the Kubernetes dashboard or using kubectl.
The Zeppelin server should be reachable from outside the Kubernetes cluster on the Kubernetes node address (the same as in the k8s master URL, KUBERNETES_NODE_ADDRESS) and the nodePort returned by running:

`kubectl get svc --selector=app=zeppelin-server -o jsonpath='{.items[0].spec.ports}'`

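Putting the two pieces together, the Zeppelin UI address is simply the node address plus the service's nodePort. Both values below are illustrative placeholders, not output from a real cluster:

```shell
# Compose the externally reachable Zeppelin URL.
NODE_ADDRESS="192.168.99.100"   # e.g. from `minikube ip` (illustrative)
NODE_PORT="30123"               # nodePort from the kubectl query above (illustrative)
ZEPPELIN_URL="http://$NODE_ADDRESS:$NODE_PORT"
echo "$ZEPPELIN_URL"
```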
## Edit Spark interpreter settings

Set the master URL to point to your Kubernetes cluster, e.g. k8s://https://x.x.x.x:8443, or use the default address, which works from inside a Kubernetes cluster:
k8s://https://kubernetes:443.
Add the property 'spark.submit.deployMode' and set its value to 'cluster'.

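For reference, the two properties described above would look like this in the Spark interpreter settings (the API server address is an illustrative example, not a real cluster):

```
master                   k8s://https://192.168.99.100:8443
spark.submit.deployMode  cluster
```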
## Run 'Zeppelin Tutorial/Basic Features (Spark)' notebook

In case of problems you can check the spark-submit output in the Zeppelin logs after logging into the zeppelin-server pod, then restart the Spark interpreter to try again.

`kubectl exec -it zeppelin-server bash`

Log files are in the /opt/zeppelin/logs folder.
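Inside the pod, the interpreter log contains the full spark-submit command line, which is usually the first thing to check. The sketch below greps a fabricated sample log line so it runs offline; in the pod, point the grep at the files under /opt/zeppelin/logs instead:

```shell
# Look for the spark-submit invocation in an interpreter log line.
# `sample` is a made-up log line for illustration, not real Zeppelin output.
sample="INFO [SparkInterpreter] Running: spark-submit --kubernetes-namespace default ..."
match=$(echo "$sample" | grep -o "spark-submit")
echo "$match"
```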