Commit 73d3b23

Authored by foxish, committed by Marcelo Vanzin

[SPARK-23104][K8S][DOCS] Changes to Kubernetes scheduler documentation

## What changes were proposed in this pull request?

Docs changes:
- Adding a warning that the backend is experimental.
- Removing a defunct internal-only option from documentation.
- Clarifying that node selectors can be used right away, and other minor cosmetic changes.

## How was this patch tested?

Docs-only change.

Author: foxish <[email protected]>

Closes #20314 from foxish/ambiguous-docs.

1 parent d8aaa77

File tree

2 files changed: +22 -25 lines


docs/cluster-overview.md

Lines changed: 2 additions & 2 deletions

@@ -52,8 +52,8 @@ The system currently supports three cluster managers:
 * [Apache Mesos](running-on-mesos.html) -- a general cluster manager that can also run Hadoop MapReduce
   and service applications.
 * [Hadoop YARN](running-on-yarn.html) -- the resource manager in Hadoop 2.
-* [Kubernetes](running-on-kubernetes.html) -- [Kubernetes](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/)
-is an open-source platform that provides container-centric infrastructure.
+* [Kubernetes](running-on-kubernetes.html) -- an open-source system for automating deployment, scaling,
+and management of containerized applications.
 
 A third-party project (not supported by the Spark project) exists to add support for
 [Nomad](https://github.com/hashicorp/nomad-spark) as a cluster manager.

docs/running-on-kubernetes.md

Lines changed: 20 additions & 23 deletions

@@ -8,6 +8,10 @@ title: Running Spark on Kubernetes
 Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of native
 Kubernetes scheduler that has been added to Spark.
 
+**The Kubernetes scheduler is currently experimental.
+In future versions, there may be behavioral changes around configuration,
+container images and entrypoints.**
+
 # Prerequisites
 
 * A runnable distribution of Spark 2.3 or above.
@@ -41,11 +45,10 @@ logs and remains in "completed" state in the Kubernetes API until it's eventuall
 
 Note that in the completed state, the driver pod does *not* use any computational or memory resources.
 
-The driver and executor pod scheduling is handled by Kubernetes. It will be possible to affect Kubernetes scheduling
-decisions for driver and executor pods using advanced primitives like
-[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
-and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
-in a future release.
+The driver and executor pod scheduling is handled by Kubernetes. It is possible to schedule the
+driver and executor pods on a subset of available nodes through a [node selector](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+using the configuration property for it. It will be possible to use more advanced
+scheduling hints like [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) in a future release.
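The "configuration property" the new wording refers to follows the `spark.kubernetes.node.selector.[labelKey]` pattern documented in this file's configuration table. A minimal sketch of building that flag, assuming a hypothetical `disktype=ssd` node label:

```bash
# Hypothetical node label: only nodes labeled disktype=ssd should run the pods.
LABEL_KEY="disktype"
LABEL_VALUE="ssd"
# spark.kubernetes.node.selector.<labelKey> is the documented property pattern;
# the resulting flag is passed to spark-submit with the other Kubernetes options.
NODE_SELECTOR_CONF="--conf spark.kubernetes.node.selector.${LABEL_KEY}=${LABEL_VALUE}"
echo "${NODE_SELECTOR_CONF}"
```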
 
 # Submitting Applications to Kubernetes
 
@@ -62,8 +65,10 @@ use with the Kubernetes backend.
 
 Example usage is:
 
-./bin/docker-image-tool.sh -r <repo> -t my-tag build
-./bin/docker-image-tool.sh -r <repo> -t my-tag push
+```bash
+$ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
+$ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
+```
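For context, a sketch of how the `-r` and `-t` arguments combine. The registry and tag below are hypothetical, and `<repo>/spark:<tag>` is roughly how this version of the tool names the image it builds:

```bash
# Hypothetical registry and tag; docker-image-tool.sh combines the -r and -t
# arguments into the image name (roughly <repo>/spark:<tag> in this version).
REPO="docker.io/myrepo"
TAG="v2.3.0"
IMAGE="${REPO}/spark:${TAG}"
echo "Would build and push: ${IMAGE}"
```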
 
 ## Cluster Mode
 
@@ -94,7 +99,7 @@ must consist of lower case alphanumeric characters, `-`, and `.` and must start
 If you have a Kubernetes cluster setup, one way to discover the apiserver URL is by executing `kubectl cluster-info`.
 
 ```bash
-kubectl cluster-info
+$ kubectl cluster-info
 Kubernetes master is running at http://127.0.0.1:6443
 ```
 
@@ -105,7 +110,7 @@ authenticating proxy, `kubectl proxy` to communicate to the Kubernetes API.
 The local proxy can be started by:
 
 ```bash
-kubectl proxy
+$ kubectl proxy
 ```
 
 If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.1:8001` can be used as the argument to
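A small sketch of how the proxy address becomes that `--master` value (port 8001 is the `kubectl proxy` default):

```bash
# kubectl proxy listens on localhost:8001 by default; spark-submit expects
# the apiserver address behind a k8s:// prefix, as described above.
PROXY_URL="http://127.0.0.1:8001"
MASTER="k8s://${PROXY_URL}"
echo "${MASTER}"
```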
@@ -173,7 +178,7 @@ Logs can be accessed using the Kubernetes API and the `kubectl` CLI. When a Spar
 to stream logs from the application using:
 
 ```bash
-kubectl -n=<namespace> logs -f <driver-pod-name>
+$ kubectl -n=<namespace> logs -f <driver-pod-name>
 ```
 
 The same logs can also be accessed through the
@@ -186,7 +191,7 @@ The UI associated with any application can be accessed locally using
 [`kubectl port-forward`](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/#forward-a-local-port-to-a-port-on-the-pod).
 
 ```bash
-kubectl port-forward <driver-pod-name> 4040:4040
+$ kubectl port-forward <driver-pod-name> 4040:4040
 ```
 
 Then, the Spark driver UI can be accessed on `http://localhost:4040`.
@@ -200,13 +205,13 @@ are errors during the running of the application, often, the best way to investi
 To get some basic information about the scheduling decisions made around the driver pod, you can run:
 
 ```bash
-kubectl describe pod <spark-driver-pod>
+$ kubectl describe pod <spark-driver-pod>
 ```
 
 If the pod has encountered a runtime error, the status can be probed further using:
 
 ```bash
-kubectl logs <spark-driver-pod>
+$ kubectl logs <spark-driver-pod>
 ```
 
 Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
@@ -254,7 +259,7 @@ To create a custom service account, a user can use the `kubectl create serviceac
 following command creates a service account named `spark`:
 
 ```bash
-kubectl create serviceaccount spark
+$ kubectl create serviceaccount spark
 ```
 
 To grant a service account a `Role` or `ClusterRole`, a `RoleBinding` or `ClusterRoleBinding` is needed. To create
@@ -263,7 +268,7 @@ for `ClusterRoleBinding`) command. For example, the following command creates an
 namespace and grants it to the `spark` service account created above:
 
 ```bash
-kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
+$ kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
 ```
 
 Note that a `Role` can only be used to grant access to resources (like pods) within a single namespace, whereas a
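A dry-run sketch tying the two commands above together: each command is printed rather than executed, so the sequence can be inspected without a cluster. The `spark.kubernetes.authenticate.driver.serviceAccountName` property mentioned in the comment is the one this file documents for pointing the driver at a service account:

```bash
# Dry-run wrapper: record and print each command instead of executing it.
# Drop the "run" prefix to apply the commands against a real cluster.
run() { echo "+ $*"; CMD_LOG="${CMD_LOG}${*}; "; }
CMD_LOG=""

run kubectl create serviceaccount spark
run kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default

# The driver is then tied to the account at submission time via the
# spark.kubernetes.authenticate.driver.serviceAccountName property.
echo "${CMD_LOG}"
```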
@@ -543,14 +548,6 @@ specific to Spark on Kubernetes.
     to avoid name conflicts.
   </td>
 </tr>
-<tr>
-  <td><code>spark.kubernetes.executor.podNamePrefix</code></td>
-  <td>(none)</td>
-  <td>
-    Prefix for naming the executor pods.
-    If not set, the executor pod name is set to driver pod name suffixed by an integer.
-  </td>
-</tr>
 <tr>
   <td><code>spark.kubernetes.executor.lostCheck.maxAttempts</code></td>
   <td><code>10</code></td>
