
Commit 1281447

Author: Andrew Or (committed)
Address a few comments
1 parent: b9843f2 · commit: 1281447

File tree

1 file changed (+12, -10 lines)


docs/job-scheduling.md

Lines changed: 12 additions & 10 deletions
@@ -74,12 +74,6 @@ Mesos already has a similar notion of dynamic resource sharing in fine-grained m
dynamic allocation allows your Mesos application to take advantage of coarse-grained low-latency
scheduling while sharing cluster resources efficiently.

-Lastly, it is worth noting that Spark's dynamic resource allocation mechanism is cooperative.
-This means if a Spark application enables this feature, other applications on the same cluster
-are also expected to do so. Otherwise, the cluster's resources will end up being unfairly
-distributed to the applications that do not voluntarily give up unused resources they have
-acquired.
-
### Configuration and Setup

All configurations used by this feature live under the `spark.dynamicAllocation.*` namespace.
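To make the `spark.dynamicAllocation.*` namespace above concrete, here is a minimal sketch of enabling the feature with explicit executor bounds from application code. The bound key names (`minExecutors`/`maxExecutors`) are assumed from the same namespace; the [configurations page](configuration.html#dynamic-allocation) is authoritative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: enable dynamic allocation and bound how many executors
// the application may hold at any time.
val conf = new SparkConf()
  .setAppName("dynamic-allocation-sketch")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")   // lower bound
  .set("spark.dynamicAllocation.maxExecutors", "20")  // upper bound

val sc = new SparkContext(conf)
```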
@@ -89,9 +83,11 @@ provide lower and upper bounds for the number of executors through
configurations are described on the [configurations page](configuration.html#dynamic-allocation)
and in the subsequent sections in detail.

-Additionally, your application must use an external shuffle service (described below). To enable
-this, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service is
-implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager`
+Additionally, your application must use an external shuffle service. The purpose of the service is
+to preserve the shuffle files written by executors so the executors can be safely removed (more
+detail described [below](job-scheduling.html#graceful-decommission-of-executors)). To enable
+this service, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service
+is implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager`
in your cluster. To start this service, follow these steps:

1. Build Spark with the [YARN profile](building-spark.html). Skip this step if you are using a
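The application-side half of the setup this hunk describes amounts to one flag enabled alongside dynamic allocation; the NodeManager-side half (registering the `YarnShuffleService` as a YARN auxiliary service) is the separate cluster step covered by the surrounding instructions. A minimal sketch:

```scala
import org.apache.spark.SparkConf

// Sketch only: dynamic allocation requires the external shuffle service,
// so the two flags are enabled together on the application side.
// The service itself (org.apache.spark.yarn.network.YarnShuffleService)
// must additionally be registered on each NodeManager under
// yarn.nodemanager.aux-services, as the steps in this document describe.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
```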
@@ -108,7 +104,7 @@ then set `yarn.nodemanager.aux-services.spark_shuffle.class` to

### Resource Allocation Policy

-On a high level, Spark should relinquish executors when they are no longer used and acquire
+At a high level, Spark should relinquish executors when they are no longer used and acquire
executors when they are needed. Since there is no definitive way to predict whether an executor
that is about to be removed will run a task in the near future, or whether a new executor that is
about to be added will actually be idle, we need a set of heuristics to determine when to remove
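The heuristics referenced at the end of this hunk are controlled by timeouts in the same `spark.dynamicAllocation.*` namespace. As a hedged sketch (the key names and values below are assumptions for illustration and are not spelled out in this hunk; the configuration page remains authoritative):

```scala
import org.apache.spark.SparkConf

// Sketch only: timeout-driven request/remove heuristics (assumed key names;
// see configuration.html#dynamic-allocation for the authoritative list).
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // Request executors after tasks have been backlogged for this many seconds.
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "5")
  // Keep requesting more executors while the backlog persists.
  .set("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "5")
  // Remove an executor after it has been idle for this many seconds.
  .set("spark.dynamicAllocation.executorIdleTimeout", "600")
```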
@@ -163,6 +159,12 @@ independently of your Spark applications and their executors. If the service is
executors will fetch shuffle files from the service instead of from each other. This means any
shuffle state written by an executor may continue to be served beyond the executor's lifetime.

+In addition to writing shuffle files, executors also cache data either on disk or in memory.
+When an executor is removed, however, all cached data will no longer be accessible. There is
+currently not yet a solution for this in Spark 1.2. In future releases, the cached data may be
+preserved through an off-heap storage similar in spirit to how shuffle files are preserved through
+the external shuffle service.
+
# Scheduling Within an Application

Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if

Comments (0)