Commit 51b03b7

tnachen authored, Andrew Or committed
[SPARK-12463][SPARK-12464][SPARK-12465][SPARK-10647][MESOS] Fix zookeeper dir with mesos conf and add docs.
Fix zookeeper dir configuration used in cluster mode, and also add documentation around these settings.

Author: Timothy Chen <[email protected]>

Closes #10057 from tnachen/fix_mesos_dir.
1 parent 711ce04 commit 51b03b7

5 files changed (+36, -25 lines)

core/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala (3 additions, 3 deletions)

@@ -50,7 +50,7 @@ private[mesos] class MesosClusterDispatcher(
     extends Logging {
 
   private val publicAddress = Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(args.host)
-  private val recoveryMode = conf.get("spark.mesos.deploy.recoveryMode", "NONE").toUpperCase()
+  private val recoveryMode = conf.get("spark.deploy.recoveryMode", "NONE").toUpperCase()
   logInfo("Recovery mode in Mesos dispatcher set to: " + recoveryMode)
 
   private val engineFactory = recoveryMode match {

@@ -98,8 +98,8 @@ private[mesos] object MesosClusterDispatcher extends Logging {
     conf.setMaster(dispatcherArgs.masterUrl)
     conf.setAppName(dispatcherArgs.name)
     dispatcherArgs.zookeeperUrl.foreach { z =>
-      conf.set("spark.mesos.deploy.recoveryMode", "ZOOKEEPER")
-      conf.set("spark.mesos.deploy.zookeeper.url", z)
+      conf.set("spark.deploy.recoveryMode", "ZOOKEEPER")
+      conf.set("spark.deploy.zookeeper.url", z)
     }
     val dispatcher = new MesosClusterDispatcher(dispatcherArgs, conf)
     dispatcher.start()
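Because this commit renames the Mesos-specific keys to the generic `spark.deploy.*` names, existing dispatcher setups that still set the old keys need updating. A minimal migration sketch (the file path, key values, and the plain `key value` properties layout are illustrative assumptions):

```shell
# Rewrite the old Mesos-specific key prefix to the generic one introduced
# by this commit. The input file and its contents are illustrative only.
cat > /tmp/mesos-dispatcher.conf <<'EOF'
spark.mesos.deploy.recoveryMode ZOOKEEPER
spark.mesos.deploy.zookeeper.url zk1:2181,zk2:2181
EOF
# Replace the prefix only where it starts the line, i.e. where it is the
# key's prefix, and write the migrated copy alongside the original.
sed 's/^spark\.mesos\.deploy\./spark.deploy./' /tmp/mesos-dispatcher.conf \
  > /tmp/mesos-dispatcher-migrated.conf
```

After the rewrite the migrated file carries `spark.deploy.recoveryMode ZOOKEEPER` and `spark.deploy.zookeeper.url zk1:2181,zk2:2181`, matching the keys the dispatcher reads after this change.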

core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterPersistenceEngine.scala (2 additions, 2 deletions)

@@ -53,9 +53,9 @@ private[spark] trait MesosClusterPersistenceEngine {
  * all of them reuses the same connection pool.
  */
 private[spark] class ZookeeperMesosClusterPersistenceEngineFactory(conf: SparkConf)
-  extends MesosClusterPersistenceEngineFactory(conf) {
+  extends MesosClusterPersistenceEngineFactory(conf) with Logging {
 
-  lazy val zk = SparkCuratorUtil.newClient(conf, "spark.mesos.deploy.zookeeper.url")
+  lazy val zk = SparkCuratorUtil.newClient(conf)
 
   def createEngine(path: String): MesosClusterPersistenceEngine = {
     new ZookeeperMesosClusterPersistenceEngine(path, zk, conf)

docs/configuration.md (23 additions, 0 deletions)

@@ -1585,6 +1585,29 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 </table>
 
+#### Deploy
+
+<table class="table">
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr>
+    <td><code>spark.deploy.recoveryMode</code></td>
+    <td>NONE</td>
+    <td>The recovery mode used to recover submitted Spark jobs in cluster mode when the cluster manager fails and relaunches.
+    This is only applicable for cluster mode when running with Standalone or Mesos.</td>
+  </tr>
+  <tr>
+    <td><code>spark.deploy.zookeeper.url</code></td>
+    <td>None</td>
+    <td>When `spark.deploy.recoveryMode` is set to ZOOKEEPER, this configuration is used to set the ZooKeeper URL to connect to.</td>
+  </tr>
+  <tr>
+    <td><code>spark.deploy.zookeeper.dir</code></td>
+    <td>None</td>
+    <td>When `spark.deploy.recoveryMode` is set to ZOOKEEPER, this configuration is used to set the ZooKeeper directory to store recovery state.</td>
+  </tr>
+</table>
+
 #### Cluster Managers
 Each cluster manager in Spark has additional configuration options. Configurations
 can be found on the pages for each mode:
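The Deploy properties documented in the new table are daemon-side settings, typically passed via SPARK_DAEMON_JAVA_OPTS. A minimal spark-env.sh sketch (the ZooKeeper hosts and the `/spark` state directory are placeholder values):

```shell
# Illustrative spark-env.sh fragment enabling ZooKeeper-based recovery for
# the daemons. Hostnames and the /spark directory are placeholders.
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181 \
-Dspark.deploy.zookeeper.dir=/spark"
export SPARK_DAEMON_JAVA_OPTS
```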

docs/running-on-mesos.md (4 additions, 1 deletion)

@@ -153,7 +153,10 @@ can find the results of the driver from the Mesos Web UI.
 To use cluster mode, you must start the `MesosClusterDispatcher` in your cluster via the `sbin/start-mesos-dispatcher.sh` script,
 passing in the Mesos master URL (e.g: mesos://host:5050). This starts the `MesosClusterDispatcher` as a daemon running on the host.
 
-If you like to run the `MesosClusterDispatcher` with Marathon, you need to run the `MesosClusterDispatcher` in the foreground (i.e: `bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher`).
+If you would like to run the `MesosClusterDispatcher` with Marathon, you need to run the `MesosClusterDispatcher` in the foreground (i.e: `bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher`). Note that the `MesosClusterDispatcher` does not yet support multiple instances for HA.
+
+The `MesosClusterDispatcher` also supports writing recovery state into ZooKeeper. This allows the `MesosClusterDispatcher` to recover all submitted and running containers on relaunch. In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env by configuring `spark.deploy.recoveryMode` and related spark.deploy.zookeeper.* configurations.
+For more information about these configurations please refer to the configuration [doc](configuration.html#deploy).
 
 From the client, you can submit a job to Mesos cluster by running `spark-submit` and specifying the master URL
 to the URL of the `MesosClusterDispatcher` (e.g: mesos://dispatcher:7077). You can view driver statuses on the
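For the Marathon setup mentioned in this diff, the foreground command can be assembled as a string for a Marathon app definition's `cmd` field. A hypothetical sketch, assuming the dispatcher accepts the same `--master` flag used by the launch script; the `/opt/spark` path and master URL are placeholders:

```shell
# Build the foreground dispatcher command referenced in the text.
# /opt/spark and mesos://mesos-master:5050 are illustrative values only.
SPARK_HOME=/opt/spark
DISPATCHER_CMD="$SPARK_HOME/bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --master mesos://mesos-master:5050"
echo "$DISPATCHER_CMD"
```

Running the command in the foreground (rather than via `sbin/start-mesos-dispatcher.sh`) lets Marathon supervise the process directly.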

docs/spark-standalone.md (4 additions, 19 deletions)

@@ -112,8 +112,8 @@ You can optionally configure the cluster further by setting environment variable
   <tr>
     <td><code>SPARK_LOCAL_DIRS</code></td>
     <td>
-    Directory to use for "scratch" space in Spark, including map output files and RDDs that get
-    stored on disk. This should be on a fast, local disk in your system. It can also be a
+    Directory to use for "scratch" space in Spark, including map output files and RDDs that get
+    stored on disk. This should be on a fast, local disk in your system. It can also be a
     comma-separated list of multiple directories on different disks.
     </td>
   </tr>

@@ -341,23 +341,8 @@ Learn more about getting started with ZooKeeper [here](http://zookeeper.apache.o
 
 **Configuration**
 
-In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:
-
-<table class="table">
-  <tr><th style="width:21%">System property</th><th>Meaning</th></tr>
-  <tr>
-    <td><code>spark.deploy.recoveryMode</code></td>
-    <td>Set to ZOOKEEPER to enable standby Master recovery mode (default: NONE).</td>
-  </tr>
-  <tr>
-    <td><code>spark.deploy.zookeeper.url</code></td>
-    <td>The ZooKeeper cluster url (e.g., 192.168.1.100:2181,192.168.1.101:2181).</td>
-  </tr>
-  <tr>
-    <td><code>spark.deploy.zookeeper.dir</code></td>
-    <td>The directory in ZooKeeper to store recovery state (default: /spark).</td>
-  </tr>
-</table>
+In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env by configuring `spark.deploy.recoveryMode` and related spark.deploy.zookeeper.* configurations.
+For more information about these configurations please refer to the configuration [doc](configuration.html#deploy).
 
 Possible gotcha: If you have multiple Masters in your cluster but fail to correctly configure the Masters to use ZooKeeper, the Masters will fail to discover each other and think they're all leaders. This will not lead to a healthy cluster state (as all Masters will schedule independently).