[SPARK-638] Update install docs for strict mode (apache#260)
* update install docs for strict mode
* fix table formatting?
* small edits
* update troubleshooting for bootstrap workaround
* Update install.md
minor typo
* Update troubleshooting.md
Added extra information on how to remove NO_BOOTSTRAP
* small change to command
* added docs for quota and strict
* small cleanup
* added option to enable bootstrap for IP detection in the dispatcher
* fix logic for using bootstrap
**Note:** You can verify the secrets were created with:

```bash
$ dcos security secrets list /
```

## Assigning permissions

Permissions must be created so that the Spark service will be able to start Spark jobs and so the jobs themselves can launch the executors that perform the work on their behalf. There are a few points to keep in mind depending on your cluster:

* RHEL/CentOS users cannot currently run Spark in strict mode as user `nobody`, but must run as user `root`. This is due to how accounts are mapped to UIDs. CoreOS users are unaffected, and can run as user `nobody`. We designate the user as `spark-user` below.

* Spark runs by default under the Mesos default role, which is represented by the `*` symbol. You can deploy multiple instances of Spark without modifying this default. If you want to override the default Spark role, you must modify these code samples accordingly. We use `spark-service-role` to designate the role used below.

Permissions can also be assigned through the UI.

1. Run the following to create the required permissions for Spark:

    ```bash
    $ dcos security org users grant <service-account> dcos:mesos:master:task:user:<user> create --description "Allows the Linux user to execute tasks"
    $ dcos security org users grant <service-account> dcos:mesos:master:framework:role:<spark-service-role> create --description "Allows a framework to register with the Mesos master using the Mesos default role"
    $ dcos security org users grant <service-account> dcos:mesos:master:task:app_id:/<service_name> create --description "Allows reading of the task state"
    ```

    Note that the `dcos:mesos:master:task:app_id:/<service_name>` permission above will usually be `dcos:mesos:master:task:app_id:/spark`.

    For example, continuing from above:

    ```bash
    dcos security org users grant spark-principal dcos:mesos:master:task:user:root create --description "Allows the Linux user to execute tasks"
    dcos security org users grant spark-principal dcos:mesos:master:framework:role:* create --description "Allows a framework to register with the Mesos master using the Mesos default role"
    dcos security org users grant spark-principal dcos:mesos:master:task:app_id:/spark create --description "Allows reading of the task state"
    ```

    Note that here we're using the service account `spark-principal` and the user `root`.

1. If you are running the Spark service as `root` (as we are in this example), you will need to add an additional permission for Marathon:

    ```bash
    dcos security org users grant dcos_marathon dcos:mesos:master:task:user:root create --description "Allow Marathon to launch containers as root"
    ```

## Install Spark with necessary configuration

1. Make a configuration file with the following before installing Spark; these settings can also be set through the UI:

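    The exact option keys depend on the Spark package version and on the account and secret names you created earlier; the sketch below assumes hypothetical values (`spark-principal`, a secret at `spark/spark-secret`) and key names (`service.service_account`, `service.service_account_secret`, `service.user`), so verify them against the package's configuration schema before using it:

    ```bash
    # Sketch only: the option keys and the secret path below are assumptions and may
    # differ between Spark package versions; check `dcos package describe spark --config`.
    cat > options.json <<'EOF'
    {
      "service": {
        "service_account": "spark-principal",
        "service_account_secret": "spark/spark-secret",
        "user": "root"
      }
    }
    EOF

    dcos package install spark --options=options.json
    ```
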
If you want to use the [Docker Engine](/1.10/deploying-services/containerizers/docker-containerizer/) instead of the [Universal Container Runtime](/1.10/deploying-services/containerizers/ucr/), you must specify the user through the `SPARK_USER` environment variable:
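One way to pass that variable is at job submission time. This is a sketch under the assumption that setting it through the standard Spark properties `spark.mesos.driverEnv.*` and `spark.executorEnv.*` fits your setup (it is not necessarily the snippet this section originally showed); the main class and JAR URL are placeholders:

```bash
# Sketch only: passes SPARK_USER to the driver and the executors at submit time.
# MyMainClass and the JAR URL are placeholders for your own application.
dcos spark run --submit-args="--conf spark.mesos.driverEnv.SPARK_USER=nobody --conf spark.executorEnv.SPARK_USER=nobody --class MyMainClass https://example.com/my-app.jar"
```
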

`docs/troubleshooting.md`

# Dispatcher

* The Mesos cluster dispatcher is responsible for queuing, tracking, and supervising drivers. Potential problems may arise if the dispatcher does not receive the resource offers you expect from Mesos, or if driver submission is failing. To debug this class of issue, visit the Mesos UI at `http://<dcos-url>/mesos/` and navigate to the sandbox for the dispatcher.

* Spark has an internal mechanism for detecting the IP of the host. We use this method by default, but sometimes it fails, returning errors like these:

    ```
    ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
    java.net.UnknownHostException: ip-172-31-4-148: ip-172-31-4-148: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
        at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:891)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:884)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:884)
        at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:941)
        at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:941)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.localHostName(Utils.scala:941)
        at org.apache.spark.deploy.mesos.MesosClusterDispatcherArguments.<init>(MesosClusterDispatcherArguments.scala:27)
        at org.apache.spark.deploy.mesos.MesosClusterDispatcher$.main(MesosClusterDispatcher.scala:103)
        at org.apache.spark.deploy.mesos.MesosClusterDispatcher.main(MesosClusterDispatcher.scala)
    Caused by: java.net.UnknownHostException: ip-172-31-4-148: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
        ... 10 more
    18/01/25 17:42:57 INFO ShutdownHookManager: Shutdown hook called
    ```

    In this case, enable the `service.use_bootstrap_for_IP_detect` option in the dispatcher configuration, either through the UI, by editing the task, or by setting it to `true` in `options.json`, and then restart the service. This causes the DC/OS-specific `bootstrap` utility to detect the IP, which may allow the initialization of the Spark service to complete. See the sketch below.

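A sketch of setting this option in `options.json`. The option name comes from the text above; the install command shown assumes a fresh install rather than an in-place update (for an existing dispatcher you may instead need to edit its Marathon app definition through the UI):

```bash
# Sketch only: for an already-running service, edit the task/app configuration
# instead of reinstalling the package.
cat > options.json <<'EOF'
{
  "service": {
    "use_bootstrap_for_IP_detect": true
  }
}
EOF

dcos package install spark --options=options.json
```
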
# Jobs

* DC/OS Apache Spark jobs are submitted through the dispatcher, which displays Spark properties and job state. Start here to verify that the job is configured as you expect.

* The dispatcher further provides a link to the job's entry in the history server, which displays the Spark Job UI. This UI shows the details for the job. Go here to debug issues with scheduling and performance.

* Jobs themselves log output to their sandbox, which you can access through the Mesos UI. The Spark logs will be sent to `stderr`, while any output you write in your job will be sent to `stdout`.

* To disable using the Mesosphere `bootstrap` utility for host IP detection in jobs, add `spark.mesos.driverEnv.SKIP_BOOTSTRAP_IP_DETECT=true` to your job configuration, as in the sketch below.

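For example, a submission sketch (the main class and JAR URL are placeholders for your own job):

```bash
# Sketch only: disables bootstrap-based IP detection for this job's driver.
dcos spark run --submit-args="--conf spark.mesos.driverEnv.SKIP_BOOTSTRAP_IP_DETECT=true --class MyMainClass https://example.com/my-app.jar"
```
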
# CLI

The Spark CLI is integrated with the dispatcher so that they always use the same version of Spark, and so that certain defaults are honored. To debug issues with their communication, run your jobs with the `--verbose` flag.
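
For example (the flag placement shown here is an assumption; check `dcos spark --help` for your CLI version):

```bash
# Sketch only: --verbose prints the CLI-to-dispatcher interaction for debugging.
dcos spark run --verbose --submit-args="--class MyMainClass https://example.com/my-app.jar"
```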