Commit d54418e

Merge pull request apache#78 from mesosphere/add-pyspark-documentation
Documented Python support and Spark shell.
2 parents 6433570 + 5e09c7c

docs/user-docs.md

Lines changed: 48 additions & 3 deletions
@@ -18,6 +18,7 @@ DC/OS Spark includes:
 * [Mesos Cluster Dispatcher][2]
 * [Spark History Server][3]
 * DC/OS Spark CLI
+* Interactive Spark shell

 ## Benefits

@@ -59,6 +60,10 @@ dispatcher and the history server

         $ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.4.0-SNAPSHOT.jar 30"

+1. Run a Python Spark job:
+
+        $ dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"
+
 1. View your job:

     Visit the Spark cluster dispatcher at
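
For reference, a script like the hosted `pi.py` is typically a small, self-contained PySpark program. Below is a minimal sketch in the spirit of Spark's bundled Python Pi example; it is not the exact contents of the hosted file:

    # Sketch of a pi.py-style PySpark job: Monte Carlo estimate of Pi.
    import sys
    from random import random
    from operator import add
    from pyspark import SparkContext

    if __name__ == "__main__":
        sc = SparkContext(appName="PythonPi")
        # The trailing argument ("30" in the command above) sets the
        # number of partitions to sample over.
        partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
        n = 100000 * partitions

        def sample(_):
            # Count points that fall inside the unit circle.
            x, y = random(), random()
            return 1 if x * x + y * y <= 1 else 0

        count = sc.parallelize(range(1, n + 1), partitions).map(sample).reduce(add)
        print("Pi is roughly %f" % (4.0 * count / n))
        sc.stop()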
@@ -508,6 +513,10 @@ more][13].

     $ dcos spark run --submit-args="--class MySampleClass http://external.website/mysparkapp.jar 30"

+Or, for a Python job:
+
+    $ dcos spark run --submit-args="http://external.website/mysparkapp.py 30"
+
 `dcos spark run` is a thin wrapper around the standard Spark
 `spark-submit` script. You can submit arbitrary pass-through options
 to this script via the `--submit-args` option.
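
For a concrete illustration of pass-through options, standard `spark-submit` flags such as `--conf` can be embedded in the same string (a sketch; the `spark.cores.max` value here is only an example):

    $ dcos spark run --submit-args="--conf spark.cores.max=4 --class MySampleClass http://external.website/mysparkapp.jar 30"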
@@ -555,6 +564,42 @@ To set Spark properties with a configuration file, create a
 `spark-defaults.conf` file and set the environment variable
 `SPARK_CONF_DIR` to the containing directory. [Learn more][15].

+<a name="pysparkshell"></a>
+# Interactive Spark Shell
+
+You can run Spark commands interactively in the Spark shell, which is
+available in both Scala and Python.
+
+1. SSH into a node in the DC/OS cluster. [Learn how to SSH into your cluster and get the agent node ID](https://dcos.io/docs/latest/administration/access-node/sshcluster/).
+
+        $ dcos node ssh --master-proxy --mesos-id=<agent-node-id>
+
+1. Run a Spark Docker image.
+
+        $ docker pull mesosphere/spark:1.0.4-2.0.1
+
+        $ docker run -it --net=host mesosphere/spark:1.0.4-2.0.1 /bin/bash
+
+1. Run the Scala Spark shell from within the Docker image.
+
+        $ ./bin/spark-shell --master mesos://<internal-master-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.4-2.0.1 --conf spark.mesos.executor.home=/opt/spark/dist
+
+    Or, run the Python Spark shell.
+
+        $ ./bin/pyspark --master mesos://<internal-master-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.4-2.0.1 --conf spark.mesos.executor.home=/opt/spark/dist
+
+1. Run Spark commands interactively.
+
+    In the Scala shell:
+
+        scala> val textFile = sc.textFile("/opt/spark/dist/README.md")
+        scala> textFile.count()
+
+    In the Python shell:
+
+        >>> textFile = sc.textFile("/opt/spark/dist/README.md")
+        >>> textFile.count()
+
 <a name="uninstall"></a>
 # Uninstall
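
Building on the last step above, a slightly longer interactive PySpark session might look like the following (a sketch; `sc` is the SparkContext the shell creates automatically, and the README path is the same file used above):

    >>> textFile = sc.textFile("/opt/spark/dist/README.md")
    >>> textFile.count()                                  # number of lines
    >>> sparkLines = textFile.filter(lambda line: "Spark" in line)
    >>> sparkLines.count()                                # lines mentioning "Spark"
    >>> textFile.map(lambda line: len(line.split())).reduce(lambda a, b: max(a, b))  # words in the longest line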

@@ -628,14 +673,14 @@ output:
 <a name="limitations"></a>
 # Limitations

-* DC/OS Spark only supports submitting jars. It does not support
-Python or R.
+* DC/OS Spark only supports submitting jars and Python scripts. It
+does not support R.

 * Spark jobs run in Docker containers. The first time you run a
 Spark job on a node, it might take longer than you expect because of
 the `docker pull`.

-* Spark shell is not supported. For interactive analytics, we
+* For interactive analytics, we
 recommend Zeppelin, which supports visualizations and dynamic
 dependency management.
