
Commit 07180d0

Author: Michael Gummelt

Merge branch 'master' of github.com:mesosphere/spark-build
2 parents 34f352a + 3f7417f commit 07180d0

File tree: 12 files changed, +56 −48 lines

docker/Dockerfile

Lines changed: 7 additions & 3 deletions
@@ -32,12 +32,16 @@ RUN apt-get update && \
     runit \
     nginx

-RUN add-apt-repository ppa:openjdk-r/ppa
 RUN apt-get update && \
-    apt-get install -y openjdk-8-jdk curl
+    apt-get install -y curl
 RUN apt-get install -y r-base

-ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
+RUN cd /usr/lib/jvm && \
+    curl -O https://downloads.mesosphere.com/java/jre-8u121-linux-x64.tar.gz && \
+    tar zxf jre-8u121-linux-x64.tar.gz && \
+    rm jre-8u121-linux-x64.tar.gz
+
+ENV JAVA_HOME /usr/lib/jvm/jre1.8.0_121
 ENV MESOS_NATIVE_JAVA_LIBRARY /usr/lib/libmesos.so
 ENV HADOOP_CONF_DIR /etc/hadoop
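This hunk swaps the OpenJDK PPA for a pinned JRE 8u121 tarball unpacked under `/usr/lib/jvm`, with `JAVA_HOME` pointed at the resulting directory. A minimal check of the bundled runtime, assuming you have built or pulled the image (the `mesosphere/spark:latest` tag is illustrative, not from this commit):

    $ docker run --rm mesosphere/spark:latest /usr/lib/jvm/jre1.8.0_121/bin/java -version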

docs/history-server.md

Lines changed: 7 additions & 4 deletions
@@ -22,9 +22,12 @@ your cluster][10] and run:
 configuration file. Here we call it `options.json`:

     {
-        "history-server": {
-            "enabled": true
-        }
+        "history-server": {
+            "enabled": true
+        },
+        "hdfs": {
+            "config-url": "http://hdfs.marathon.mesos:9000/v1/connection"
+        }
     }

 1. Install Spark:

@@ -40,4 +43,4 @@ configuration file. Here we call it `options.json`:
 to the history server entry for that job.

 [3]: http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact
-[10]: https://docs.mesosphere.com/1.8/administration/access-node/sshcluster/
+[10]: https://docs.mesosphere.com/1.9/administration/access-node/sshcluster/
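The added `hdfs` block points the package at an HDFS connection endpoint for event logs. The "Install Spark" step that follows in the doc consumes this file; a sketch using the standard DC/OS CLI `--options` flag (the file name matches the doc's `options.json`):

    $ dcos package install spark --options=options.json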

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -57,8 +57,8 @@ dispatcher and the history server
 [1]: http://spark.apache.org/documentation.html
 [2]: http://spark.apache.org/docs/latest/running-on-mesos.html#cluster-mode
 [3]: http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact
-[4]: https://docs.mesosphere.com/1.8/usage/service-guides/hdfs/
-[5]: https://docs.mesosphere.com/1.8/usage/service-guides/kafka/
+[4]: https://docs.mesosphere.com/1.9/usage/service-guides/hdfs/
+[5]: https://docs.mesosphere.com/1.9/usage/service-guides/kafka/
 [6]: https://zeppelin.incubator.apache.org/
 [17]: https://github.com/mesosphere/spark
 [18]: https://github.com/mesosphere/spark-build

docs/install.md

Lines changed: 4 additions & 3 deletions
@@ -6,7 +6,7 @@ enterprise: 'no'
 ---

 # About Installing Spark on Enterprise DC/OS
-In Enterprise DC/OS `strict` [security mode](https://docs.mesosphere.com/1.8/administration/installing/custom/configuration-parameters/#security), Spark requires a service account. In `permissive`, a service account is optional. Only someone with `superuser` permission can create the service account. Refer to [Provisioning Spark](https://docs.mesosphere.com/1.8/administration/id-and-access-mgt/service-auth/spark-auth/) for instructions.
+In Enterprise DC/OS `strict` [security mode](https://docs.mesosphere.com/1.9/administration/installing/custom/configuration-parameters/#security), Spark requires a service account. In `permissive`, a service account is optional. Only someone with `superuser` permission can create the service account. Refer to [Provisioning Spark](https://docs.mesosphere.com/1.9/administration/id-and-access-mgt/service-auth/spark-auth/) for instructions.

 # Default Installation

@@ -17,10 +17,11 @@ server.

     $ dcos package install spark

-Go to the **Services** tab of the DC/OS web interface to monitor the deployment. Once it is
+Go to the **Services** > **Deployments** tab of the DC/OS web interface to monitor the deployment. Once it is
 complete, visit Spark at `http://<dcos-url>/service/spark/`.

-You can also [install Spark via the DC/OS web interface](https://docs.mesosphere.com/1.8/usage/webinterface/#universe).
+You can also [install Spark via the DC/OS web interface](https://docs.mesosphere.com/1.9/usage/webinterface/#universe).
+
 **Note:** If you install Spark via the web interface, run the
 following command from the DC/OS CLI to install the Spark CLI:
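The note above is cut off by the hunk boundary before the command it introduces. In DC/OS documentation of this era, installing just the CLI subcommand is presumably done with the `--cli` flag (an assumption; the actual command lies outside this hunk):

    $ dcos package install spark --cli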

docs/limitations.md

Lines changed: 3 additions & 14 deletions
@@ -5,19 +5,8 @@ feature_maturity: stable
 enterprise: 'no'
 ---

-* DC/OS Spark only supports submitting jars and Python scripts. It
-  does not support R.
+* Mesosphere does not provide support for Spark app development, such as writing a Python app to process data from Kafka or writing Scala code to process data from HDFS.

-* Mesosphere does not provide support for Spark app development,
-  such as writing a Python app to process data from Kafka or writing
-  Scala code to process data from HDFS.
+* Spark jobs run in Docker containers. The first time you run a Spark job on a node, it might take longer than you expect because of the `docker pull`.

-* Spark jobs run in Docker containers. The first time you run a
-  Spark job on a node, it might take longer than you expect because of
-  the `docker pull`.
-
-* DC/OS Spark only supports running the Spark shell from within a
-  DC/OS cluster. See the Spark Shell section for more information.
-  For interactive analytics, we
-  recommend Zeppelin, which supports visualizations and dynamic
-  dependency management.
+* DC/OS Spark only supports running the Spark shell from within a DC/OS cluster. See the Spark Shell section for more information. For interactive analytics, we recommend Zeppelin, which supports visualizations and dynamic dependency management.
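Since the first job on each node pays the cost of the `docker pull`, one way to avoid that delay is to pre-pull the executor image on agents ahead of time. A sketch, borrowing the image tag from the spark-shell doc changed in this same commit (substitute whatever tag your installation uses):

    $ docker pull mesosphere/spark:1.0.7-2.1.0-hadoop-2.6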

docs/quick-start.md

Lines changed: 5 additions & 1 deletion
@@ -11,12 +11,16 @@ enterprise: 'no'

 1. Run a Spark job:

-       $ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://s3.amazonaws.com/downloads.mesosphere.io/spark/assets/spark-examples_2.10-1.4.0-SNAPSHOT.jar 30"
+       $ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.10-1.4.0-SNAPSHOT.jar 30"

 1. Run a Python Spark job:

        $ dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"

+1. Run an R Spark job:
+
+       $ dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/dataframe.R"
+
 1. View your job:

    Visit the Spark cluster dispatcher at
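`dcos spark run` reports a submission ID on success, which can be used to follow the job from the CLI as well as from the dispatcher UI. A sketch using the Spark CLI's status and log subcommands (the submission ID shown is illustrative):

    $ dcos spark status driver-20170101000000-0001
    $ dcos spark log driver-20170101000000-0001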

docs/run-job.md

Lines changed: 3 additions & 2 deletions
@@ -12,9 +12,10 @@ more][13].

     $ dcos spark run --submit-args=`--class MySampleClass http://external.website/mysparkapp.jar 30`

-
     $ dcos spark run --submit-args="--py-files mydependency.py http://external.website/mysparkapp.py 30"

+    $ dcos spark run --submit-args="http://external.website/mysparkapp.R"
+
 `dcos spark run` is a thin wrapper around the standard Spark
 `spark-submit` script. You can submit arbitrary pass-through options
 to this script via the `--submit-args` options.

@@ -64,7 +65,7 @@ To set Spark properties with a configuration file, create a

 # Versioning

-The DC/OS Spark docker image contains OpenJDK 8 and Python 2.7.6.
+The DC/OS Spark Docker image contains OpenJDK 8 and Python 2.7.6.

 DC/OS Spark distributions 1.X are compiled with Scala 2.10. DC/OS
 Spark distributions 2.X are compiled with Scala 2.11. Scala is not
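Because `--submit-args` passes straight through to `spark-submit`, arbitrary Spark properties can ride along with the application reference. A sketch combining a stock Spark property with the doc's placeholder jar (`spark.cores.max` caps the job's total executor cores):

    $ dcos spark run --submit-args="--conf spark.cores.max=4 --class MySampleClass http://external.website/mysparkapp.jar 30"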

docs/runtime-config-change.md

Lines changed: 4 additions & 6 deletions
@@ -7,15 +7,13 @@ enterprise: 'no'

 You can customize DC/OS Spark in-place when it is up and running.

-1. Go to the DC/OS web interface.
+1. Go to the DC/OS GUI.

 1. Click the **Services** tab, then the name of the Spark
    framework to be updated.

-1. Within the Spark instance details view, click **Edit**.
+1. Within the Spark instance details view, click the menu in the upper right, then choose **Edit**.

-1. In the dialog that appears, click the **Environment Variables**
-   tab and update any field(s) to their desired value(s).
+1. In the dialog that appears, click the **Environment** tab and update any field(s) to their desired value(s).

-1. Click **Deploy** to apply any changes and
-   cleanly reload Spark.
+1. Click **REVIEW & RUN** to apply any changes and cleanly reload Spark.

docs/security.md

Lines changed: 5 additions & 8 deletions
@@ -3,7 +3,6 @@ post_title: Security
 menu_order: 40
 enterprise: 'no'
 ---
-
 # Mesos Security

 ## SSL

@@ -23,13 +22,11 @@ enterprise: 'no'

 ## Authentication

-When running in [DC/OS strict security mode](https://docs.mesosphere.com/latest/administration/id-and-access-mgt/), both the dispatcher and jobs must authenticate to Mesos using a [DC/OS Service Account](https://docs.mesosphere.com/1.8/administration/id-and-access-mgt/service-auth/).
+When running in [DC/OS strict security mode](https://docs.mesosphere.com/latest/administration/id-and-access-mgt/), both the dispatcher and jobs must authenticate to Mesos using a [DC/OS Service Account](https://docs.mesosphere.com/1.9/administration/id-and-access-mgt/service-auth/).

 Follow these instructions to authenticate in strict mode:

-1. Create a Service Account
-
-   Instructions [here](https://docs.mesosphere.com/1.8/administration/id-and-access-mgt/service-auth/universe-service-auth/).
+1. Create a service account by following the instructions [here](https://docs.mesosphere.com/1.9/administration/id-and-access-mgt/service-auth/universe-service-auth/).

 1. Assign Permissions

@@ -47,7 +44,7 @@ Follow these instructions to authenticate in strict mode:
        "$(dcos config show core.dcos_url)/acs/api/v1/acls/dcos:mesos:master:task:user:root/users/${SERVICE_ACCOUNT_NAME}/create"
    ```

-   Now you must allow Spark to register under the desired role. This is the value used for `service.role` when installing Spark (default: `*`):
+   Now, you must allow Spark to register under the desired role. This is the value used for `service.role` when installing Spark (default: `*`):

    ```
    $ export ROLE=<service.role value>

@@ -88,7 +85,7 @@ Follow these instructions to authenticate in strict mode:

 1. Submit a Job

-   We've now installed the Spark Dispatcher, which is authenticating itself to the Mesos master. Spark jobs are also frameworks which must authenticate. The dispatcher will pass the secret along to the jobs, so all that's left to do is configure our jobs to use DC/OS authentication:
+   We've now installed the Spark Dispatcher, which is authenticating itself to the Mesos master. Spark jobs are also frameworks that must authenticate. The dispatcher will pass the secret along to the jobs, so all that's left to do is configure our jobs to use DC/OS authentication:

    ```
    $ PROPS="-Dspark.mesos.driverEnv.MESOS_MODULES=file:///opt/mesosphere/etc/mesos-scheduler-modules/dcos_authenticatee_module.json "

@@ -172,5 +169,5 @@ In addition to the described configuration, make sure to connect the DC/OS cluster

     $ dcos config set core.dcos_url https://<dcos-url>

-[11]: https://docs.mesosphere.com/1.8/overview/components/
+[11]: https://docs.mesosphere.com/1.9/overview/architecture/components/
 [12]: http://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html
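The `$PROPS` string assembled in the authentication steps is ultimately handed to the job submission. A hedged sketch of that final step, reusing the placeholder app from run-job.md (the exact continuation lives outside these hunks):

    $ dcos spark run --submit-args="${PROPS} --class MySampleClass http://external.website/mysparkapp.jar"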

docs/spark-shell.md

Lines changed: 10 additions & 1 deletion
@@ -7,7 +7,7 @@ enterprise: 'no'
 # Interactive Spark Shell

 You can run Spark commands interactively in the Spark shell. The Spark shell is available
-in either Scala or Python.
+in either Scala, Python, or R.

 1. SSH into a node in the DC/OS cluster. [Learn how to SSH into your cluster and get the agent node ID](https://dcos.io/docs/latest/administration/access-node/sshcluster/).

@@ -27,6 +27,10 @@ in either Scala or Python.

        $ ./bin/pyspark --master mesos://<internal-master-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.4-2.0.1 --conf spark.mesos.executor.home=/opt/spark/dist

+   Or, run the R Spark shell.
+
+       $ ./bin/sparkR --master mesos://<internal-master-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.7-2.1.0-hadoop-2.6 --conf spark.mesos.executor.home=/opt/spark/dist
+
 1. Run Spark commands interactively.

    In the Scala shell:

@@ -38,3 +42,8 @@ in either Scala or Python.

        $ textFile = sc.textFile("/opt/spark/dist/README.md")
        $ textFile.count()
+
+   In the R shell:
+
+       $ df <- as.DataFrame(faithful)
+       $ head(df)
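The new R example converts the built-in `faithful` dataset into a Spark DataFrame. A possible continuation of that session, assuming the standard SparkR 2.x API shipped with the referenced image (`createOrReplaceTempView` and `sql` are stock SparkR functions, not shown in the doc):

    $ createOrReplaceTempView(df, "faithful")
    $ head(sql("SELECT * FROM faithful WHERE waiting > 70"))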
