From 8797ba3ab604a74c69bfebcd3037d98a4a38dd01 Mon Sep 17 00:00:00 2001
From: hyukjinkwon
Date: Fri, 20 May 2016 15:50:13 +0900
Subject: [PATCH 1/5] Update WINDOWS.md so that users can run unit tests on Windows

---
 R/WINDOWS.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/R/WINDOWS.md b/R/WINDOWS.md
index 3f889c0ca3d1..4c7c8554d200 100644
--- a/R/WINDOWS.md
+++ b/R/WINDOWS.md
@@ -11,3 +11,15 @@ include Rtools and R in `PATH`.
 directory in Maven in `PATH`.
 4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
 5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`
+
+## Unit tests
+
+To run existing unit tests in SparkR on Windows, the following steps are required (the steps below assume you are in the Spark root directory)
+
+1. Set `HADOOP_HOME`.
+2. Download `winutils.exe` and place it in `$HADOOP_HOME/bin`. (It does not seem necessary to install Hadoop itself, only this `winutils.exe`. It does not seem to be included in official Hadoop binary releases, so it should be built from source, but community builds appear to be available (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils)).
+3. You can also run the unit tests for SparkR by running the following (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
+
+    R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
+    .\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
+

From ad68456b8288bc7b1f97b54d36781cab2a53019c Mon Sep 17 00:00:00 2001
From: hyukjinkwon
Date: Fri, 20 May 2016 15:56:15 +0900
Subject: [PATCH 2/5] Correct indentation

---
 R/WINDOWS.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/R/WINDOWS.md b/R/WINDOWS.md
index 4c7c8554d200..317ec626d948 100644
--- a/R/WINDOWS.md
+++ b/R/WINDOWS.md
@@ -17,9 +17,13 @@ directory in Maven in `PATH`.
 To run existing unit tests in SparkR on Windows, the following steps are required (the steps below assume you are in the Spark root directory)
 
 1. Set `HADOOP_HOME`.
-2. Download `winutils.exe` and place it in `$HADOOP_HOME/bin`. (It does not seem necessary to install Hadoop itself, only this `winutils.exe`. It does not seem to be included in official Hadoop binary releases, so it should be built from source, but community builds appear to be available (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils)).
+2. Download `winutils.exe` and place it in `$HADOOP_HOME/bin`.
+
+   It does not seem necessary to install Hadoop itself, only this `winutils.exe`. It does not seem to be included in official Hadoop binary releases, so it should be built from source, but community builds appear to be available (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils).
+
 3. You can also run the unit tests for SparkR by running the following (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
 
+    ```
     R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
     .\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
-
+    ```

From acb93639d2183c8d52281db6d21ec9107e282461 Mon Sep 17 00:00:00 2001
From: hyukjinkwon
Date: Fri, 20 May 2016 15:57:51 +0900
Subject: [PATCH 3/5] Add a missing parenthesis

---
 R/WINDOWS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/R/WINDOWS.md b/R/WINDOWS.md
index 317ec626d948..a7343653dcc8 100644
--- a/R/WINDOWS.md
+++ b/R/WINDOWS.md
@@ -19,7 +19,7 @@ To run existing unit tests in SparkR on Windows, the following steps are required
 
 1. Set `HADOOP_HOME`.
 2. Download `winutils.exe` and place it in `$HADOOP_HOME/bin`.
 
-   It does not seem necessary to install Hadoop itself, only this `winutils.exe`. It does not seem to be included in official Hadoop binary releases, so it should be built from source, but community builds appear to be available (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils).
+   It does not seem necessary to install Hadoop itself, only this `winutils.exe`. It does not seem to be included in official Hadoop binary releases, so it should be built from source, but community builds appear to be available (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils)).
 
 3. You can also run the unit tests for SparkR by running the following (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

From b62b631e98570b16d4d000e23b2e5928662ff941 Mon Sep 17 00:00:00 2001
From: hyukjinkwon
Date: Fri, 20 May 2016 16:02:39 +0900
Subject: [PATCH 4/5] Fix sentence

---
 R/WINDOWS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/R/WINDOWS.md b/R/WINDOWS.md
index a7343653dcc8..a27e2cc92c79 100644
--- a/R/WINDOWS.md
+++ b/R/WINDOWS.md
@@ -21,7 +21,7 @@ To run existing unit tests in SparkR on Windows, the following steps are required
 
    It does not seem necessary to install Hadoop itself, only this `winutils.exe`. It does not seem to be included in official Hadoop binary releases, so it should be built from source, but community builds appear to be available (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils)).
 
-3. You can also run the unit tests for SparkR by running the following (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
+3. Run the unit tests for SparkR by running the commands below (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
 
     ```
     R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"

From aa2839ca7bc8b18dafa849a8bdc8d5119d1e46a1 Mon Sep 17 00:00:00 2001
From: hyukjinkwon
Date: Sat, 21 May 2016 10:21:09 +0900
Subject: [PATCH 5/5] Address comments

---
 R/README.md  |  8 +++++++-
 R/WINDOWS.md | 16 ++++++++++------
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/R/README.md b/R/README.md
index 810bfc14e977..044f95312ae8 100644
--- a/R/README.md
+++ b/R/README.md
@@ -1,11 +1,13 @@
 # R on Spark
 
 SparkR is an R package that provides a light-weight frontend to use Spark from R.
+
 ### Installing sparkR
 
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
 
 By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running install-dev.sh script. Example:
+
 ```
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
 export R_HOME=/home/username/R
@@ -17,6 +19,7 @@ export R_HOME=/home/username/R
 #### Build Spark
 
 Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
+
 ```
 build/mvn -DskipTests -Psparkr package
 ```
@@ -38,6 +41,7 @@ To set other options like driver memory, executor memory etc. you can pass in the
 #### Using SparkR from RStudio
 
 If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
+
 ```
 # Set this to where Spark is installed
 Sys.setenv(SPARK_HOME="/Users/username/spark")
@@ -64,13 +68,15 @@ To run one of them, use `./bin/spark-submit `. For example:
 
     ./bin/spark-submit examples/src/main/r/dataframe.R
 
-You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
+You can also run the unit tests for SparkR by running the commands below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
 
     R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
     ./R/run-tests.sh
 
 ### Running on YARN
+
 The `./bin/spark-submit` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
+
 ```
 export YARN_CONF_DIR=/etc/hadoop/conf
 ./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
diff --git a/R/WINDOWS.md b/R/WINDOWS.md
index a27e2cc92c79..f948ed397479 100644
--- a/R/WINDOWS.md
+++ b/R/WINDOWS.md
@@ -14,16 +14,20 @@ directory in Maven in `PATH`.
 
 ## Unit tests
 
-To run existing unit tests in SparkR on Windows, the following steps are required (the steps below assume you are in the Spark root directory)
+To run the SparkR unit tests on Windows, the following steps are required, assuming you are in the Spark root directory and do not already have Apache Hadoop installed:
 
-1. Set `HADOOP_HOME`.
-2. Download `winutils.exe` and place it in `$HADOOP_HOME/bin`.
+1. Create a folder to download Hadoop-related files for Windows. For example, `cd ..` and `mkdir hadoop`.
 
-   It does not seem necessary to install Hadoop itself, only this `winutils.exe`. It does not seem to be included in official Hadoop binary releases, so it should be built from source, but community builds appear to be available (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils)).
+2. Download the relevant Hadoop bin package from [steveloughran/winutils](https://github.com/steveloughran/winutils). While these are not official ASF artifacts, they are built from the ASF release git hashes by a Hadoop PMC member on a dedicated Windows VM. For further reading, consult [Windows Problems on the Hadoop wiki](https://wiki.apache.org/hadoop/WindowsProblems).
+
+3. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are present.
+
+4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.
+
+5. Run unit tests for SparkR by running the command below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
 
-3. Run the unit tests for SparkR by running the commands below (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
-
     ```
     R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
     .\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
     ```
+
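For reference, the five steps that PATCH 5/5 adds to WINDOWS.md can be summarized in a short shell sketch. This is an illustrative outline, not part of the patch series: the placeholder files and the `hadoop` folder location are assumptions, and on a real Windows machine the commands would be run in `cmd` as described above.

```shell
# Sketch of the SparkR-on-Windows test setup from PATCH 5/5, written as a
# POSIX shell outline for readability. All paths are illustrative assumptions.

# Steps 1-3: create a folder for the Hadoop helper binaries (the patch puts a
# `hadoop` folder next to the Spark tree via `cd ..` and `mkdir hadoop`) and
# place winutils.exe and hadoop.dll in its `bin` subfolder. Empty placeholder
# files stand in for the real downloads from steveloughran/winutils.
mkdir -p hadoop/bin
touch hadoop/bin/winutils.exe hadoop/bin/hadoop.dll

# Step 4: point HADOOP_HOME at the full path of the new directory.
export HADOOP_HOME="$PWD/hadoop"
echo "HADOOP_HOME=$HADOOP_HOME"

# Step 5: the tests would then be launched via spark-submit2.cmd; the command
# is echoed rather than executed here, since no Spark build is assumed.
echo '.\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R'
```

Note that `spark.hadoop.fs.default.name="file:///"` makes the tests read from the local filesystem instead of HDFS, which is why only `winutils.exe` and `hadoop.dll`, rather than a full Hadoop installation, are needed.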