[SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check #16720
Conversation
Hmm - another fix could be that in the test cases whenever we create

yes, that is described in the PR description: And besides, it is conceivable that these tests could be set to run in a cluster, in which case we could expect SPARK_HOME to be set but master not local. (with this PR that would still work properly)

I think a better approach would be the one taken in PR 16330 - it has a first-run test that prepares and runs this kind of global setup: https://github.com/apache/spark/pull/16330/files#diff-5ff1ba5d1751f3b1cc96a567e9ab25ff
Test build #72082 has finished for PR 16720 at commit
I am not sure tests are ever meant to run on a cluster (see the number of uses of LocalSparkContext in core/src/test/scala) -- The main reason I don't want to introduce the 'first test' approach is that we are then relying too much on test names not clashing / getting in front of each other, which seems fragile. The other thing that might be good is to create a test util function like

Sure, I've simplified it. Good point on the ordering - digging into it, it looks like it's just file system search order, which really is not reliable. We could certainly add a test util - though it seems some tests are different; for example, test_context.R doesn't need a SparkSession.
Test build #72103 has finished for PR 16720 at commit
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
We use default settings in which it runs in local mode. It auto downloads Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession).

```{r, include=FALSE}
SparkR:::sparkCheckInstall()
```
Is it ok to include a ::: function in the vignette?
this has include=FALSE so it will run, but the code and output will not be included in the vignette's text
Is the Rmd file a part of the install that users see? I just don't want to put in any code that people might copy-paste, etc. Is it not good enough to pass in master=local[*] here?
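For reference, the alternative being floated here would look roughly like this in the vignette setup chunk (a sketch only; the PR keeps the explicit install check instead):

```{r, include=FALSE}
# Sketch of the alternative being suggested: force a local master so the
# Spark jars are auto-installed when SPARK_HOME is not set.
# (Not what this PR does -- the PR keeps the explicit sparkCheckInstall() call.)
library(SparkR)
sparkR.session(master = "local[*]")
```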
FWIW, these vignette changes are still needed even if we update run-all.R
| context("functions in utils.R") | ||
|
|
||
| # Ensure Spark is installed | ||
| sparkCheckInstall() |
What I had in mind was to combine the sparkR.session and this sparkCheckInstall into one function so it's easy to remember for a new test file. Any thoughts on this?
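For illustration, a rough sketch of what such a combined helper could look like (the name `sparkRTestSetup` is hypothetical, not something in this PR):

```r
library(SparkR)

# Hypothetical helper: ensure a Spark distribution is available, then start
# a session -- a single call for a new test file to remember.
sparkRTestSetup <- function(...) {
  SparkR:::sparkCheckInstall()  # same check added to the vignette/tests in this PR
  sparkR.session(...)
}

# usage in a test file:
# sparkRTestSetup(master = "local[1]")
```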
I understand that, but as pointed out in #16720 (comment), some tests don't need a SparkSession, and some tests will create/stop one as needed; wouldn't a function that does all of that just mean more complexity?
Sure - that sounds fine. I was looking to see if testthat had any support for writing a setup that gets called before each test - it doesn't look like it has that.
hmm, we could put it in https://github.com/apache/spark/blob/master/R/pkg/tests/run-all.R?
Ah that's a great idea - can you see if that works (unfortunately it needs manual verification)?
Any luck testing this out?
sorry, I"m really swamped, haven't had the chance to test that out yet
I just tested this by putting SparkR:::checkInstall in run-all.R (before calling test_package) and that seems to do the trick on a custom 2.1.0 build!
@felixcheung when you get a chance can you update the PR with that? The only thing that I'm concerned about is calling a private function from run-all.R - we could either export this function or move some of this functionality into install.spark
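Roughly, the run-all.R change being described would look like this (a sketch, using the `sparkCheckInstall` helper shown in the diffs above and calling it via `:::` since it is not exported):

```r
# R/pkg/tests/run-all.R (sketch)
library(testthat)
library(SparkR)

# Ensure a Spark distribution is installed before any test starts a session
SparkR:::sparkCheckInstall()

test_package("SparkR")
```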
Great! Sure will do.
I agree, I think we should consider integrating this into install.spark although that would be a change in behavior (for the better?)
found another issue, opened SPARK-19568
Test build #72793 has finished for PR 16720 at commit
LGTM. I patched this again on top of 2.1.0 and
…CRAN check

## What changes were proposed in this pull request?

- this is caused by changes in SPARK-18444, SPARK-18643 that we no longer install Spark when `master = ""` (default), but also related to SPARK-18449 since the real `master` value is not known at the time the R code in `sparkR.session` is run. (`master` cannot default to "local" since it could be overridden by the spark-submit command line or spark config)
- as a result, while running SparkR as a package in an IDE works fine, the CRAN check does not, as it launches it via a non-interactive script
- the fix is to add the check to the beginning of each test and the vignettes; the same would also work by changing `sparkR.session()` to `sparkR.session(master = "local")` in tests, but I think being more explicit is better.

## How was this patch tested?

Tested this by reverting version to 2.1, since it needs to download the release jar with matching version. But since there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests are failing in this config. Will need to port this to branch-2.1 and retest with 2.1 release jar.

manually as:

```
# modify DESCRIPTION to revert version to 2.1.0
SPARK_HOME=/usr/spark R CMD build pkg

# run cran check without SPARK_HOME
R CMD check --as-cran SparkR_2.1.0.tar.gz
```

Author: Felix Cheung <[email protected]>

Closes #16720 from felixcheung/rcranchecktest.

(cherry picked from commit a3626ca)
Signed-off-by: Shivaram Venkataraman <[email protected]>
What changes were proposed in this pull request?

- this is caused by changes in SPARK-18444, SPARK-18643 that we no longer install Spark when `master = ""` (default), but also related to SPARK-18449 since the real `master` value is not known at the time the R code in `sparkR.session` is run. (`master` cannot default to "local" since it could be overridden by the spark-submit command line or spark config)
- as a result, while running SparkR as a package in an IDE works fine, the CRAN check does not, as it launches it via a non-interactive script
- the fix is to add the check to the beginning of each test and the vignettes; the same would also work by changing `sparkR.session()` to `sparkR.session(master = "local")` in tests, but I think being more explicit is better.

How was this patch tested?

Tested this by reverting version to 2.1, since it needs to download the release jar with matching version. But since there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests are failing in this config. Will need to port this to branch-2.1 and retest with 2.1 release jar.

manually as: