[SPARK-20437][R] R wrappers for rollup and cube by zero323 · Pull Request #17728 · apache/spark

zero323 · 2017-04-22T14:31:39Z

What changes were proposed in this pull request?

Add rollup and cube methods and corresponding generics.
Add short description to the vignette.

How was this patch tested?

Existing unit tests.
Additional unit tests covering new features.
check-cran.sh.

SparkQA · 2017-04-22T14:35:48Z

Test build #76062 has finished for PR 17728 at commit c1459d0.

This patch fails R style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-22T15:15:32Z

Test build #76063 has finished for PR 17728 at commit e763aa6.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

zero323 · 2017-04-22T15:23:41Z

Jenkins retest this please.

SparkQA · 2017-04-22T15:25:59Z

Test build #76064 has finished for PR 17728 at commit ae7512a.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-22T16:00:48Z

Test build #76065 has finished for PR 17728 at commit ae7512a.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-22T16:12:04Z

Test build #76066 has finished for PR 17728 at commit 3104eb1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zero323 · 2017-04-22T16:15:14Z

Jenkins retest this please.

SparkQA · 2017-04-22T16:51:19Z

Test build #76067 has finished for PR 17728 at commit 3104eb1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-22T20:00:51Z

Test build #76070 has finished for PR 17728 at commit 132099c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zero323 · 2017-04-22T23:05:03Z

cc @felixcheung

felixcheung

cool

felixcheung · 2017-04-23T00:22:12Z

+            jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+            sgd <- callJMethod(x@sdf, "rollup", jcol)
+            groupedData(sgd)
+          })


please add extra newline at end of file

felixcheung · 2017-04-23T00:22:47Z

+#' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.


names(s) -> name(s)

perhaps variable(s) is misleading and just character name(s) or Column(s) to group on. is sufficient?

Sounds good.

felixcheung · 2017-04-23T00:23:40Z

+#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.


ditto below

felixcheung · 2017-04-23T00:53:39Z

+setMethod("cube",
+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)


check length of cols is > 0?

I think we can skip that. rollup(df) and cube(df) are valid function calls equivalent to group_by(df) and arguably can be useful in some cases (like aggregations based on user input).

hmm, it's a bit odd to call rollup or cube that way but ok if other languages leave that open too. but I'd say we should add a line to explain "rollup or cube without column is the same as group_by" (or something better)

if you want to support empty parameter let's add some tests for it then?
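
For readers following this thread, here is a minimal base R sketch of why calling rollup or cube with no columns collapses to a single global aggregate. It only enumerates the grouping sets each operator aggregates over and does not touch Spark. The helpers `rollup_sets` and `cube_sets` are hypothetical names introduced for illustration; they are not part of SparkR.

```r
# Hypothetical helpers (not SparkR API): list the grouping sets that
# ROLLUP and CUBE aggregate over for a character vector of column names.

rollup_sets <- function(cols) {
  # ROLLUP keeps every prefix of the column list, longest first,
  # ending with the empty set (the grand total).
  lapply(rev(seq_len(length(cols) + 1) - 1), function(n) cols[seq_len(n)])
}

cube_sets <- function(cols) {
  k <- length(cols)
  # CUBE keeps every subset of the column list: 2^k grouping sets.
  lapply(seq_len(2^k) - 1, function(mask) {
    # Bit i of `mask` decides whether column i is part of the subset.
    keep <- (mask %/% 2^(seq_len(k) - 1)) %% 2 == 1
    cols[keep]
  })
}

str(rollup_sets(c("cyl", "gear")))  # three sets: (cyl, gear), (cyl), ()
str(cube_sets(c("cyl", "gear")))    # four sets: (), (cyl), (gear), (cyl, gear)

# With no grouping columns both operators are left with only the empty
# grouping set, i.e. one aggregate over all rows, like group_by(df).
str(rollup_sets(character(0)))
str(cube_sets(character(0)))
```

With two columns, rollup yields three grouping sets and cube yields four. With none, both yield the single empty set, which is the behavior the added tests are meant to pin down.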

felixcheung · 2017-04-23T00:53:55Z

+setMethod("rollup",
+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)


check length of cols

felixcheung · 2017-04-23T00:54:58Z

+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)
+            jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)


nit: I'd flip this since Column is a stronger type, and also this way there is a nicer error message
instead of if (is.character(x)) column(x)@jc else x@jc
do if (class(x) == "Column") x@jc else column(x)@jc
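
A rough sketch of the difference, runnable in base R without Spark. The `Column` class and `column()` generic below are simplified stand-ins (assumptions for illustration) that hold a plain string instead of a JVM reference. The helper names `to_jc_original`, `to_jc_flipped` and `error_message` are hypothetical. The exact error text depends on the R version, so the sketch only prints both messages.

```r
library(methods)

# Stand-ins for illustration only: SparkR's real Column wraps a JVM object.
setClass("Column", representation(jc = "character"))
setGeneric("column", function(x) standardGeneric("column"))
setMethod("column", signature(x = "character"),
          function(x) new("Column", jc = x))

# Original ordering: anything that is not a character falls through to x@jc.
to_jc_original <- function(x) if (is.character(x)) column(x)@jc else x@jc

# Suggested ordering: anything that is not a Column is handed to column().
to_jc_flipped <- function(x) if (class(x) == "Column") x@jc else column(x)@jc

# Return the error message raised by `expr`, or NA if it succeeds.
error_message <- function(expr) {
  tryCatch({ expr; NA_character_ }, error = conditionMessage)
}

# Both orderings agree on valid input.
to_jc_original("cyl")
to_jc_flipped(column("cyl"))

# On invalid input (a number) they fail in different places.
msg_original <- error_message(to_jc_original(1))  # fails on slot access
msg_flipped <- error_message(to_jc_flipped(1))    # fails dispatching column()
print(msg_original)
print(msg_flipped)
```

In the flipped version the failure comes from method dispatch on `column`, so the message names the function and the offending argument type, rather than complaining about a missing slot.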

felixcheung · 2017-04-23T00:56:13Z

+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)
+            jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)


ditto if (class(x) == "Column") x@jc else column(x)@jc

felixcheung · 2017-04-23T00:56:34Z

+#' @rdname rollup
+#' @export
+setGeneric("rollup",
+           function(x, ...) { standardGeneric("rollup") })


could you keep this in one line please

felixcheung · 2017-04-23T00:58:23Z

 head(numCyl)
 ```

+`groupBy` can be replaced with `cube` or `rollup` to compute subtotals across multiple dimensions.


minor: I wouldn't say replace because they are not functionally the same?
how about use cube or rollup to compute subtotals across multiple dimensions.

do you think the programming guide can use updates too?

I keep forgetting there is one. I think we can add a few lines. This is actually a pretty neat feature.

SparkQA · 2017-04-24T01:56:08Z

Test build #76090 has finished for PR 17728 at commit a320327.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung

and #17728 (comment)

felixcheung · 2017-04-24T17:47:39Z

+setMethod("cube",
+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)


if you want to support empty parameter let's add some tests for it then?

SparkQA · 2017-04-24T20:56:25Z

Test build #76117 has finished for PR 17728 at commit d73b7e4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-24T22:07:40Z

Test build #76119 has finished for PR 17728 at commit 7190fcd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-04-25T07:21:19Z

looks good,
if nitpicking, maybe we have just a bit "too much information" in the vignettes? perhaps cube without columns should be in the "details" portion of the API doc here. I don't think we want too much detail in either the vignettes or the programming guide, to avoid confusion.

what do you think?

zero323 · 2017-04-25T13:56:22Z

Actually, I wasn't sure about this in the first place. Let me spread it between details and examples.

SparkQA · 2017-04-25T15:06:09Z

Test build #76135 has finished for PR 17728 at commit ee73dd8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-04-25T18:11:40Z

 #'
 #' @param x a SparkDataFrame.
-#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @param ...  character name(s) or Column(s) to group on.


extra space? ... character (two spaces)

felixcheung · 2017-04-25T18:46:18Z

            df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
            dataFrame(df)
          })
+


nit: extra newline

felixcheung · 2017-04-25T18:46:56Z

+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
+#'
+#' If grouping expression is missing `cube` creates a single global aggregate and is equivalent to


the backtick doesn't work with roxygen2 - if you want, use \code{cube} instead

felixcheung · 2017-04-25T18:47:15Z

+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
+#'
+#' If grouping expression is missing `rollup` creates a single global aggregate and is equivalent to


ditto on backtick rollup

SparkQA · 2017-04-25T19:52:45Z

Test build #76147 has finished for PR 17728 at commit c3ebeba.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-26T00:22:24Z

Test build #76155 has finished for PR 17728 at commit 0da03b2.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

zero323 · 2017-04-26T00:23:43Z

Jenkins retest this please.

SparkQA · 2017-04-26T01:12:37Z

Test build #76158 has finished for PR 17728 at commit 0da03b2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung

LGTM

felixcheung · 2017-04-26T05:01:11Z

merged to master

zero323 force-pushed the SPARK-20437 branch from e763aa6 to ae7512a Compare April 22, 2017 14:48

zero323 force-pushed the SPARK-20437 branch from b2a30f4 to 3104eb1 Compare April 22, 2017 15:34

zero323 added 4 commits April 22, 2017 21:25

Initial implementation

dcc359f

Add description to the vignette

bc0401b

Fix tests style

7af59e3

Add missing examples

132099c

zero323 force-pushed the SPARK-20437 branch from 3104eb1 to 132099c Compare April 22, 2017 19:25

felixcheung requested changes Apr 23, 2017

zero323 added 4 commits April 23, 2017 23:08

Place rollup generic in a single line

9760239

Add missing line at the end of the DataFrame.R

396cf55

Adjust ellipsis description

ab05919

Replace is.character check with class(x) == "Column"

a320327

felixcheung requested changes Apr 24, 2017

zero323 added 2 commits April 24, 2017 21:50

Add tests for cube and rollup without groupings

f4fa32f

Clarify vignette description

caeafdb

zero323 added 4 commits April 25, 2017 15:59

Extend R programming guide with cube and rollup

e9bbe6f

Describe behavior with missing grouping columns

7d6c6d5

Use seealso to link groupBy, cube and rollup and agg

76f12cd

Make groupBy ... description consistent with cube and rollup

ee73dd8

zero323 force-pushed the SPARK-20437 branch from 7190fcd to ee73dd8 Compare April 25, 2017 14:31

felixcheung reviewed Apr 25, 2017

Remove extra whitespaces and replace backticks

0da03b2

zero323 force-pushed the SPARK-20437 branch from c3ebeba to 0da03b2 Compare April 25, 2017 23:39

felixcheung approved these changes Apr 26, 2017

asfgit closed this in df58a95 Apr 26, 2017

zero323 deleted the SPARK-20437 branch April 26, 2017 13:17
