[SPARK-20437][R] R wrappers for rollup and cube #17728
zero323 wants to merge 15 commits into apache:master from

Test build #76062 has finished for PR 17728 at commit

Test build #76063 has finished for PR 17728 at commit

Jenkins retest this please.

Test build #76064 has finished for PR 17728 at commit

Test build #76065 has finished for PR 17728 at commit

Test build #76066 has finished for PR 17728 at commit

Jenkins retest this please.

Test build #76067 has finished for PR 17728 at commit

Test build #76070 has finished for PR 17728 at commit

cc @felixcheung

| jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
| sgd <- callJMethod(x@sdf, "rollup", jcol)
| groupedData(sgd)
| })
| \ No newline at end of file
please add an extra newline at the end of the file

| #' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
| #'
| #' @param x a SparkDataFrame.
| #' @param ... variable(s) (character names(s) or Column(s)) to group on.
perhaps "variable(s)" is misleading, and just "character name(s) or Column(s) to group on." is sufficient?

| #' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
| #'
| #' @param x a SparkDataFrame.
| #' @param ... variable(s) (character names(s) or Column(s)) to group on.
| setMethod("cube",
| signature(x = "SparkDataFrame"),
| function(x, ...) {
| cols <- list(...)
check that the length of cols is > 0?

I think we can skip that. rollup(df) and cube(df) are valid function calls equivalent to group_by(df), and arguably they can be useful in some cases (like aggregations based on user input).

hmm, it's a bit odd to call rollup or cube that way, but OK if other languages leave that open too. I'd still add a line to explain that "rollup or cube without columns is the same as group_by" (or something better).

if you want to support an empty parameter list, let's add some tests for it then?
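
On the empty-argument question: ROLLUP over columns c1, ..., cn expands to the n + 1 prefix grouping sets, while CUBE expands to all 2^n subsets, so with no columns both reduce to the single empty grouping set, i.e. one global aggregate. That is consistent with the point above that rollup(df) and cube(df) behave like group_by(df). The sketch below is plain R and only enumerates grouping sets; `rollup_sets()` and `cube_sets()` are hypothetical helpers, not part of SparkR.

```r
# Hypothetical helpers, not part of SparkR: list the grouping sets that
# ROLLUP and CUBE expand to, each as a character vector of column names.

rollup_sets <- function(...) {
  cols <- as.character(c(...))
  # ROLLUP(a, b) -> (a, b), (a), (): prefixes of length n, n - 1, ..., 0
  lapply(rev(seq_len(length(cols) + 1L) - 1L), function(k) cols[seq_len(k)])
}

cube_sets <- function(...) {
  cols <- as.character(c(...))
  n <- length(cols)
  # bit i of a mask selects cols[i]; masks 2^n - 1 down to 0 cover all subsets
  bits <- as.integer(2^(seq_len(n) - 1L))
  lapply(rev(seq_len(2^n) - 1L), function(mask) cols[bitwAnd(mask, bits) > 0L])
}

str(rollup_sets("cyl", "gear"))
str(cube_sets("cyl", "gear"))

# With no columns, both collapse to one empty grouping set (a global aggregate)
str(rollup_sets())
str(cube_sets())
```

A test for the empty case in SparkR would then only need to check that rollup(df) and cube(df) aggregate to a single row, the same as the ungrouped aggregate.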
| setMethod("rollup",
| signature(x = "SparkDataFrame"),
| function(x, ...) {
| cols <- list(...)

| signature(x = "SparkDataFrame"),
| function(x, ...) {
| cols <- list(...)
| jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
nit: I'd flip this since Column is the stronger type, and this way there is also a nicer error message.
instead of if (is.character(x)) column(x)@jc else x@jc
do if (class(x) == "Column") x@jc else column(x)@jc
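
A minimal sketch of the suggested ordering that runs without Spark. `Column` and `column()` here are hypothetical stand-ins for the SparkR class and generic (this `column()` only has a method for character input), and `to_jc()` is a made-up name for the anonymous function passed to `lapply`. Checking for the stronger type first means anything that is neither a Column nor a column name falls through to `column()`, so the failure is a method-dispatch error on `column()` instead of a slot-access error on `@jc`.

```r
library(methods)

# Hypothetical stand-ins for SparkR's Column class and column() generic, so the
# conversion logic can run without a Spark session. The real Column wraps a JVM
# reference in slot jc; here jc is just a string.
setClass("Column", representation(jc = "character"))
setGeneric("column", function(x) standardGeneric("column"))
setMethod("column", "character", function(x) new("Column", jc = paste0("col(", x, ")")))

# Suggested order: test for Column first and hand everything else to column(),
# which rejects unsupported input through method dispatch.
# inherits() is used instead of class(x) == "Column" so subclasses also match.
to_jc <- function(x) if (inherits(x, "Column")) x@jc else column(x)@jc

jcol <- lapply(list("cyl", column("gear")), to_jc)
str(jcol)

# Neither a Column nor a name: the error now comes from column() dispatch
res <- try(to_jc(1), silent = TRUE)
cat(res)
```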

| signature(x = "SparkDataFrame"),
| function(x, ...) {
| cols <- list(...)
| jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
ditto: if (class(x) == "Column") x@jc else column(x)@jc

| #' @rdname rollup
| #' @export
| setGeneric("rollup",
| function(x, ...) { standardGeneric("rollup") })
could you keep this on one line, please?

| head(numCyl)
| ```
|
| `groupBy` can be replaced with `cube` or `rollup` to compute subtotals across multiple dimensions.
minor: I wouldn't say "replaced", because they are not functionally the same.
How about "Use `cube` or `rollup` to compute subtotals across multiple dimensions."?

do you think the programming guide could use updates too?

I keep forgetting there is one. I think we can add a few lines. This is actually a pretty neat feature.
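
To make "subtotals across multiple dimensions" concrete, here is a base-R sketch of what a rollup over two columns followed by a count contains: one aggregate per grouping set (cyl, gear), (cyl) and (), stacked together. It assumes the built-in `mtcars` data set; `rollup_count()` is a hypothetical helper, not a SparkR function, and it uses NA where Spark would return NULL for rolled-up columns.

```r
# Hypothetical helper (not SparkR): roughly what rollup(df, "cyl", "gear")
# followed by a count aggregation returns. One count per grouping set
# cols[1..k] for k = n, ..., 0, stacked; rolled-up keys are NA (NULL in Spark).
rollup_count <- function(df, cols) {
  parts <- lapply(rev(seq_len(length(cols) + 1L) - 1L), function(k) {
    keys <- cols[seq_len(k)]
    out <- if (k == 0L) {
      data.frame(count = nrow(df))  # empty grouping set: grand total
    } else {
      aggregate(list(count = rep(1L, nrow(df))), by = df[keys], FUN = sum)
    }
    for (col in setdiff(cols, keys)) out[[col]] <- NA  # rolled-up columns
    out[c(cols, "count")]
  })
  do.call(rbind, parts)
}

subtotals <- rollup_count(mtcars, c("cyl", "gear"))
subtotals
```

A cube over the same columns would additionally include the (gear) grouping set, i.e. per-gear totals across all cylinder counts.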

Test build #76090 has finished for PR 17728 at commit

|
Test build #76117 has finished for PR 17728 at commit
|
|
Test build #76119 has finished for PR 17728 at commit
|
|
looks good, what do you think? |
|
I actually. I wasn't sure about this in the first place. Let me spread it between |
|
Test build #76135 has finished for PR 17728 at commit
|
| #'
| #' @param x a SparkDataFrame.
| #' @param ... variable(s) (character names(s) or Column(s)) to group on.
| #' @param ... character name(s) or Column(s) to group on.
extra space? There are two spaces in "...  character".

| df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
| dataFrame(df)
| })
|
|
| #'
| #' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
| #'
| #' If grouping expression is missing `cube` creates a single global aggregate and is equivalent to
the backtick doesn't work with roxygen2; if you want, use \code{cube} instead

| #'
| #' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
| #'
| #' If grouping expression is missing `rollup` creates a single global aggregate and is equivalent to

Test build #76147 has finished for PR 17728 at commit

Test build #76155 has finished for PR 17728 at commit

Jenkins retest this please.

Test build #76158 has finished for PR 17728 at commit

merged to master

What changes were proposed in this pull request?
`rollup` and `cube` methods and corresponding generics.

How was this patch tested?
check-cran.sh.