[SPARK-20437][R] R wrappers for rollup and cube #17728
Changes from 14 commits
dcc359f
bc0401b
7af59e3
132099c
9760239
396cf55
ab05919
a320327
f4fa32f
caeafdb
e9bbe6f
7d6c6d5
76f12cd
ee73dd8
0da03b2
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1321,7 +1321,7 @@ setMethod("toRDD", | |
| #' Groups the SparkDataFrame using the specified columns, so we can run aggregation on them. | ||
| #' | ||
| #' @param x a SparkDataFrame. | ||
| #' @param ... variable(s) (character names(s) or Column(s)) to group on. | ||
| #' @param ... character name(s) or Column(s) to group on. | ||
| #' @return A GroupedData. | ||
| #' @family SparkDataFrame functions | ||
| #' @aliases groupBy,SparkDataFrame-method | ||
|
|
@@ -1337,6 +1337,7 @@ setMethod("toRDD", | |
| #' agg(groupBy(df, "department", "gender"), salary="avg", "age" -> "max") | ||
| #' } | ||
| #' @note groupBy since 1.4.0 | ||
| #' @seealso \link{agg}, \link{cube}, \link{rollup} | ||
| setMethod("groupBy", | ||
| signature(x = "SparkDataFrame"), | ||
| function(x, ...) { | ||
|
|
@@ -3642,3 +3643,74 @@ setMethod("checkpoint", | |
| df <- callJMethod(x@sdf, "checkpoint", as.logical(eager)) | ||
| dataFrame(df) | ||
| }) | ||
|
|
||
|
Member
nit: extra newline |
||
|
|
||
| #' cube | ||
| #' | ||
| #' Create a multi-dimensional cube for the SparkDataFrame using the specified columns. | ||
| #' | ||
| #' If grouping expression is missing `cube` creates a single global aggregate and is equivalent to | ||
|
Member
the backtick doesn't work with roxygen2 - if you want, use |
||
| #' direct application of \link{agg}. | ||
| #' | ||
| #' @param x a SparkDataFrame. | ||
| #' @param ... character name(s) or Column(s) to group on. | ||
| #' @return A GroupedData. | ||
| #' @family SparkDataFrame functions | ||
| #' @aliases cube,SparkDataFrame-method | ||
| #' @rdname cube | ||
| #' @name cube | ||
| #' @export | ||
| #' @examples | ||
| #' \dontrun{ | ||
| #' df <- createDataFrame(mtcars) | ||
| #' mean(cube(df, "cyl", "gear", "am"), "mpg") | ||
| #' | ||
| #' # Following calls are equivalent | ||
| #' agg(cube(carsDF), mean(carsDF$mpg)) | ||
| #' agg(carsDF, mean(carsDF$mpg)) | ||
| #' } | ||
| #' @note cube since 2.3.0 | ||
| #' @seealso \link{agg}, \link{groupBy}, \link{rollup} | ||
| setMethod("cube", | ||
| signature(x = "SparkDataFrame"), | ||
| function(x, ...) { | ||
| cols <- list(...) | ||
|
Member
check length of cols is > 0?
Member
Author
I think we can skip that.
Member
hmm, it's a bit odd to call rollup or cube that way, but ok if other languages leave that open too. But I'd say we should add a line to explain "rollup or cube without column is the same as group_by" (or something better)
Member
if you want to support an empty parameter, let's add some tests for it then? |
||
| jcol <- lapply(cols, function(x) if (class(x) == "Column") x@jc else column(x)@jc) | ||
| sgd <- callJMethod(x@sdf, "cube", jcol) | ||
| groupedData(sgd) | ||
| }) | ||
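
For readers less familiar with the semantics, a cube over n columns aggregates over every subset of those columns, including the empty subset (the global aggregate mentioned in the docs above). The base R sketch below only enumerates those grouping sets. `cube_grouping_sets` is a hypothetical helper written for illustration, not part of SparkR or of this patch, and it assumes the usual SQL-style CUBE semantics.

```r
# Hypothetical helper (not part of SparkR): list the grouping sets that a
# cube over `cols` would aggregate on, i.e. every subset of the columns.
cube_grouping_sets <- function(cols) {
  n <- length(cols)
  # Each number 0 .. 2^n - 1 is read as a bitmask selecting one subset.
  lapply(seq_len(2^n) - 1, function(mask) {
    keep <- (mask %/% 2^(seq_len(n) - 1)) %% 2 == 1
    cols[keep]
  })
}

sets <- cube_grouping_sets(c("cyl", "gear", "am"))
length(sets)                      # 2^3 grouping sets
cube_grouping_sets(character(0))  # only the empty (global) grouping set
```

With no columns the only grouping set is the empty one, which is why `cube(df)` is described as equivalent to applying `agg` directly.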
|
|
||
| #' rollup | ||
| #' | ||
| #' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns. | ||
| #' | ||
| #' If grouping expression is missing `rollup` creates a single global aggregate and is equivalent to | ||
|
Member
ditto on backtick |
||
| #' direct application of \link{agg}. | ||
| #' | ||
| #' @param x a SparkDataFrame. | ||
| #' @param ... character name(s) or Column(s) to group on. | ||
| #' @return A GroupedData. | ||
| #' @family SparkDataFrame functions | ||
| #' @aliases rollup,SparkDataFrame-method | ||
| #' @rdname rollup | ||
| #' @name rollup | ||
| #' @export | ||
| #' @examples | ||
| #'\dontrun{ | ||
| #' df <- createDataFrame(mtcars) | ||
| #' mean(rollup(df, "cyl", "gear", "am"), "mpg") | ||
| #' | ||
| #' # Following calls are equivalent | ||
| #' agg(rollup(carsDF), mean(carsDF$mpg)) | ||
| #' agg(carsDF, mean(carsDF$mpg)) | ||
| #' } | ||
| #' @note rollup since 2.3.0 | ||
| #' @seealso \link{agg}, \link{cube}, \link{groupBy} | ||
| setMethod("rollup", | ||
| signature(x = "SparkDataFrame"), | ||
| function(x, ...) { | ||
| cols <- list(...) | ||
|
Member
check length of cols |
||
| jcol <- lapply(cols, function(x) if (class(x) == "Column") x@jc else column(x)@jc) | ||
| sgd <- callJMethod(x@sdf, "rollup", jcol) | ||
| groupedData(sgd) | ||
| }) | ||
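
A rollup differs from a cube in that it aggregates only over the hierarchical prefixes of the column list, ending with the grand total. The base R sketch below illustrates that under the usual SQL-style ROLLUP semantics. `rollup_grouping_sets` and `rollup_mean` are hypothetical helpers, not part of SparkR or this patch, and unlike Spark (which returns a single result with nulls in the rolled-up columns) the sketch returns one data frame per grouping set.

```r
# Hypothetical helper (not part of SparkR): grouping sets of a rollup are the
# prefixes of `cols`, from the full list down to the empty set (grand total).
rollup_grouping_sets <- function(cols) {
  lapply(rev(c(0L, seq_along(cols))), function(k) cols[seq_len(k)])
}

# Hypothetical helper: mean of `value` for each rollup grouping set,
# computed locally with stats::aggregate instead of Spark.
rollup_mean <- function(data, cols, value) {
  lapply(rollup_grouping_sets(cols), function(set) {
    if (length(set) == 0) {
      # Empty grouping set: a single global aggregate.
      setNames(data.frame(mean(data[[value]])), value)
    } else {
      aggregate(data[value], by = data[set], FUN = mean)
    }
  })
}

res <- rollup_mean(mtcars, c("cyl", "gear"), "mpg")
res[[1]]  # mean mpg per (cyl, gear)
res[[2]]  # mean mpg per cyl
res[[3]]  # global mean mpg
```

The last element corresponds to the no-column case discussed in the review thread: with an empty grouping set the result is the same as aggregating the whole data frame.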
extra space?
... character(two spaces)