Skip to content

[SPARK-20437][R] R wrappers for rollup and cube#17728

Closed
zero323 wants to merge 15 commits intoapache:masterfrom
zero323:SPARK-20437
Closed

[SPARK-20437][R] R wrappers for rollup and cube#17728
zero323 wants to merge 15 commits intoapache:masterfrom
zero323:SPARK-20437

Conversation

@zero323
Copy link
Copy Markdown
Member

@zero323 zero323 commented Apr 22, 2017

What changes were proposed in this pull request?

  • Add rollup and cube methods and corresponding generics.
  • Add short description to the vignette.

How was this patch tested?

  • Existing unit tests.
  • Additional unit tests covering new features.
  • check-cran.sh.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 22, 2017

Test build #76062 has finished for PR 17728 at commit c1459d0.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 22, 2017

Test build #76063 has finished for PR 17728 at commit e763aa6.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Copy Markdown
Member Author

zero323 commented Apr 22, 2017

Jenkins retest this please.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 22, 2017

Test build #76064 has finished for PR 17728 at commit ae7512a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 22, 2017

Test build #76065 has finished for PR 17728 at commit ae7512a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 22, 2017

Test build #76066 has finished for PR 17728 at commit 3104eb1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Copy Markdown
Member Author

zero323 commented Apr 22, 2017

Jenkins retest this please.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 22, 2017

Test build #76067 has finished for PR 17728 at commit 3104eb1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 22, 2017

Test build #76070 has finished for PR 17728 at commit 132099c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Copy Markdown
Member Author

zero323 commented Apr 22, 2017

cc @felixcheung

Copy link
Copy Markdown
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

Comment thread R/pkg/R/DataFrame.R Outdated
jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
sgd <- callJMethod(x@sdf, "rollup", jcol)
groupedData(sgd)
}) No newline at end of file
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add extra newline at end of file

Comment thread R/pkg/R/DataFrame.R Outdated
#' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
#'
#' @param x a SparkDataFrame.
#' @param ... variable(s) (character names(s) or Column(s)) to group on.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

names(s) -> name(s)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps variable(s) is misleading and just character name(s) or Column(s) to group on. is sufficient?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Comment thread R/pkg/R/DataFrame.R Outdated
#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
#'
#' @param x a SparkDataFrame.
#' @param ... variable(s) (character names(s) or Column(s)) to group on.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto below

Comment thread R/pkg/R/DataFrame.R
setMethod("cube",
signature(x = "SparkDataFrame"),
function(x, ...) {
cols <- list(...)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check length of cols is > 0?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If think we can skip that. rollup(df) and cube(df) are valid function calls equivalent to group_by(df) and arguably can be useful in some cases (like aggregations based on user input).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, it's a bit odd to call rollup or cube that way but ok if other languages leave that open too. but I'd say we should add a line to explain "rollup or cube without column is the same as group_by" (or something better)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to support empty parameter let's add some tests for it then?

Comment thread R/pkg/R/DataFrame.R
setMethod("rollup",
signature(x = "SparkDataFrame"),
function(x, ...) {
cols <- list(...)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check length of cols

Comment thread R/pkg/R/DataFrame.R Outdated
signature(x = "SparkDataFrame"),
function(x, ...) {
cols <- list(...)
jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd flip this since Column is a stronger type, and also this way there is a nicer error message
instead of if (is.character(x)) column(x)@jc else x@jc
do if (class(x) == "Column") x@jc else column(x)@jc

Comment thread R/pkg/R/DataFrame.R Outdated
signature(x = "SparkDataFrame"),
function(x, ...) {
cols <- list(...)
jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto if (class(x) == "Column") x@jc else column(x)@jc

Comment thread R/pkg/R/generics.R Outdated
#' @rdname rollup
#' @export
setGeneric("rollup",
function(x, ...) { standardGeneric("rollup") })
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you keep this in one line please

Comment thread R/pkg/vignettes/sparkr-vignettes.Rmd Outdated
head(numCyl)
```

`groupBy` can be replaced with `cube` or `rollup` to compute subtotals across multiple dimensions.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I wouldn't say replace because they are not functionally the same?
how about use cube or rollup to compute subtotals across multiple dimensions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think the programming guide can use updates too?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep forgetting there is one. I think we can add a few lines. This is actually a pretty neat feature.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 24, 2017

Test build #76090 has finished for PR 17728 at commit a320327.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Copy Markdown
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread R/pkg/R/DataFrame.R
setMethod("cube",
signature(x = "SparkDataFrame"),
function(x, ...) {
cols <- list(...)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to support empty parameter let's add some tests for it then?

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 24, 2017

Test build #76117 has finished for PR 17728 at commit d73b7e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 24, 2017

Test build #76119 has finished for PR 17728 at commit 7190fcd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Copy Markdown
Member

looks good,
if nit picking maybe we have just a bit "too much information" in the vignettes? perhaps cube without column should be in the "details" portion of the API doc here - I don't think we want to have too much details in either vignettes or programming guide to avoid confusion.

what do you think?

@zero323
Copy link
Copy Markdown
Member Author

zero323 commented Apr 25, 2017

I actually. I wasn't sure about this in the first place. Let me spread it between details and examples.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 25, 2017

Test build #76135 has finished for PR 17728 at commit ee73dd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment thread R/pkg/R/DataFrame.R Outdated
#'
#' @param x a SparkDataFrame.
#' @param ... variable(s) (character names(s) or Column(s)) to group on.
#' @param ... character name(s) or Column(s) to group on.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space? ... character (two spaces)

Comment thread R/pkg/R/DataFrame.R
df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
dataFrame(df)
})

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra newline

Comment thread R/pkg/R/DataFrame.R Outdated
#'
#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
#'
#' If grouping expression is missing `cube` creates a single global aggregate and is equivalent to
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the backtick doesn't work with roxygen2 - if you want, use \code{cube} instead

Comment thread R/pkg/R/DataFrame.R Outdated
#'
#' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
#'
#' If grouping expression is missing `rollup` creates a single global aggregate and is equivalent to
Copy link
Copy Markdown
Member

@felixcheung felixcheung Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto on backtick rollup

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 25, 2017

Test build #76147 has finished for PR 17728 at commit c3ebeba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 26, 2017

Test build #76155 has finished for PR 17728 at commit 0da03b2.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Copy Markdown
Member Author

zero323 commented Apr 26, 2017

Jenkins retest this please.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Apr 26, 2017

Test build #76158 has finished for PR 17728 at commit 0da03b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Copy Markdown
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@felixcheung
Copy link
Copy Markdown
Member

merged to master

@asfgit asfgit closed this in df58a95 Apr 26, 2017
@zero323 zero323 deleted the SPARK-20437 branch April 26, 2017 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants