Closed
Changes from 3 commits
91 changes: 41 additions & 50 deletions R/pkg/R/mllib.R
@@ -53,9 +53,10 @@ setClass("AFTSurvivalRegressionModel", representation(jobj = "jobj"))
#' @note KMeansModel since 2.0.0
setClass("KMeansModel", representation(jobj = "jobj"))

#' Fits a generalized linear model
#' Generalized Linear Models
#'
#' Fits a generalized linear model against a Spark DataFrame.
#' Fits a generalized linear model against a Spark DataFrame. Users can print, make predictions on the
#' produced model and save the model to the input path.
#'
#' @param data SparkDataFrame for training.
#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
@@ -66,8 +67,9 @@ setClass("KMeansModel", representation(jobj = "jobj"))
#' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
#' @param tol Positive convergence tolerance of iterations.
#' @param maxIter Integer giving the maximal number of IRLS iterations.
#' @return a fitted generalized linear model
#' @return \code{spark.glm} returns a fitted generalized linear model
Member

Since this is the page for spark.glm, I think it's OK if it simply says "returns a fitted...".
It seems a bit odd to reference spark.glm again.

Contributor

I'd view spark.glm as a method under this group. It might be clearer to have the method name in the Value section. Also, I'm not sure how roxygen2 determines the ordering; having the names would help.

#' @rdname spark.glm
#' @name spark.glm
#' @export
#' @examples
#' \dontrun{
@@ -76,8 +78,21 @@ setClass("KMeansModel", representation(jobj = "jobj"))
#' df <- createDataFrame(iris)
#' model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
#' summary(model)
#'
#' # fitted values on training data
#' fitted <- predict(model, df)
#' head(select(fitted, "Sepal_Length", "prediction"))
#'
#' # save fitted model to input path
#' path <- "path/to/model"
#' write.ml(model, path)
#'
#' # can also read back the saved model and print
#' savedModel <- read.ml(path)
#' summary(savedModel)
#' }
#' @note spark.glm since 2.0.0
#' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25) {
if (is.character(family)) {
@@ -99,10 +114,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
return(new("GeneralizedLinearRegressionModel", jobj = jobj))
})

#' Fits a generalized linear model (R-compliant).
#'
#' Fits a generalized linear model, similarly to R's glm().
#'
#' @title Generalized Linear Models (R-compliant)
Contributor

@vectorijk Jun 21, 2016

As discussed in #13394 (diff), should we follow the convention of using the first sentence as the title in this case?

Contributor

@vectorijk Jun 21, 2016

Also the title changes below? @mengxr

Contributor

I asked @junyangq to try changing #' to # and see whether it works. We still need a comment for the function, but we don't want it to appear in the generated Rd doc.

Contributor

@junyangq Could you remove @title? We might need an empty line after this line. See attached screenshot:

[screenshot: "screen shot 2016-06-21 at 8 26 07 pm"]

# Fits a generalized linear model, similarly to R's glm().
#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
#' operators are supported, including '~', '.', ':', '+', and '-'.
#' @param data SparkDataFrame for training.
@@ -112,7 +125,7 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
#' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
Contributor

Please add a @seealso that links to spark.glm.

#' @param epsilon Positive convergence tolerance of iterations.
#' @param maxit Integer giving the maximal number of IRLS iterations.
#' @return a fitted generalized linear model
#' @return \code{glm} returns a fitted generalized linear model.
#' @rdname glm
Contributor

I think it is fine to put glm in a separate Rd file. But we shouldn't remove the example code.

#' @export
#' @examples
@@ -124,24 +137,21 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
#' summary(model)
#' }
#' @note glm since 1.5.0
#' @seealso \link{spark.glm}
setMethod("glm", signature(formula = "formula", family = "ANY", data = "SparkDataFrame"),
function(formula, family = gaussian, data, epsilon = 1e-6, maxit = 25) {
spark.glm(data, formula, family, tol = epsilon, maxIter = maxit)
})
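
The `glm` method above is a thin R-compliant wrapper: it forwards `epsilon` and `maxit` to `spark.glm` as `tol` and `maxIter`. For comparison, a minimal base-R sketch (no Spark involved) of the `stats::glm` call it mirrors is shown below, run on a local data.frame. In base R the same two settings go through `glm.control()`, whose own default `epsilon` is 1e-8; the values used here (1e-6 and 25) are copied from the SparkR signature. Base R's built-in `iris` names the column `Sepal.Length`, while the SparkR examples above refer to it as `Sepal_Length`.

```r
# Base-R counterpart of SparkR's R-compliant glm(formula, family, data, epsilon, maxit).
# It runs on a local data.frame (the built-in iris), so no Spark session is needed.
epsilon <- 1e-6  # SparkR glm() default; SparkR forwards it as spark.glm(tol = ...)
maxit <- 25      # SparkR glm() default; SparkR forwards it as spark.glm(maxIter = ...)

# Base R passes the same convergence settings through glm.control().
local_model <- stats::glm(Sepal.Length ~ Sepal.Width,
                          family = gaussian,
                          data = iris,
                          control = stats::glm.control(epsilon = epsilon, maxit = maxit))

# Coefficient table as reported by base R's summary().
print(summary(local_model)$coefficients)
```

With the gaussian family and identity link, the IRLS fit reduces to ordinary least squares, so the coefficients match `lm()` on the same formula.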

#' Get the summary of a generalized linear model
#'
#' Returns the summary of a model produced by glm() or spark.glm(), similarly to R's summary().
#'
#' @title Summary of GLM model
# Returns the summary of a model produced by glm() or spark.glm(), similarly to R's summary().
Contributor

insert an empty line

Member

@felixcheung Jun 22, 2016

I get that this is intentional, but I'd suggest adding an extra empty line between the # and #' lines. It's very easy for newcomers to make mistakes when copy/pasting lines (and not realize they won't be output to the doc).

Contributor

+1
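
The comment layout the reviewers converge on can be sketched as follows: a plain `#` comment carries the developer-facing description, an empty line separates it from the `#'` block, and only the `#'` lines are read by roxygen2 and end up in the generated Rd file. The function `toy_summary` and the `@rdname toy_model` target are hypothetical, used only to show the structure. Whether roxygen2 strictly requires the empty line is not settled in this thread; the sketch follows the suggestion to add it so the two comment styles are not confused when lines are copied.

```r
# Returns a toy summary of a fitted model. This plain `#` comment is for
# developers only: roxygen2 reads `#'` lines, so it is not copied into the Rd file.

#' @param object A fitted model, represented here as a plain list.
#' @return \code{toy_summary} returns a list with the model's coefficients.
#' @rdname toy_model
#' @export
toy_summary <- function(object) {
  # Keep only the coefficients; `toy_summary` is a hypothetical helper.
  list(coefficients = object$coefficients)
}

print(toy_summary(list(coefficients = c(intercept = 1.5, slope = 0.2))))
```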

#' @param object A fitted generalized linear model
#' @return coefficients the model's coefficients, intercept
#' @rdname summary
#' @return \code{summary} returns a summary object of the fitted model, a list of components
#' including at least the coefficients, null/residual deviance, null/residual degrees
#' of freedom, AIC and number of iterations IRLS takes.
#'
#' @rdname spark.glm
#' @export
#' @examples
#' \dontrun{
#' model <- glm(y ~ x, trainingData)
#' summary(model)
#' }
#' @note summary(GeneralizedLinearRegressionModel) since 2.0.0
setMethod("summary", signature(object = "GeneralizedLinearRegressionModel"),
function(object, ...) {
@@ -173,10 +183,10 @@ setMethod("summary", signature(object = "GeneralizedLinearRegressionModel"),
return(ans)
})
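
The `@return` text above lists the components of the summary object: coefficients, null/residual deviance, null/residual degrees of freedom, AIC, and the number of IRLS iterations. For reference, base R's `summary.glm()` exposes components with the same meaning; the sketch below pulls them out of a local, non-Spark fit. This diff does not show which element names SparkR's list uses, so treat the names below as base-R ones, not as SparkR's API.

```r
# Fit a small GLM locally with base R (no Spark involved).
fit <- stats::glm(Sepal.Length ~ Sepal.Width, family = gaussian, data = iris)
s <- summary(fit)

# Components named in the @return text, as exposed by base R's summary.glm().
components <- list(
  coefficients = s$coefficients,    # estimates, std. errors, t values, p-values
  null.deviance = s$null.deviance,  # deviance of the intercept-only model
  deviance = s$deviance,            # residual deviance
  df.null = s$df.null,              # null degrees of freedom
  df.residual = s$df.residual,      # residual degrees of freedom
  aic = s$aic,                      # Akaike information criterion
  iter = s$iter                     # number of IRLS iterations
)
str(components)
```

With 150 rows and two coefficients, the null and residual degrees of freedom come out as 149 and 148.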

#' Print the summary of GeneralizedLinearRegressionModel
#'
#' @rdname print
#' @name print.summary.GeneralizedLinearRegressionModel
#' @title Print summary of GLM model
# Print the summary of GeneralizedLinearRegressionModel
Contributor

@mengxr Jun 22, 2016

  • Prints
  • insert an empty line

#' @rdname spark.glm
#' @param x Summary object of fitted generalized linear model returned by \code{summary} function
#' @export
#' @note print.summary.GeneralizedLinearRegressionModel since 2.0.0
print.summary.GeneralizedLinearRegressionModel <- function(x, ...) {
@@ -205,22 +215,13 @@ print.summary.GeneralizedLinearRegressionModel <- function(x, ...) {
invisible(x)
}

#' Predicted values based on model
#'
#' Makes predictions from a generalized linear model produced by glm() or spark.glm(),
#' similarly to R's predict().
#'
#' @param object A fitted generalized linear model
#' @title Make predictions using the produced generalized linear model
# Makes predictions from a generalized linear model produced by glm() or spark.glm(),
# similarly to R's predict().
#' @param newData SparkDataFrame for testing
#' @return SparkDataFrame containing predicted labels in a column named "prediction"
#' @rdname predict
#' @return \code{predict} returns a SparkDataFrame containing predicted labels in a column named "prediction"
#' @rdname spark.glm
#' @export
#' @examples
#' \dontrun{
#' model <- glm(y ~ x, trainingData)
#' predicted <- predict(model, testData)
#' showDF(predicted)
#' }
#' @note predict(GeneralizedLinearRegressionModel) since 1.5.0
setMethod("predict", signature(object = "GeneralizedLinearRegressionModel"),
function(object, newData) {
@@ -471,24 +472,14 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c
invisible(callJMethod(writer, "save", path))
})

#' Save fitted MLlib model to the input path
#'
#' Save the generalized linear model to the input path.
#'
#' @param object A fitted generalized linear model
#' @title Save fitted GLM model
# Save the generalized linear model to the input path.
Contributor

  • Saves
  • insert an empty line

#' @param path The directory where the model is saved
#' @param overwrite Whether to overwrite the output path if it already exists. Default is FALSE,
#' which means an exception is thrown if the output path exists.
#'
#' @rdname write.ml
#' @name write.ml
#' @rdname spark.glm
#' @export
#' @examples
#' \dontrun{
#' model <- glm(y ~ x, trainingData)
#' path <- "path/to/model"
#' write.ml(model, path)
#' }
#' @note write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0
setMethod("write.ml", signature(object = "GeneralizedLinearRegressionModel", path = "character"),
function(object, path, overwrite = FALSE) {