
[SPARK-20258][Doc][SparkR] Fix SparkR logistic regression example in programming guide (did not converge) #17571

Closed
actuaryzhang wants to merge 2 commits into apache:master from actuaryzhang:programGuide2


@actuaryzhang (Contributor)

What changes were proposed in this pull request?

The SparkR logistic regression example in the programming guide does not converge with IRWLS. Apart from the intercept, all coefficient estimates are essentially zero:

training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
df_list2 <- randomSplit(training2, c(7,3), 2)
binomialDF <- df_list2[[1]]
binomialTestDF <- df_list2[[2]]
binomialGLM <- spark.glm(binomialDF, label ~ features, family = "binomial")

17/04/07 11:42:03 WARN WeightedLeastSquares: Cholesky solver failed due to singular covariance matrix. Retrying with Quasi-Newton solver.

> summary(binomialGLM)

Coefficients:
                 Estimate
(Intercept)    9.0255e+00
features_0     0.0000e+00
features_1     0.0000e+00
features_2     0.0000e+00
features_3     0.0000e+00
features_4     0.0000e+00
features_5     0.0000e+00
features_6     0.0000e+00
features_7     0.0000e+00
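For context, here is a minimal base-R sketch (not Spark's `WeightedLeastSquares` code) of why a Cholesky-based normal-equations solver reports a singular matrix: if any feature column is all zeros, or is a linear combination of other columns, then X^T W X is rank-deficient and cannot be Cholesky-factored. The toy matrix `X` and the unit weights `w` are hypothetical, and the assumption that degenerate (e.g. all-zero) feature columns are what makes this particular dataset singular has not been verified here.

```r
# Hypothetical toy design matrix (NOT the libsvm data from the example):
# an intercept column, one informative feature, and one feature that is zero
# in every row, like an unused feature in a sparse libsvm file.
X <- cbind(intercept = 1, x1 = c(0.5, 1.2, -0.3, 2.0), x2 = 0)

# Hypothetical IRLS working weights; unit weights keep the example simple.
w <- rep(1, nrow(X))

# Weighted normal-equations matrix X^T W X (w * X scales each row by its weight).
XtWX <- t(X) %*% (w * X)

# The all-zero column makes the 3 x 3 matrix rank-deficient.
rank_XtWX <- qr(XtWX)$rank
print(rank_XtWX)

# Cholesky factorization requires a positive-definite matrix, so chol() errors here;
# catch the error and keep its message instead of stopping.
chol_result <- tryCatch(chol(XtWX), error = function(e) conditionMessage(e))
print(chol_result)
```

Dropping `x2` from `X` leaves a positive-definite 2 x 2 matrix and `chol()` succeeds, which is one reason an example dataset without degenerate features behaves better.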

@felixcheung (Member) left a comment


LGTM

summary(gaussianGLM2)

# Fit a generalized linear model of family "binomial" with spark.glm
training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
Member


Is this an issue with the binary_classification_data dataset?

Contributor Author


Just a bad example.

@actuaryzhang (Contributor Author)

@felixcheung
I just noticed that the current logistic regression example in the programming guide is not a good one. It did not converge with IRWLS, and the Quasi-Newton fallback yielded almost-zero estimates for all coefficients.

Comment thread on examples/src/main/r/ml/glm.R (outdated)
# Fit a generalized linear model of family "binomial" with spark.glm
training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
df_list2 <- randomSplit(training2, c(7,3), 2)
training2 <- read.df("/data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")
Member


Actually, you might need to leave it as a relative path, i.e. not starting with /.

Contributor Author


Thanks! Copy-paste error. Corrected now.

@SparkQA

SparkQA commented Apr 7, 2017

Test build #75613 has finished for PR 17571 at commit 95b5383.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 7, 2017

Test build #75615 has finished for PR 17571 at commit f7e71ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member)

Right, the tests don't run the examples anyway...

@felixcheung (Member)

Merged to master.

asfgit closed this in 1ad73f0 on Apr 7, 2017
actuaryzhang deleted the programGuide2 branch on April 7, 2017 at 19:49

3 participants