[Spark-24024][ML] Fix poisson deviance calculations in GLM to handle y = 0 #21125

tengpeng · 2018-04-23T02:32:25Z

What changes were proposed in this pull request?

Spark users have reported that the deviance calculation for Poisson regression does not handle y = 0, so the correct model summary cannot be obtained. The user has confirmed that the issue is in

override def deviance(y: Double, mu: Double, weight: Double): Double =
{ 2.0 * weight * (y * math.log(y / mu) - (y - mu)) }
when y = 0.

The user also mentioned many other places where he believes the same check should be made. However, no other changes are needed, including for the Gamma distribution.
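To make the failure mode concrete, here is a minimal standalone sketch, not the actual Spark source. The object name `PoissonDevianceSketch` and the method `naiveDeviance` are hypothetical names introduced for illustration; `ylogy` mirrors the helper shown in the review diff. It assumes the usual convention that y * log(y / mu) is taken to be 0 at y = 0.

```scala
object PoissonDevianceSketch {
  // y * log(y / mu), taken to be 0 when y == 0 (same shape as the helper in the diff).
  def ylogy(y: Double, mu: Double): Double =
    if (y == 0) 0.0 else y * math.log(y / mu)

  // Hypothetical name: the formula quoted in the description.
  // At y = 0 this evaluates 0.0 * log(0.0) = 0.0 * -Infinity, which is NaN.
  def naiveDeviance(y: Double, mu: Double, weight: Double): Double =
    2.0 * weight * (y * math.log(y / mu) - (y - mu))

  // Same formula with the guarded helper, so y = 0 gives 2 * weight * mu.
  def deviance(y: Double, mu: Double, weight: Double): Double =
    2.0 * weight * (ylogy(y, mu) - (y - mu))
}
```

With y = 0, mu = 2 and weight = 1, `naiveDeviance` returns NaN while `deviance` returns a finite value, which is why a single zero response was enough to break the summary.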

How was this patch tested?

Added a comparison with R's deviance calculation to the existing unit test.

dbtsai · 2018-04-23T05:33:23Z

ok to test

dbtsai

Only a couple of small comments, and we're ready to merge once they're resolved.

Thanks.

DB Tsai | Siri Open Source Technologies |  Apple, Inc

dbtsai · 2018-04-23T05:53:12Z

mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala

+    private def ylogy(y: Double, mu: Double): Double = {
+      if (y == 0) 0.0 else y * math.log(y / mu)
+    }
+


There is another ylogy implementation in Binomial. Can you move this code to object GeneralizedLinearRegression and make it private to this package?

Thanks so much for the quick review. I have moved the ylogy implementation to object GeneralizedLinearRegression. One quick question: I am not sure I fully understand why this is the right place for ylogy. Thanks!

Any suggestions for avoiding the duplicated code? Let's follow up on this later if you have an idea.

dbtsai · 2018-04-23T05:56:11Z

mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala

         print(as.vector(coef(model)))
       }
       [1] -0.0457441 -0.6833928
       [1] 1.8121235  -0.1747493  -0.5815417


Can you update the R script that generates the deviance?

Updated. The updated script is sufficient to calculate deviance on its own.

dbtsai · 2018-04-23T06:01:42Z

mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala

-      Vectors.dense(1.8121235, -0.1747493, -0.5815417))
+      Vectors.dense(0.0, -0.0457441, -0.6833928, 3.8093),
+      Vectors.dense(1.8121235, -0.1747493, -0.5815417, 3.7006))



Adding them to expected is not consistent with the rest of the test code.

How about

val residualDeviancesR = Array(3.8093, 3.7006)

Modified. Thanks!

dbtsai · 2018-04-23T06:01:59Z

mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala

+      val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1),
+        model.summary.deviance)
      assert(actual ~= expected(idx) absTol 1e-4, "Model mismatch: GLM with poisson family, " +
        s"$link link and fitIntercept = $fitIntercept (with zero values).")


assert(model.summary.deviance ~== residualDeviancesR(idx) absTol 1E-3)
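For readers without Spark's test utilities at hand, the absolute-tolerance comparison suggested above can be approximated as follows. This is only a sketch: `AbsTolSketch` and `approxEqual` are hypothetical names, and the strict `<` comparison is an assumption about how such a check is typically written, not a statement about Spark's own implementation.

```scala
object AbsTolSketch {
  // Hypothetical helper: true when actual is within absTol of expected.
  def approxEqual(actual: Double, expected: Double, absTol: Double): Boolean =
    math.abs(actual - expected) < absTol

  // Reference deviances taken from the review suggestion above.
  val residualDeviancesR: Array[Double] = Array(3.8093, 3.7006)
}
```

Keeping the deviances in their own array, rather than appending them to the coefficient vectors, lets the coefficient and deviance checks use different tolerances.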

SparkQA · 2018-04-23T06:45:24Z

Test build #89699 has finished for PR 21125 at commit 3c6a4da.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-04-23T14:10:02Z

Test build #89723 has finished for PR 21125 at commit da53b1a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dbtsai · 2018-04-23T17:59:28Z

LGTM, merged into master. Thanks.

DB Tsai | Siri Open Source Technologies |  Apple, Inc

fix deviance calculation when y = 0

3c6a4da

dbtsai reviewed Apr 23, 2018

Address comments

da53b1a

tengpeng changed the title ~~[Spark-24024] Fix poisson deviance calculations in GLM to handle y = 0~~ [Spark-24024][ML] Fix poisson deviance calculations in GLM to handle y = 0 Apr 23, 2018

srowen approved these changes Apr 23, 2018

asfgit closed this in 293a0f2 Apr 23, 2018


4 participants
