Contributor

@tengpeng tengpeng commented Apr 23, 2018

What changes were proposed in this pull request?

Spark users have reported that the deviance calculation for Poisson regression does not handle y = 0, so the correct model summary cannot be obtained. The user has confirmed that the issue is in

override def deviance(y: Double, mu: Double, weight: Double): Double =
{ 2.0 * weight * (y * math.log(y / mu) - (y - mu)) }
when y = 0.
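
The failure is a floating-point one: at y = 0, `math.log(y / mu)` evaluates to `-Infinity`, and `0.0 * -Infinity` is `NaN`, so a single zero label is enough to turn the summed deviance into `NaN`. A minimal sketch of the problem and of one possible guard, which uses the limit y * log(y / mu) -> 0 as y -> 0. The object and method names are illustrative only and are not Spark's:

```scala
object PoissonDevianceSketch {
  // Unit deviance as quoted above: yields NaN when y == 0.
  def naiveDeviance(y: Double, mu: Double, weight: Double): Double =
    2.0 * weight * (y * math.log(y / mu) - (y - mu))

  // Guarded variant: treat y * log(y / mu) as 0 when y == 0.
  def guardedDeviance(y: Double, mu: Double, weight: Double): Double = {
    val ylogy = if (y == 0) 0.0 else y * math.log(y / mu)
    2.0 * weight * (ylogy - (y - mu))
  }

  def main(args: Array[String]): Unit = {
    println(naiveDeviance(0.0, 1.5, 1.0))   // NaN
    println(guardedDeviance(0.0, 1.5, 1.0)) // 3.0
  }
}
```

For y > 0 the two functions agree, so the guard only changes behaviour at the problematic point.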

The user also mentioned that there are many other places where they believe we should check for the same thing. However, no other changes are needed, including for the Gamma distribution.

How was this patch tested?

Added a comparison with R's deviance calculation to the existing unit test.

@dbtsai
Member

dbtsai commented Apr 23, 2018

ok to test

Member

@dbtsai dbtsai left a comment

Only a couple of small comments, and we're ready to merge once they're resolved.

Thanks.

DB Tsai | Siri Open Source Technologies | Apple, Inc

private def ylogy(y: Double, mu: Double): Double = {
  if (y == 0) 0.0 else y * math.log(y / mu)
}

Member

There is another ylogy implementation in Binomial. Can you move this code to object GeneralizedLinearRegression and make it private to this package?

Contributor Author

Thanks so much for the quick review. I have moved the ylogy implementation to object GeneralizedLinearRegression. One quick question: I am not sure I fully understand why this is the right place for ylogy. Thanks!

Member

Do you have any suggestions for avoiding the duplicated code? Let's follow up on this later if you have an idea.
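
The point of the move is that both the Binomial and Poisson deviances need the same guarded y * log(y / mu) term, so one shared helper with restricted visibility avoids two copies. A rough sketch of that layout, assuming a hypothetical enclosing object `GlmSketch` in place of the real `GeneralizedLinearRegression` companion object, and `private[GlmSketch]` standing in for a package-private qualifier:

```scala
object GlmSketch {
  // Shared helper, visible only inside GlmSketch
  // (a stand-in for a package-private member).
  private[GlmSketch] def ylogy(y: Double, mu: Double): Double =
    if (y == 0) 0.0 else y * math.log(y / mu)

  object Binomial {
    // Both terms go through the guard, so y = 0 and y = 1 are safe.
    def deviance(y: Double, mu: Double, weight: Double): Double =
      2.0 * weight * (ylogy(y, mu) + ylogy(1.0 - y, 1.0 - mu))
  }

  object Poisson {
    def deviance(y: Double, mu: Double, weight: Double): Double =
      2.0 * weight * (ylogy(y, mu) - (y - mu))
  }
}
```

Callers outside `GlmSketch` can use the two `deviance` methods but cannot reach `ylogy` directly, which keeps the helper out of the public API.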

print(as.vector(coef(model)))
}
[1] -0.0457441 -0.6833928
[1] 1.8121235 -0.1747493 -0.5815417
Member

Can you update the R script that generates the deviance?

Contributor Author

Updated. The updated script is sufficient to calculate deviance on its own.

Vectors.dense(1.8121235, -0.1747493, -0.5815417))
Vectors.dense(0.0, -0.0457441, -0.6833928, 3.8093),
Vectors.dense(1.8121235, -0.1747493, -0.5815417, 3.7006))

Member

Adding them to expected is not consistent with the rest of the test code.

How about

val residualDeviancesR = Array(3.8093, 3.7006)

Contributor Author

Modified. Thanks!

val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1),
  model.summary.deviance)
assert(actual ~= expected(idx) absTol 1e-4, "Model mismatch: GLM with poisson family, " +
  s"$link link and fitIntercept = $fitIntercept (with zero values).")
Member

assert(model.summary.deviance ~== residualDeviancesR(idx) absTol 1E-3)
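
The idea of checking the deviance on its own, against a reference value and within an absolute tolerance, can be illustrated without Spark. The sketch below is an assumption-laden toy: the data, the helper `withinAbsTol`, and the object name are hypothetical, and the reference value is the closed form for this data rather than output from R. For labels (0, 1, 2) and an intercept-only Poisson fit, every fitted mean equals the sample mean 1, and the residual deviance works out to 4 * ln 2.

```scala
object DevianceCheckSketch {
  def ylogy(y: Double, mu: Double): Double =
    if (y == 0) 0.0 else y * math.log(y / mu)

  // Residual deviance: sum of unit deviances (all weights taken as 1).
  def poissonDeviance(ys: Seq[Double], mus: Seq[Double]): Double =
    ys.zip(mus).map { case (y, mu) => 2.0 * (ylogy(y, mu) - (y - mu)) }.sum

  // Hypothetical stand-in for an absolute-tolerance comparison.
  def withinAbsTol(actual: Double, expected: Double, tol: Double): Boolean =
    math.abs(actual - expected) <= tol

  def main(args: Array[String]): Unit = {
    val ys = Seq(0.0, 1.0, 2.0)         // toy labels, including a zero
    val mus = Seq.fill(ys.length)(1.0)  // intercept-only fit: mu = mean(ys)
    val expected = 4.0 * math.log(2.0)  // closed form for this toy data
    assert(withinAbsTol(poissonDeviance(ys, mus), expected, 1e-3))
  }
}
```

Keeping the deviance assertion separate from the coefficient assertion means a failure message points at the quantity that actually regressed.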

@SparkQA

SparkQA commented Apr 23, 2018

Test build #89699 has finished for PR 21125 at commit 3c6a4da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tengpeng tengpeng changed the title from "[Spark-24024] Fix poisson deviance calculations in GLM to handle y = 0" to "[Spark-24024][ML] Fix poisson deviance calculations in GLM to handle y = 0" on Apr 23, 2018
@SparkQA

SparkQA commented Apr 23, 2018

Test build #89723 has finished for PR 21125 at commit da53b1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 293a0f2 Apr 23, 2018
@dbtsai
Member

dbtsai commented Apr 23, 2018

LGTM, merged into master. Thanks.

DB Tsai | Siri Open Source Technologies | Apple, Inc
