Skip to content

Conversation

@srowen
Copy link
Member

@srowen srowen commented Oct 27, 2015

Compute sigma pseudo-inverse without square root to avoid precision problems

CC @mengxr @jkbradley

@SparkQA
Copy link

SparkQA commented Oct 27, 2015

Test build #1956 has finished for PR 9293 at commit b375ce5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the test data, it's obviously constructed so that the first 2 points cluster together and the other 3 cluster together. I verified this is what Mclust gives in R as well.

@SparkQA
Copy link

SparkQA commented Oct 27, 2015

Test build #44436 has finished for PR 9293 at commit fe431ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the old formula missing the following:

val v = pinvS * u
(v.t * v, ...)

I think using the root inverse should be cheaper and more accurate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's equivalent numerically, and would require the other changes above. I'm not clear why it's better to do it this way though? it takes longer to take the square root of the eigenvalues, and then they're just multiplied back together. It's the same number of operations here and above otherwise.

I think the evidence that it's not accurate enough is the case in the JIRA and tests here, and also the Pyspark test that is wrong at the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an extra matrix-matrix multiplication in u * pinvS * u.t. I think the bug is in line 133, where we should use pinvS * u.t instead of pinvS * u. Could you check this solution? Some comments need updates too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding v.t * v also puts in an extra matrix-matrix multiply. But yes I see your point that u.t alone was the likely original bug. If that fixes it, it's a simpler change and yes that does cost one less matrix multiply. Have a look at #9309

asfgit pushed a commit that referenced this pull request Oct 28, 2015
…atrix returns incorrect answer in some cases

Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python mixture model test.

Supersedes #9293

Author: Sean Owen <[email protected]>

Closes #9309 from srowen/SPARK-11302.2.

(cherry picked from commit 826e1e3)
Signed-off-by: Xiangrui Meng <[email protected]>
asfgit pushed a commit that referenced this pull request Oct 28, 2015
…atrix returns incorrect answer in some cases

Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python mixture model test.

Supersedes #9293

Author: Sean Owen <[email protected]>

Closes #9309 from srowen/SPARK-11302.2.

(cherry picked from commit 826e1e3)
Signed-off-by: Xiangrui Meng <[email protected]>
asfgit pushed a commit that referenced this pull request Oct 28, 2015
…atrix returns incorrect answer in some cases

Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python mixture model test.

Supersedes #9293

Author: Sean Owen <[email protected]>

Closes #9309 from srowen/SPARK-11302.2.
asfgit pushed a commit that referenced this pull request Oct 28, 2015
…atrix returns incorrect answer in some cases

Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python mixture model test.

Supersedes #9293

Author: Sean Owen <[email protected]>

Closes #9309 from srowen/SPARK-11302.2.

(cherry picked from commit 826e1e3)
Signed-off-by: Xiangrui Meng <[email protected]>
@srowen srowen closed this Oct 28, 2015
@srowen srowen deleted the SPARK-11302 branch October 29, 2015 14:10
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 3, 2015
…atrix returns incorrect answer in some cases

Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python mixture model test.

Supersedes apache#9293

Author: Sean Owen <[email protected]>

Closes apache#9309 from srowen/SPARK-11302.2.

(cherry picked from commit 826e1e3)
Signed-off-by: Xiangrui Meng <[email protected]>
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 3, 2015
…atrix returns incorrect answer in some cases

Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python mixture model test.

Supersedes apache#9293

Author: Sean Owen <[email protected]>

Closes apache#9309 from srowen/SPARK-11302.2.

(cherry picked from commit 826e1e3)
Signed-off-by: Xiangrui Meng <[email protected]>
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
…atrix returns incorrect answer in some cases

Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python mixture model test.

Supersedes apache/spark#9293

Author: Sean Owen <[email protected]>

Closes #9309 from srowen/SPARK-11302.2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants