Implement random method for LKJCorr #2443
junpenglao merged 13 commits into pymc-devs:master from junpenglao:lkj_random
Conversation
Using the algorithm in LKJ 2009 (vine method based on a C-vine).
pymc3/distributions/multivariate.py
Outdated
| P = np.zeros((n, n))  # partial correlations |
| r_triu = [] |
| for k in range(n-1): |
Is it possible to vectorize it?
I cannot see an easy way to do it; the step below, converting the partial correlations to raw correlations, is especially tricky.
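For reference, the tricky step inverts the standard partial-correlation identity: given the correlations of variables i and k with a conditioning variable l, the partial correlation p_ik.l = (r_ik - r_li * r_lk) / sqrt((1 - r_li**2) * (1 - r_lk**2)), and solving for r_ik gives the update used in the loop. A minimal numpy sketch (the numeric values are made up) showing the round trip for a single element:

```python
import numpy as np

# made-up raw correlations for one conditioning variable l
r_li, r_lk, r_ik = 0.5, 0.2, 0.4

# partial correlation of i, k given l
p = (r_ik - r_li * r_lk) / np.sqrt((1 - r_li**2) * (1 - r_lk**2))

# inverting recovers the raw correlation -- this is the loop body's update
r_back = p * np.sqrt((1 - r_li**2) * (1 - r_lk**2)) + r_li * r_lk
# r_back equals r_ik up to floating point
```

Because each element's conversion chains over all earlier conditioning variables, the recursion is inherently sequential in l, which is why a simple vectorization is hard.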
|
Any suggestions for implementing a test? I am stuck. @ferrine @aseyboldt |
|
What about comparing the KDE with the theoretical density? |
pymc3/distributions/multivariate.py
Outdated
| for k in range(n-1): |
| beta -= 1/2 |
| for i in range(k+1, n): |
| P[k, i] = stats.beta.rvs(a=eta, b=eta)  # sampling from beta |
You'd better create these samples beforehand
|
Or you could run a KS-test between true and sampled density. |
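A sketch of the suggested KS-test, using the known LKJ marginal: under LKJ(eta), each off-diagonal element of an n x n correlation matrix marginally follows a Beta(a, a) distribution stretched to [-1, 1], with a = eta + (n - 2) / 2 (LKJ 2009). The helper name `ks_check` and the sample sizes are illustrative, not part of the PR:

```python
import numpy as np
from scipy import stats

def ks_check(samples, n, eta):
    """KS-test off-diagonal correlation draws (in [-1, 1]) against
    the theoretical LKJ marginal, which is Beta(a, a) on [0, 1]
    after mapping r -> (r + 1) / 2, with a = eta + (n - 2) / 2."""
    a = eta + (n - 2) / 2
    u = (np.asarray(samples) + 1) / 2  # map back to [0, 1]
    return stats.kstest(u, stats.beta(a, a).cdf)

# sanity check: draws from the true marginal should pass
rng = np.random.default_rng(0)
n, eta = 4, 2.0
a = eta + (n - 2) / 2
true_draws = 2 * rng.beta(a, a, size=2000) - 1
stat, pval = ks_check(true_draws, n, eta)
```

Draws from a correct `random` implementation should give a large p-value here, while a wrong marginal should fail decisively for a few thousand samples.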
|
Thanks everyone! I think I finally figured it out ;-) |
pymc3/distributions/multivariate.py
Outdated
| for k, i in zip(triu_ind[0], triu_ind[1]): |
| p = P[k, i] |
| for l in range(k-1, -1, -1):  # convert partial correlation to raw correlation |
| p = p * np.sqrt((1-P[l, i]**2)*(1-P[l, k]**2)) + P[l, i]*P[l, k] |
pep8 wants white spaces around math operators
pymc3/distributions/multivariate.py
Outdated
| samples = generate_samples(stats.beta.rvs, eta, eta, |
|                            dist_shape=self.shape, |
|                            size=size) |
| samples = (samples-0.5)*2 |
pymc3/distributions/multivariate.py
Outdated
| for k, i in zip(triu_ind[0], triu_ind[1]): |
| p = P[k, i] |
| for l in range(k-1, -1, -1):  # convert partial correlation to raw correlation |
| p = p * np.sqrt((1 - P[l, i]**2) * |
What I am doing here is slow in a for loop; should I change it to a reduce, @ColCarroll?
For stuff like this we could also consider to use numba if it is available. (check if it is available and create a no-op replacement if it is not).
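A sketch of the optional-dependency pattern suggested here: import `numba.jit` when numba is installed, and otherwise substitute a no-op decorator so the same source still runs (just more slowly) on plain Python. The function `triangular_sum` is only a placeholder to show the decorator in use:

```python
try:
    from numba import jit
except ImportError:
    def jit(*args, **kwargs):
        # no-op replacement: supports both @jit and @jit(nopython=True)
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        def wrap(func):
            return func
        return wrap

@jit(nopython=True)
def triangular_sum(n):  # placeholder loop-heavy function
    total = 0
    for i in range(n):
        total += i
    return total
```

With this pattern the loop-heavy partial-to-raw conversion could be compiled when numba is available without adding a hard dependency.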
| self.tri_index[np.triu_indices(n, k=1)] = np.arange(shape) |
| self.tri_index[np.triu_indices(n, k=1)[::-1]] = np.arange(shape) |
| def _random(self, n, eta, size=None): |
seems like size argument is ignored
Yep. I ignored the size because the _random method here can only generate one slice of the random matrix.
Thanks for the suggestion - I managed to use the size argument and vectorized the _random method.
|
OK, the implementation should be correct, as the distributions of the matrix elements match:

```python
import numpy as np
from scipy import stats

# cherry-picked LKJ random function
def lkj_random(n, eta, size=None):
    beta0 = eta - 1 + n / 2
    shape = n * (n - 1) // 2
    triu_ind = np.triu_indices(n, 1)
    beta = np.array([beta0 - k / 2 for k in triu_ind[0]])
    # partial correlations sampled from the beta distribution
    P = np.ones((n, n) + (size,))
    P[triu_ind] = stats.beta.rvs(a=beta, b=beta, size=(size,) + (shape,)).T
    # scale partial correlation matrix to [-1, 1]
    P = (P - .5) * 2
    for k, i in zip(triu_ind[0], triu_ind[1]):
        p = P[k, i]
        for l in range(k - 1, -1, -1):  # convert partial correlation to raw correlation
            p = p * np.sqrt((1 - P[l, i]**2) *
                            (1 - P[l, k]**2)) + P[l, i] * P[l, k]
        P[k, i] = p
        P[i, k] = p
    return np.transpose(P, (2, 0, 1))

def is_pos_def(A):
    if np.array_equal(A, A.T):
        try:
            np.linalg.cholesky(A)
            return 1
        except np.linalg.LinAlgError:
            return 0
    else:
        return 0

P = lkj_random(4, 1., 1000)
k = 0
for i, p in enumerate(P):
    k += is_pos_def(p)
print(k)
```

Thoughts? @aseyboldt |
|
@junpenglao I haven't looked at it in detail, but should there really be that many 1. in the partial correlations? |
|
@aseyboldt at the end only the diagonal is 1. Currently LKJ random returns the upper triangular elements (same as the distribution) but I was just doing some test in the code above. |
|
Sorry, you are right. |
|
@aseyboldt not sure I understand what you mean.
[EDIT], trying to figure out whether I have the same output as Julia or Stan. |
|
hm. But shouldn't |
|
I'm using this to compute the partial correlations: (And just added a print statement in the function to get the original values) |
|
I will also compare with the implementation in R from @rmcelreath https://github.com/rmcelreath/rethinking/blob/master/R/distributions.r#L165-L184; I cannot really figure it out in Stan and Julia. |
|
Sorry, my last comment was wrong, I looked at the wrong array. I just deleted it. |
generated Corr matrix is now positive definite.
|
Turns out the R implementation from @rmcelreath is the most stable - tests pass locally and samples are mostly positive definite (it failed sometimes when n is large and eta << 1, but much better than the previous implementation nonetheless). |
|
@aseyboldt The current random method could potentially be extended for generating LKJCholeskyCov, as it produces a triangular matrix as an intermediate step. |
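For context on that extension, a minimal sketch of how a correlation draw could feed a covariance Cholesky factor, as LKJCholeskyCov combines correlations with standard deviations. The helper `corr_to_cov_chol` is hypothetical, not PyMC3 API:

```python
import numpy as np

def corr_to_cov_chol(C, sd):
    """Hypothetical helper: Cholesky factor of the covariance
    diag(sd) @ C @ diag(sd), built from a correlation matrix C
    and a vector of standard deviations sd."""
    return np.diag(sd) @ np.linalg.cholesky(C)

# small worked example
C = np.array([[1.0, 0.3],
              [0.3, 1.0]])
sd = np.array([2.0, 0.5])
L = corr_to_cov_chol(C, sd)
cov = L @ L.T  # recovers diag(sd) @ C @ diag(sd)
```

Since L @ L.T = diag(sd) @ chol(C) @ chol(C).T @ diag(sd), the factor is exact whenever C itself is positive definite.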