-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add calculate_cng_indices #97
base: main
Are you sure you want to change the base?
Add calculate_cng_indices #97
Conversation
factor_analyzer/factor_analyzer.py
Outdated
if model == "factors": | ||
data -= np.linalg.pinv(np.diag(np.diag(np.linalg.pinv(data)))) | ||
# TODO: Should this line be here? | ||
data = covariance_to_correlation(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line comes from the eigenComputes
function. When the data passed in are correlations and cor == TRUE
, it calls eigen(cov2cor(corFa(x)))
. However, it doesn't do this if the variables/observations themselves are passed, in which case it calls eigen(corFA(cov(x)))
. This produces very different results! I think this is only needed if covariances are passed in rather than correlations, but since this implementation calculates the correlations I think this line should be removed. Agreed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, are we assuming the input value here will always be eigenForm == "data"
? If so, I think you're right that this should be removed, but I may be misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you're right. I was getting confused by the case where dataType == correlation && cor == FALSE
, but I think in that case the data passed in should actually be a covariance matrix, not a correlation matrix.
62823a7
to
9143d0e
Compare
|
||
values = np.sort(np.linalg.eigvals(data))[::-1] | ||
|
||
num_variables = len(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't len(data)
give you the number of observations, rather than the number of variables, assuming data
is a numpy
array or pandas
data frame?
Unfortunately, this doesn't translate exactly from R:
> df <- data.frame(A = c(1, 2, 3), B = c(1, 2, 3))
> length(df)
[1] 2
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
>>> len(df)
3
>>> len(df.values)
3
Also, should we move the line below to the beginning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this line of the code data
refers to the correlation matrix, which is square. data.shape[0] == data.shape[1]
The eigenvalues and CNG indices of the dataset | ||
""" | ||
data = corr(data) | ||
if model == "factors": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're just going to have "factors", do we need the model
argument? Does it make sense to exclude this for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if model == "components"
, the adjustment underneath is not needed. In other words, only taking the correlation of the data is sufficient. We might want to add elif model != "components": raise ValueError
factor_analyzer/factor_analyzer.py
Outdated
if model == "factors": | ||
data -= np.linalg.pinv(np.diag(np.diag(np.linalg.pinv(data)))) | ||
# TODO: Should this line be here? | ||
data = covariance_to_correlation(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, are we assuming the input value here will always be eigenForm == "data"
? If so, I think you're right that this should be removed, but I may be misunderstanding.
Thanks so much for submitting this PR! I had a few comments/questions. Let me know if any of them don't make sense, or if you just want me to submit a PR to merge into this branch. |
I implemented this function while working on a paper and decided it belongs in this excellent project. I will add tests (I have been comparing to output from R), but there is a major potential issue in the code which I will mark out with inline comments.