Potential inconsistency between R and Python Versions #12
Comments
Hey Michael! Thanks a lot for catching this. I am traveling at the moment, but will have a careful look in a week or so and get back to you.

Vipul
Hi Michael,

You are right that there is an inconsistency between the Python and R versions. The R version behaves as desired, and it looks like the Python version needs a bugfix. We will get on that soon, and update you once the Python version is fixed. In the meantime, if you would like to use the Python version, I think perhaps you can try just using the neighbourhood mean without the AGF (i.e., M = 0) in your analysis. If you do want to use the AGF, I would suggest using the R version for now (or generating the BANKSY matrix using the R version and importing it into Python via a CSV file).

Just for concreteness / our own reference, here is what the two versions are doing currently for the computation of the M = 1 (AGF) component of the BANKSY matrix, with centering turned on. The symbols are defined precisely in the Methods section of the manuscript.

The R version (correct) centers the neighbourhood expression by a plain, unweighted mean before taking the weighted sum against the AGF coefficients `weight * exp(j * M * phi)`. In the Python version, as you point out, the mean-to-center-by is instead computed using the normalized Gaussian envelope term as weights: since the complex exponential has unit magnitude, `|w * exp(j * M * phi)| = w`, so the centering term becomes a Gaussian-weighted neighbourhood mean rather than a plain one. We then compute the weighted harmonic sum over values centered by this weighted mean, which, while probably fine in practice (since the centering term is still a mean), does mean that the two versions don't match exactly.

Thanks a lot for catching this again!

Vipul
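Purely as an illustration (toy data and hypothetical variable names, not the actual Banksy R or banksy_py code), here is a minimal numpy sketch of how the two centering choices diverge for a single cell's M = 1 AGF component:

```python
import numpy as np

# Toy setup: one index cell with k neighbours (hypothetical names).
# expr: (k, n_genes) expression of the k neighbours
# w:    (k,) Gaussian envelope weights over neighbours, normalized to sum to 1
# phi:  (k,) azimuthal angles of the neighbours around the index cell
rng = np.random.default_rng(0)
k, n_genes = 6, 5
expr = rng.random((k, n_genes))
phi = rng.uniform(0, 2 * np.pi, size=k)
w = np.exp(-np.arange(k) / 2.0)
w /= w.sum()
M = 1

# R-style centering (correct per this thread): plain, unweighted mean.
mean_uniform = expr.mean(axis=0)

# Python-style centering (the bug): mean weighted by the Gaussian envelope,
# i.e. by |w * exp(1j * M * phi)| = w, since exp(1j * M * phi) has unit magnitude.
mean_weighted = w @ expr

# The M = 1 AGF component is the magnitude of the weighted harmonic sum
# of the centered neighbour expression.
coef = w * np.exp(1j * M * phi)
agf_r_style = np.abs(coef @ (expr - mean_uniform))
agf_py_style = np.abs(coef @ (expr - mean_weighted))

print(agf_r_style)   # these two generally differ, which is
print(agf_py_style)  # the inconsistency being discussed here
```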
Thank you soooo much for answering my questions! I have tried replacing the weighted average with a uniform average (by just using an unweighted neighbourhood mean as `nbr_avgs`), and the two versions now agree.

I have just one more question: what kind of clustering method do you suggest for general ST data, such as 10X Visium or SlideSeq? I noticed that you have implemented a version of Leiden clustering on the SNN graph, which is consistent with the clustering results in Seurat (very clear and well-structured code, by the way!). The clustering results look fine, but as the dataset scale increases (~50k samples), this clustering procedure takes too much time. I tried moving to KMeans, but the results are not acceptable under any settings I tried.

Much appreciated!
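For reference, the Leiden-on-SNN procedure being described is equivalent to something like the following scanpy-based sketch (using scanpy here is my own assumption; the repo's implementation may differ in graph construction details):

```python
import numpy as np
import anndata as ad
import scanpy as sc

# Hypothetical stand-in for a BANKSY feature matrix (cells x features).
X = np.random.default_rng(0).random((500, 60))
adata = ad.AnnData(X)

# Build a kNN-based neighbour graph on the BANKSY matrix, then run Leiden.
sc.pp.neighbors(adata, n_neighbors=15, use_rep="X")
sc.tl.leiden(adata, resolution=1.0, key_added="banksy_leiden")
print(adata.obs["banksy_leiden"].value_counts())
```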
Cool! Glad the corrected centering works!

On your other question: I think maybe you can try two things (described below, with a sketch of the first after this comment). I have not tried them, but they are on my list. If you try them and want to add a vignette / tutorial to the banksy repo(s), feel free to fork and then do a PR!

Scaling Idea 1: First reduce your expression matrix to ~100 PCs. Then just run BANKSY as usual (so compute neighbours in this 100-feature space). In particular, you will still reduce your (100 PC + 100 nbd mean feature + 100 AGF feature) BANKSY matrix further before clustering.

Scaling Idea 2 (I think you might need to use the R version for this):
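Here is a rough numpy/scikit-learn sketch of one reading of Scaling Idea 1 (the choices of k, sigma, and dimensions are illustrative assumptions, and this bypasses the actual BANKSY packages entirely):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n_cells, n_genes = 2000, 1000
expr = rng.random((n_cells, n_genes))   # stand-in for a normalized expression matrix
xy = rng.random((n_cells, 2)) * 100     # spatial coordinates

# Step 1: reduce expression to ~100 PCs, so all BANKSY features live in PC space.
pcs = PCA(n_components=100, random_state=0).fit_transform(expr)

# Step 2: spatial nearest neighbours (k = 10 is an illustrative choice).
k = 10
dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(xy).kneighbors(xy)
dist, idx = dist[:, 1:], idx[:, 1:]     # drop self

# Gaussian envelope weights, normalized per cell (sigma is an assumption).
sigma = np.median(dist)
w = np.exp(-(dist ** 2) / (2 * sigma ** 2))
w /= w.sum(axis=1, keepdims=True)

# Neighbourhood mean feature (100-dim), computed in PC space.
nbd_mean = np.einsum("ik,ikf->if", w, pcs[idx])

# M = 1 AGF feature (100-dim): magnitude of the weighted harmonic sum, with
# neighbours centered by the unweighted neighbourhood mean (per the fix above).
vec = xy[idx] - xy[:, None, :]
phi = np.arctan2(vec[..., 1], vec[..., 0])
centered = pcs[idx] - pcs[idx].mean(axis=1, keepdims=True)
agf = np.abs(np.einsum("ik,ikf->if", w * np.exp(1j * phi), centered))

# 100 PC + 100 nbd-mean + 100 AGF = 300-feature BANKSY-style matrix,
# which you would still reduce again (e.g. to 20 PCs) before clustering.
banksy_mat = np.hstack([pcs, nbd_mean, agf])
banksy_pcs = PCA(n_components=20, random_state=0).fit_transform(banksy_mat)
print(banksy_pcs.shape)                 # (2000, 20)
```

Note that in the real BANKSY matrix the feature blocks are scaled by the lambda mixing parameter before concatenation; that scaling is omitted here for brevity.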
Hi,
Very fascinating and exciting work! :)
This issue is mainly about the implementation of the AGF across the different versions of BANKSY. As I understand the R version, where the AGF (also called 1-harmonics) is computed, `(weight * exp(j * M * phi))` is the AGF coefficient and `fscale(gcm[, to, drop = FALSE])` is the scaled neighborhood expression. However, in the Python version, the centered neighborhood expression `zerod` is obtained by subtracting a weighted mean `nbr_avgs`. I think that `zerod` and `fscale(gcm[...` should be consistent, such that `nbr_avgs` should be some mean without weights. May I ask whether this behavior is desirable?

Appreciate your efforts!
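To make the question concrete, here is a minimal numpy illustration of the two candidate definitions of `nbr_avgs` (names and shapes are hypothetical, mirroring the thread rather than the actual banksy_py source):

```python
import numpy as np

# nbr_expr: (k_neighbours, n_genes) neighbourhood expression for one cell;
# w: the normalized Gaussian envelope weights over those neighbours.
rng = np.random.default_rng(1)
nbr_expr = rng.random((8, 4))
w = rng.random(8)
w /= w.sum()

nbr_avgs_weighted = w @ nbr_expr          # what the Python version currently does
nbr_avgs_uniform = nbr_expr.mean(axis=0)  # "some mean without weights", matching R

zerod = nbr_expr - nbr_avgs_uniform       # centering consistent with fscale(gcm[...
print(np.allclose(nbr_avgs_weighted, nbr_avgs_uniform))  # False in general
```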