Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New dissimilarity functions #609

Merged
merged 66 commits into from
Aug 13, 2024
Merged

New dissimilarity functions #609

merged 66 commits into from
Aug 13, 2024

Conversation

thpralas
Copy link
Collaborator

This PR adds new methods getDissimilarity and addDissimilarity.
These methods combine overlap, JSD, and UniFrac dissimilarity calculations as implemented in mia, and additionally use vegan dissimilarity function when overlap, JSD, or UniFrac are not employed.

This PR also makes the following naming modifications:
calculateOverlap --> getOverlap
runOverlap --> addOverlap
calculateUnifrac --> getUnifrac
calculateJSD --> getJSD

NAMESPACE Outdated Show resolved Hide resolved
NAMESPACE Outdated Show resolved Hide resolved
R/calculateJSD.R Outdated Show resolved Hide resolved
R/calculateJSD.R Outdated Show resolved Hide resolved
R/calculateOverlap.R Outdated Show resolved Hide resolved
@TuomasBorman
Copy link
Contributor

@antagomir

The idea was to add dissimilarity calculation method back (we had it couple years ago).

  1. Dissimilarity calculation is common step and we could have wrapper for "all" the dissimilarity methods
    • We removed the previous wrapper because it just applied vegan::vegdist to assay
  2. We have our "own" dissimilarity measures implemented in mia (JSD, overlap, unifrac) in any case
  3. This makes applying dissimilarity calculation little bit easier and straightforward.

The idea is this:

data("GlobalPatterns")
tse <- GlobalPatterns

# Add to reducedDim
tse <- addDissimilarity(tse, method = "euclidean")
tse <- addDissimilarity(tse, method = "unifrac")
# Get only the dissimilarity matrix
res <- getDissimilarity(tse, method = "bray", diss.fun = vegan::vegdist)
# It works also for matrices
res <- getDissimilarity(assay(tse), method = "bray")
# We can use runMDS with getDissimilarity (makes things little bit easier when we can always use getDissimilarity)
tse <- runMDS(tse, method = "unifrac", tree = rowTree(tse), FUN = getDissimilarity)
tse <- runMDS(tse, method = "bray", FUN = getDissimilarity)

@antagomir
Copy link
Member

Yes, seems good.

In principle it would be great to be able to run methods like runMDS() for TreeSE objects without specifying the tree separately but that's a separate discussion.

The reducedDim slot works for this but it is also somewhat confusing because it is also used for ordination results are derived from dissimilarities.

@TuomasBorman
Copy link
Contributor

TuomasBorman commented Jul 22, 2024

  1. We could create addMDS wrapper for that. The code below just shows the idea.

data("GlobalPatterns")
tse <- GlobalPatterns

tse <- runMDS(
    tse,
    FUN = mia::calculateUnifrac,
    name = "Unifrac",
    tree = rowTree(tse),
    assay.type = "counts")

addMDS <- function(tse, tree = rowTree(tse), ...){
    runMDS(tse, tree = tree, ...)
}

tse <- addMDS(
    tse,
    FUN = mia::calculateUnifrac,
    name = "Unifrac",
    assay.type = "counts")
  1. We could add it also to metadata(), but we should check what is the consensus. (however, as that is easy to change, we can check that thing after the rest of the function is ready). Adding to metadata() would solve the issue about dissimilarities of features as they cannot be stored in reducedDim().

@TuomasBorman
Copy link
Contributor

Does the new dissimilarity function support rarified values? If not, let us open a new issue on that one and solve later.

No. I am wondering if we should add rarefaction also for overlap, jsd and unifrac (for vegan dissimilarities there is already vegan::avgdist)

IMO it would be helpful if getDissimilarity could support vegan::avgdist (for rarefaction purposes).

In addition, we could have rarefaction supported also for overlap, jsd and unifrac but I do not consider this urgent. The unifrac seems to me the most common one out of these so I would start with that (after adding support for vegan::avgdist).

Now rarefaction is supported for all methods through vegan::avgdist

@TuomasBorman
Copy link
Contributor

TuomasBorman commented Aug 9, 2024

@thpralas can you check that everything works. Can you do some checks for rarefaction?

I will still fix documentation. assay_name was additional to getDissimilarity but removing it messed up other doc pages...

E: the documentation should be fine now

@TuomasBorman
Copy link
Contributor

vegan is in imports. However, there are multiple .require_package("vegan"). They all can be removed

@thpralas
Copy link
Collaborator Author

I tried adding checks on rarefaction but I get different results using the wrapper and vegan::avgdist. I guess it is caused by the random sampling in rarefaction but even when setting seed to obtain the same result, it does not work as intended.

@antagomir
Copy link
Member

I tried adding checks on rarefaction but I get different results using the wrapper and vegan::avgdist. I guess it is caused by the random sampling in rarefaction but even when setting seed to obtain the same result, it does not work as intended.

Setting the seed probably doesn't help because these functions have different command sequences anyway, then the results from even same random seed do not necessarily match exactly.

More essential is that the results are sufficiently close. Could you paste here example comparison code and resulting comparison plot? There is related active discussion in #561 - we have similar issue with alpha diversity.

If we can conclude that both methods work correctly (there can be some technical choices that are acceptable and explain the differences), and if they yield comparable results then it would be very helpful to just get this merged soon, and open a separate issue that should investigate why results from vegan::avgdist, vegan::rrarefy, and mia wrappers are different, and what should be done about that.

Merge branch 'devel' of https://github.com/microbiome/mia into getDissimilarity

# Conflicts:
#	R/calculateUnifrac.R
@thpralas
Copy link
Collaborator Author

Here is the check and its ouput:

data(GlobalPatterns, package="mia")
tse <- GlobalPatterns
mat <- assay(tse, "counts")
ntop <- 5
tse_sub <- tse[head(rev(order(rowSds(assay(tse, "counts")))), ntop), ]
mat <- assay(tse_sub, "counts")
clr <- function (x) {
    vegan::decostand(x, method="clr", pseudocount=1)
}
set.seed(123)
res1 <- vegan::avgdist(t(mat), distfun = vegdist, dmethod = "euclidean",
                                     sample = min(colSums2(mat)), iterations = 10, 
                                     transf = clr)
set.seed(123)
res2 <- getDissimilarity(tse_sub, method = "euclidean", niter = 10, 
                                      transf = clr)
expect_equal(as.matrix(res1), as.matrix(res2))

Error: as.matrix(res1) not equal to as.matrix(res2).
650/676 mismatches (average diff: 10.3)
[2] 1.43 - 5.01 == -3.58
[3] 2.68 - 7.61 == -4.93
[4] 3.84 - 14.92 == -11.09
[5] 3.81 - 14.84 == -11.03
[6] 7.17 - 24.08 == -16.91
[7] 3.82 - 12.16 == -8.33
[8] 5.43 - 17.06 == -11.62
[9] 8.03 - 26.99 == -18.97
[10] 6.16 - 19.03 == -12.87

The results of getDissimilarity and vegan::avgdist are of the same order of magnitude but differ significantly.

@TuomasBorman
Copy link
Contributor

TuomasBorman commented Aug 13, 2024

There was a problem: dist() function was used incorrectly. Intention was to convert "normal" matrix to distance matrix. dist() was used instead of as.dist(). I removed conversion completely. Without rarefaction, getDissimilarity outputs distance matrix; with rarefaction it outputs normal matrix. (Not sure if this is a big deal)

Does it work now?

@thpralas
Copy link
Collaborator Author

There was a problem: dist() function was used incorrectly. Intention was to convert "normal" matrix to distance matrix. dist() was used instead of as.dist(). I removed conversion completely. Without rarefaction, getDissimilarity outputs distance matrix; with rarefaction it outputs normal matrix. (Not sure if this is a big deal)

Does it work now?

Yes it works now, thanks!

@TuomasBorman
Copy link
Contributor

Nice work! I think this is now good to go after checks pass

@TuomasBorman TuomasBorman merged commit 0eb0ebc into devel Aug 13, 2024
3 checks passed
@TuomasBorman TuomasBorman deleted the getDissimilarity branch August 13, 2024 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants