LDA ordination with feature loadings computation #598

thpralas · 2024-07-05T10:52:00Z

This PR adds two new methods, getLDA and addLDA, that perform Latent Dirichlet Allocation (LDA).

getLDA returns the ordination matrix, with the feature loadings matrix stored as an attribute.
addLDA adds this ordination matrix to the reducedDims of the TreeSummarizedExperiment.
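
The "loadings stored as an attribute" pattern described above can be illustrated with base R alone. This is only a sketch on mock data: the matrix values, the dimension names, and the assumed layout (samples in rows of the score matrix, features in rows of the loadings matrix) are made up for illustration and are not taken from the PR's code.

```r
# Mock data for illustration only; real values would come from an LDA fit.
# Assumed layout: samples x topics for scores, features x topics for loadings.
scores <- matrix(
    c(0.7, 0.3,
      0.2, 0.8,
      0.5, 0.5),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("sample", 1:3), paste0("topic", 1:2))
)
loadings <- matrix(
    c(0.6, 0.1,
      0.4, 0.9),
    nrow = 2, byrow = TRUE,
    dimnames = list(paste0("feature", 1:2), paste0("topic", 1:2))
)

# Attach the loadings to the score matrix as an attribute
attr(scores, "loadings") <- loadings

# The score matrix still behaves as an ordinary matrix ...
dim(scores)
# ... and the loadings can be retrieved from the attribute afterwards
attr(scores, "loadings")
```

Keeping the loadings as an attribute means a single object can be stored in reducedDims while the feature-level information travels with it.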

This PR is related to issue #596.

antagomir · 2024-07-05T13:06:57Z

Great! I will first look forward to comments from @TuomasBorman, and then let's see if I have anything to add.

antagomir · 2024-07-05T13:07:04Z

Conflicts to resolve?

TuomasBorman

Looks good. Even though the code is rather short, structure it by adding tests and comments.

R/runLDA.R

TuomasBorman · 2024-07-08T13:39:35Z

A couple of things to discuss:

LDA has features of both ordination and clustering methods; however, it is more of a clustering method. That should be taken into account in the documentation.
As this is a clustering method, should we add clusters to rowData/colData? In theory, LDA can also be applied to rows, which makes reducedDim unsuitable for storing the result. (Instead of ordination methods, addCluster() should be used as an example.)
And finally, should this be implemented in the bluster package (the Bioconductor clustering package)? We could make a PR as we did last year: https://github.com/LTLA/bluster/blob/master/R/DmmParam.R

antagomir · 2024-07-08T21:02:34Z

Intuitively, reducedDim is a logical place, given that this would usually be done for samples. But it could in principle be applied to both rows and columns. The function generates both a feature matrix and a feature loading matrix, and both should be stored.

LDA is a matrix decomposition method, and I would not recommend linking it with clustering so directly.

But it could in principle go to the bluster package. I am not sure whether they would approve, but we could ask by opening an issue. That might take more time, however. Therefore I would meanwhile proceed with implementing these in mia, and later we can move them elsewhere if necessary. LDA and NMF have clear uses, and it would be useful to have them easily accessible.

TuomasBorman · 2024-07-08T21:06:31Z

With these explanations, I am fine with the mia implementation and reducedDim().

codecov · 2024-07-09T11:25:15Z

Codecov Report

Attention: Patch coverage is 66.66667% with 16 lines in your changes missing coverage. Please review.

Please upload report for BASE (devel@dea9d92). Learn more about missing BASE report.

Files	Patch %	Lines
R/utils.R	31.57%	13 Missing ⚠️
R/addLDA.R	89.65%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##             devel     #598   +/-   ##
========================================
  Coverage         ?   68.08%           
========================================
  Files            ?       44           
  Lines            ?     5273           
  Branches         ?        0           
========================================
  Hits             ?     3590           
  Misses           ?     1683           
  Partials         ?        0

TuomasBorman

Minor things

R/addLDA.R

R/utils.R

tests/testthat/test-5addLDA.R

R/addLDA.R

antagomir · 2024-07-11T09:12:10Z

OK with me to add that.

However, if it feels like a lot to add now, I would focus on finalizing the present one (with a single fixed k) first, and then add the multi-k optimization as a separate PR.

TuomasBorman · 2024-07-11T09:31:05Z

That would require something like this:

addLDA <- function(x, k = 2, eval.metric = "perplexity"){
    # Coherence is calculated with the topicdoc package
    if( eval.metric == "coherence" && !requireNamespace("topicdoc", quietly = TRUE) ){
        stop("'topicdoc' package must be installed in order to calculate coherence.")
    }
    # Fit one model for each candidate number of topics
    models <- lapply(k, function(i){
        topicmodels::LDA(x, k = i)
    })

    metrics <- .calc_metrics(models)

    # Get the model with lowest eval.metric (default value perplexity).

    # Get scores
    # Add loadings, model, and metrics data.frame to attributes

}

.calc_metrics <- function(x){
    # Loop through models
    # Calculate perplexity with topicmodels
    # Calculate coherence if the topicdoc package is available (I believe topicdoc is required for this)
    # Create df from both metrics
}
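
The model-selection step in the sketch above ("get the model with lowest eval.metric") depends on the direction of the metric. A minimal base-R illustration follows, assuming that lower perplexity and higher mean coherence indicate a better model; the helper name `.best_model_index`, the column names, and all numbers are hypothetical and only serve to show the idea.

```r
# Hypothetical metrics table; the numbers are made up for illustration only
metrics <- data.frame(
    k = c(2, 3, 4),
    perplexity = c(12.5, 10.1, 11.3),
    coherence_mean = c(-3.2, -2.1, -2.8)
)

# Hypothetical helper: row index of the best model for the chosen metric.
# Assumption: lower perplexity is better, higher mean coherence is better.
.best_model_index <- function(metrics, eval.metric = "perplexity"){
    if( eval.metric == "perplexity" ){
        which.min(metrics[["perplexity"]])
    } else if( eval.metric == "coherence" ){
        which.max(metrics[["coherence_mean"]])
    } else {
        stop("'eval.metric' must be 'perplexity' or 'coherence'.")
    }
}

# Number of topics of the selected model for each metric
best_perplexity <- metrics[["k"]][ .best_model_index(metrics, "perplexity") ]
best_coherence <- metrics[["k"]][ .best_model_index(metrics, "coherence") ]
```

Handling the direction in one place avoids silently picking the worst model when the metric is switched.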

Merge branch 'devel' of https://github.com/microbiome/mia into feature_loadings_methods # Conflicts: # NEWS

R/utils.R

R/addLDA.R

TuomasBorman · 2024-07-16T08:56:51Z

NMF would follow this same template, I think. However, there k (rank) can be a numeric vector, and the function finds the optimal k by itself.

TuomasBorman · 2024-07-16T09:23:15Z

For LDA we could have another PR for multiple k values, if you are not able to implement it now.

The idea was this:

library(mia)

data("GlobalPatterns")
tse <- GlobalPatterns
tse <- agglomerateByPrevalence(tse, rank = "Phylum")

eval.metric <- "perplexity"
# Candidate numbers of topics; a single value (e.g. k <- 5) gives the fixed-k case
k <- c(2, 3, 4, 5, 6, 7, 8, 9, 10)
assay.type <- "counts"

mat <- assay(tse, assay.type)

# Samples (documents) in rows, features (terms) in columns
df <- as.data.frame(t(mat))

# .require_package() is an internal mia helper
if( eval.metric == "coherence" ){
    .require_package("topicdoc")
}

# Calculate perplexity (and coherence, if topicdoc is available) for each model
.calculate_lda_metrics <- function(models, k, df){
    metrics <- lapply(models, function(model){
        res <- topicmodels::perplexity(model)
        names(res) <- "perplexity"
        if( requireNamespace("topicdoc", quietly = TRUE) ){
            coherence <- topicdoc::topic_coherence(model, df)
            names(coherence) <- paste0("coherence_", seq_along(coherence))
            coherence <- c(coherence, coherence_mean = mean(coherence))
            res <- c(res, coherence)
        }
        return(res)
    })
    metrics <- as.data.frame( dplyr::bind_rows(metrics) )
    metrics[["k"]] <- k
    return(metrics)
}

# Fit one model for each candidate k
models <- lapply(k, function(i){
    topicmodels::LDA(df, i)
})

metrics <- .calculate_lda_metrics(models, k, df)

# Select the best model; lower perplexity is better, hence which.min()
lda_model <- models[[ which.min(metrics[[eval.metric]]) ]]

posteriors <- topicmodels::posterior(lda_model, df)
scores <- t(as.data.frame(posteriors$topics))
# Feature loadings: per-topic term probabilities, with features in rows
loadings <- t(as.data.frame(posteriors$terms))
attr(scores, "loadings") <- loadings
attr(scores, "model") <- lda_model
attr(scores, "eval_metric") <- metrics

#################################
metrics_df <- attr(scores, "eval_metric")

library(ggplot2)

ggplot(metrics_df) +
    geom_point(aes(x = k, y = perplexity), color = "red")
# coherence_mean is present only if topicdoc was available
ggplot(metrics_df) +
    geom_point(aes(x = k, y = coherence_mean), color = "blue")

TuomasBorman

Is this PR ready? If the modification for multiple k values does not come in this PR, I think this is good to go.

R/utils.R

thpralas added 3 commits July 5, 2024 12:28

new functions getLDA and addLDA

8d71878

up

6bd1766

add topicmodels package to imports

19c061b

thpralas requested review from TuomasBorman and antagomir July 5, 2024 11:54

Merge branch 'devel' into feature_loadings_methods

fb518cf

TuomasBorman requested changes Jul 8, 2024

thpralas added 5 commits July 8, 2024 13:15

coding style and small modifications

fcc1af3

Merge branch 'feature_loadings_methods' of https://github.com/microbi…

0eb7e3e

…ome/mia into feature_loadings_methods

new internal function add_values_to_reducedDims

1846179

move topicmodels package to suggests

3e85be5

simplify function

5b9cac7

thpralas and others added 4 commits July 9, 2024 12:19

add comments to getLDA and addLDA

e2b6633

Merge branch 'devel' into feature_loadings_methods

869e828

add input checks

c75ff16

Merge branch 'feature_loadings_methods' of https://github.com/microbi…

55af874

…ome/mia into feature_loadings_methods

add tests for addLDA

b87516d

thpralas requested a review from TuomasBorman July 9, 2024 11:51

TuomasBorman requested changes Jul 10, 2024

review modifications

cad6b79

TuomasBorman reviewed Jul 10, 2024

TuomasBorman reviewed Jul 10, 2024

TuomasBorman reviewed Jul 10, 2024

TuomasBorman reviewed Jul 10, 2024

thpralas added 3 commits July 11, 2024 12:31

review modifs

a76aeae

call agglomerateByPrevalence without prevalence level in examples

29694b7

merge devel

205cbfb

Merge branch 'devel' of https://github.com/microbiome/mia into feature_loadings_methods # Conflicts: # NEWS

thpralas requested a review from TuomasBorman July 16, 2024 08:25

Merge branch 'devel' into feature_loadings_methods

d5c2184

TuomasBorman requested changes Jul 16, 2024

thpralas and others added 3 commits July 17, 2024 13:58

address review comments

b2cfb2f

Update addLDA.R

97a215e

Merge branch 'devel' into feature_loadings_methods

12ab7d1

TuomasBorman requested changes Jul 23, 2024

thpralas and others added 3 commits July 23, 2024 11:56

modify input check and indentation in .add_values_to_reducedDims

e1abecb

Merge branch 'feature_loadings_methods' of https://github.com/microbi…

f19f0c6

…ome/mia into feature_loadings_methods

up

305b8a3

TuomasBorman approved these changes Jul 23, 2024

TuomasBorman and others added 10 commits July 23, 2024 20:28

up

22547a9

up

ac1887a

up

e8f497e

up

0c32880

up

36cce5a

up

dd096ba

up

511f405

Merge branch 'devel' into feature_loadings_methods

37b4d9b

up

6a60d87

up

88cb50f

TuomasBorman merged commit 50bd2ea into devel Jul 23, 2024
3 checks passed

TuomasBorman deleted the feature_loadings_methods branch July 23, 2024 20:49
