Best way to approach grouped data? #33

elmonten · 2024-11-21T11:57:21Z

Hello,

Thank you very much for developing such an interesting, useful, and practical package. I was wondering what you would recommend for grouped samples. I understand from the discussion in your paper that fitting a separate model per group makes it difficult to share information between groups. Thus, I was wondering what your suggested approach for grouped data would be. Thank you very much in advance for your time.

Best wishes,
Elena

krisrs1128 · 2024-12-24T02:10:51Z

Hi Elena, thank you for sharing this question, and I apologize for the delay. I think you are referring to this quote from our discussion:

Similarly, for datasets collected across multiple sites or environments, alignment may provide a compromise between fitting a separate model per site, which fails to pool any shared information, and implementing a full hierarchical model, which can be a labor-intensive exercise.

Our logic was that a topic model might be able to identify both general and site-specific sources of variation -- some topics might be used by all samples and others might only have high memberships at specific sites. But since the site/group information isn't explicitly provided, it might not be as powerful.

I would generally recommend fitting an alignment to pooled data because then all samples can be understood with respect to the same topics. Afterwards, you could look for topics that have high weights within specific groups vs. those that are shared across multiple groups. @lasy has a group illustration of this in Section 4.4 of the supplemental material of this paper, where they use Dirichlet Regression to see how topic memberships vary between pregnant vs. non-pregnant participants.
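As a rough illustration of the "look for group-specific vs. shared topics" step, here is a minimal Python sketch. It is not part of the package's API: it assumes you have already exported the sample-by-topic membership matrix from a model fit to the pooled data, and the function names, the toy data, and the `ratio` cutoff are all hypothetical choices made for illustration. It is only a descriptive summary; a regression on the memberships (like the Dirichlet regression mentioned above) would be the more principled way to test for group differences.

```python
from collections import defaultdict


def mean_membership_by_group(memberships, groups):
    """Average each topic's membership within each group.

    memberships: list of per-sample topic proportions (one row per sample).
    groups: list of group labels, one per sample.
    """
    sums = {}
    counts = defaultdict(int)
    for row, g in zip(memberships, groups):
        if g not in sums:
            sums[g] = [0.0] * len(row)
        for k, w in enumerate(row):
            sums[g][k] += w
        counts[g] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


def classify_topics(group_means, ratio=3.0):
    """Label a topic 'group-specific' when its largest group mean is at
    least `ratio` times its smallest group mean; otherwise 'shared'.

    `ratio` is an arbitrary illustrative cutoff, not a recommended value.
    """
    n_topics = len(next(iter(group_means.values())))
    labels = []
    for k in range(n_topics):
        vals = {g: means[k] for g, means in group_means.items()}
        top_group = max(vals, key=vals.get)
        smallest = max(min(vals.values()), 1e-12)  # guard against zero means
        if vals[top_group] >= ratio * smallest:
            labels.append(("group-specific", top_group))
        else:
            labels.append(("shared", None))
    return labels


# Toy memberships from a hypothetical 3-topic model fit to pooled samples.
memberships = [
    [0.5, 0.45, 0.05],
    [0.4, 0.55, 0.05],
    [0.5, 0.05, 0.45],
    [0.4, 0.05, 0.55],
]
groups = ["site_A", "site_A", "site_B", "site_B"]

group_means = mean_membership_by_group(memberships, groups)
topic_labels = classify_topics(group_means)
for k, (kind, group) in enumerate(topic_labels):
    print(f"topic {k}: {kind}" + (f" ({group})" if group else ""))
```

Because every sample is scored against the same pooled topics, the per-group averages are directly comparable, which is the main advantage over fitting one model per group.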

I think that Laura has some experience comparing alignments that are made separately across groups. I remember that in some cases the tree structure changed (e.g., more branching in one group relative to the other). But this seems to be more difficult to interpret, because branches that have similar sizes/locations in the alignment might have different community memberships.

I realize that by now you are likely already done with the analysis you had in mind, but please let me know if you have any follow-up questions.
