Best way to approach grouped data? #33

Open
elmonten opened this issue Nov 21, 2024 · 1 comment

Comments

@elmonten

Hello,

Thank you very much for developing such an interesting, useful, and practical package. I was wondering what you would recommend for grouped samples. I understand from the discussion in your paper that fitting a separate model per group makes it difficult to share information between groups. What would your suggested approach for grouped data be? Thank you very much in advance for your time.

Best wishes,
Elena

@krisrs1128
Collaborator

Hi Elena, thank you for sharing this question, and I apologize for the delay. I think you are referring to this quote from our discussion:

Similarly, for datasets collected across multiple sites or environments, alignment may provide a compromise between fitting a separate model per site, which fails to pool any shared information, and implementing a full hierarchical model, which can be a labor-intensive exercise.

Our logic was that a topic model might be able to identify both general and site-specific sources of variation -- some topics might be used by all samples and others might only have high memberships at specific sites. But since the site/group information isn't explicitly provided to the model, it might not be as powerful as an approach that uses that information directly.

I would generally recommend fitting an alignment to the pooled data, because then all samples can be understood with respect to the same topics. Afterwards, you could look for topics that have high weights within specific groups vs. those that are shared across multiple groups. @lasy has an illustration of this kind of group comparison in Section 4.4 of the supplemental material of this paper, where they use Dirichlet regression to see how topic memberships vary between pregnant and non-pregnant participants.
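As a rough sketch of the "look for group-specific vs. shared topics" step: assuming the sample-by-topic memberships from a single pooled fit have been exported as a matrix whose rows sum to 1, the per-group mean membership of each topic can be compared directly. This is only an illustration in Python with NumPy, not the package's API and not the Dirichlet regression used in the supplement; the function name `group_mean_memberships`, the toy matrix `gamma`, and the labels `A`/`B` are all hypothetical.

```python
import numpy as np

def group_mean_memberships(gamma, groups):
    """Average topic memberships within each group.

    gamma  : (n_samples, n_topics) array; each row sums to 1
    groups : length-n_samples sequence of group labels
    Returns a dict mapping each label to its mean membership vector.
    """
    gamma = np.asarray(gamma, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups).tolist()  # plain Python labels as dict keys
    return {g: gamma[groups == g].mean(axis=0) for g in labels}

# Made-up memberships from one pooled fit with 3 topics (illustration only)
gamma = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.20, 0.60],
])
groups = ["A", "A", "B", "B"]

means = group_mean_memberships(gamma, groups)
diff = means["A"] - means["B"]

# Large |difference|: topic concentrated in one group; near zero: shared topic
for k, d in enumerate(diff):
    print(f"topic {k}: mean(A) - mean(B) = {d:+.2f}")
```

A difference in means is only a first look; a regression on the memberships, as in the supplement mentioned above, would be the more careful way to assess whether a difference is meaningful.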

I think that Laura has some experience comparing alignments that are fit separately for each group. I remember that in some cases the tree structure changed (e.g., more branching in one group relative to the other). But this seems to be more difficult to interpret, because branches that have similar sizes/locations in the two alignments might have different community memberships.
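One way to see why separately fit models are harder to compare: topics from two fits have no shared identity, so they first have to be matched by composition. The sketch below assumes each fit's topic-by-feature weights are available as a matrix (rows are topics) and matches topics by cosine similarity. It is a generic illustration in Python with NumPy, not something the package provides; `cosine_similarity_matrix`, `beta_a`, and `beta_b` are hypothetical names and the numbers are made up.

```python
import numpy as np

def cosine_similarity_matrix(beta_a, beta_b):
    """Cosine similarity between every topic in fit A and every topic in fit B.

    beta_a : (n_topics_a, n_features) topic-by-feature weights from fit A
    beta_b : (n_topics_b, n_features) topic-by-feature weights from fit B
    Returns an (n_topics_a, n_topics_b) matrix of similarities.
    """
    a = np.asarray(beta_a, dtype=float)
    b = np.asarray(beta_b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Made-up topic compositions over 4 features, one matrix per group-specific fit
beta_a = np.array([[0.70, 0.20, 0.05, 0.05],
                   [0.05, 0.05, 0.20, 0.70]])
beta_b = np.array([[0.10, 0.10, 0.30, 0.50],
                   [0.25, 0.25, 0.25, 0.25]])

sim = cosine_similarity_matrix(beta_a, beta_b)
best_match = sim.argmax(axis=1)  # closest fit-B topic for each fit-A topic

for k, j in enumerate(best_match):
    print(f"fit A topic {k} -> fit B topic {j} (similarity {sim[k, j]:.2f})")
```

When the best available similarity for a topic is low, the two fits have no real counterpart for it, even if the corresponding branches look alike in the two alignment plots.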

I realize that by now you are likely already done with the analysis you had in mind, but please let me know if you have any follow-up questions.
