Cell cycle regression and n_embedding tuning #11

Open
BiotechPedro opened this issue Jan 11, 2024 · 1 comment

Comments


BiotechPedro commented Jan 11, 2024

Hello Constantin,

I've been experimenting with lemur on the following dataset: a cell line treated under multiple conditions (~10), with two replicates of the experiment. However, the cell cycle is a clear confounding factor. How should I regress it out? Is the following approach correct?
fit <- lemur(sce, design = ~ conditions + experiment, n_embedding = 15, test_fraction = 0.25)
fit <- align_harmony(fit, design = ~ Phase)

I've already tried this and the cell cycle effect disappears, but I don't know whether it is a technically sound procedure. On the other hand, fit <- align_by_grouping(fit, grouping = sce$Phase) does almost nothing. How would you proceed?

Also, would you mind sharing some ideas on how to tune n_embedding? I've read that it follows the same logic as the number of PCs, but maybe you've found a typical range (e.g. 15-20), or you set it higher for very heterogeneous data.

Thanks a lot!

Pedro

@BiotechPedro
Author

Hi Constantin! Here I am again :)

I've read the recent version of the LEMUR preprint and it is very well explained. I really like the new applications.

Although I ended up not analysing the dataset I mentioned above, I am again dealing with a cell line dataset. Here the cell cycle effect is large, so I am wondering how you would proceed. Which of the code above, or what other code, would you use? Would you subset the data to work on a specific cell cycle phase? Would you just remove the cell cycle genes from the HVGs and continue as normal?
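
To make the last two options concrete, here is a minimal sketch in base R of what I mean by subsetting to one phase and by dropping cell cycle genes from the HVGs. It runs on toy data: the objects counts, cell_meta, hvgs and cell_cycle_genes are hypothetical stand-ins I made up for illustration, not part of lemur, and I am not claiming either option is the recommended way.

```r
# Toy stand-ins (hypothetical): a genes x cells count matrix and per-cell metadata.
set.seed(1)
genes <- paste0("gene", 1:20)
counts <- matrix(
  rpois(20 * 12, lambda = 5),
  nrow = 20,
  dimnames = list(genes, paste0("cell", 1:12))
)
cell_meta <- data.frame(
  Phase = rep(c("G1", "S", "G2M"), each = 4),
  row.names = colnames(counts)
)

# Hypothetical inputs: a list of highly variable genes and a list of
# annotated cell cycle genes (in practice these come from upstream tools).
hvgs <- genes[1:10]
cell_cycle_genes <- c("gene2", "gene5", "gene15")

# Option 1: keep only the cells assigned to a single phase.
keep_cells <- cell_meta$Phase == "G1"
counts_g1 <- counts[, keep_cells, drop = FALSE]

# Option 2: drop the cell cycle genes from the HVG list, then subset the rows.
hvgs_no_cc <- setdiff(hvgs, cell_cycle_genes)
counts_hvg <- counts[hvgs_no_cc, , drop = FALSE]

dim(counts_g1)   # genes x cells after the phase filter
dim(counts_hvg)  # genes x cells after the HVG filter
```

With a SingleCellExperiment the same row and column indexing should apply, e.g. sce[hvgs_no_cc, keep_cells], before passing the object to lemur().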

Regarding the number of latent dimensions, I see in your new preprint version that it varies depending on the dataset. Would the general recommendation be to compute the cumulative variance explained, as you do in Suppl. Fig. 7A, and then look for the 'elbow'?
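
For reference, this is roughly the elbow procedure I have in mind, sketched in base R on simulated data. It uses ordinary PCA via prcomp() as a proxy, so it is an assumption on my side that this matches how the curve in Suppl. Fig. 7A is computed. The elbow rule (largest vertical gap between the cumulative curve and the straight line joining its end points) is just one simple heuristic, and all variable names are made up for the example.

```r
set.seed(42)
n_cells <- 200
n_genes <- 50
n_signal <- 3  # number of strong latent directions in the simulation

# Simulated expression (cells x genes): low-rank signal plus unit-variance noise.
latent <- matrix(rnorm(n_cells * n_signal, sd = 5), nrow = n_cells)
loadings <- matrix(rnorm(n_signal * n_genes), nrow = n_signal)
noise <- matrix(rnorm(n_cells * n_genes), nrow = n_cells)
expr <- latent %*% loadings + noise

# Ordinary PCA; the squared sdev values are the per-component variances.
pca <- prcomp(expr, center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Simple elbow heuristic: the component where the cumulative curve lies
# furthest above the straight line from its first to its last point.
k <- seq_along(cum_var)
n_comp <- length(cum_var)
chord <- cum_var[1] + (cum_var[n_comp] - cum_var[1]) * (k - 1) / (n_comp - 1)
elbow <- which.max(cum_var - chord)

plot(k, cum_var, type = "b",
     xlab = "Number of components", ylab = "Cumulative variance explained")
abline(v = elbow, lty = 2)
```

The resulting elbow would then be a starting value for n_embedding, to be checked against how the results change for nearby values.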

Thank you!

Pedro
