What do you recommend for single cell RNAseq data: counts, normalized counts in log scale, other? #21
Comments
We almost exclusively run Augur on raw counts. The exception is very acute perturbations (e.g. mice walking on a treadmill for 15 min before sample collection), where we found that RNA velocity estimates provide more information than raw counts.
Thanks for your fast answer. I understand the need for RNA velocity estimates in certain cases. We are mostly interested in longer time frames, so raw counts would be our choice. In that case, does Augur normalize by total counts per cell, or apply a similar normalization designed for single-cell RNA-seq data? Intuitively, I would expect classification between two conditions to perform better on normalized data, and therefore Augur might work better with normalized input. But of course I may be wrong! Have you investigated this question, or do you know of relevant papers on this topic?
It's important to consider that 'better classification' is not really the goal of Augur. Instead, we are trying to identify cell types that show a transcriptional response to a perturbation, so what really matters are the relative differences in classification accuracy between cell types. In our initial experiments, we saw minimal changes in the relative rankings of individual cell types when normalizing gene expression (e.g. with log-TP10K). However, we did find less separation between cell types when running Augur on normalized expression values, so we generally run Augur on untransformed counts.
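For readers unfamiliar with log-TP10K: each cell's counts are rescaled to sum to 10,000 ("transcripts per 10K") and then transformed with log(1 + x). The Python sketch below is a minimal illustration of that transform, not Augur's own code. It assumes a dense cells × genes NumPy array (transpose if your matrix has genes in rows) and uses the natural log, as Seurat's `LogNormalize` does; some pipelines use log2 instead. The function name `log_tp10k` is hypothetical.

```python
import numpy as np


def log_tp10k(counts, scale: float = 1e4) -> np.ndarray:
    """Hypothetical helper: log(1 + TP10K) for a cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Total counts per cell (one value per row), kept 2-D for broadcasting.
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts would divide by zero; dividing by 1
    # instead leaves those rows at exactly zero.
    safe_totals = np.where(totals > 0, totals, 1.0)
    # Rescale every cell to the same total (10,000 by default).
    tp10k = counts / safe_totals * scale
    # Natural-log transform with a pseudocount of 1.
    return np.log1p(tp10k)


# Two cells, three genes; the second cell has no counts at all.
normalized = log_tp10k([[1, 3, 0], [0, 0, 0]])
print(normalized)
```

Running Augur on a matrix like `normalized` instead of the raw counts is the kind of substitution compared in this thread; per the comment above, it mostly preserved the cell-type ranking but compressed the differences between cell types.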
Thanks a lot for your answer. By 'better classification' I actually meant higher accuracy when classifying unperturbed versus perturbed cells, so I believe we are on the same page. Thanks for pointing me to Extended Data Fig. 10; I will take a closer look at it.
@skinnider looking at the code of …
You indicated in your publication that your method is relatively robust to various preprocessing and normalization steps. However, when I tested it on a single-cell RNA-seq dataset using either raw counts or log-transformed normalized counts as the input matrix, I obtained quite different cell type prioritization results. What would you generally recommend using as input?