What do you recommend for single cell RNAseq data: counts, normalized counts in log scale, other? #21

apeleraux · 2022-10-25T16:09:40Z

You indicated in your publication that your method is relatively robust to various preprocessing and normalization steps. However, I tested it on a single cell RNAseq dataset using either counts or normalized log-transformed counts as the input matrix and found quite different cell type prioritization results. Which would you generally recommend?

skinnider · 2022-10-25T21:00:21Z

We almost exclusively run Augur on raw counts. The exception is for very acute perturbations (e.g. mice walking on a treadmill for 15 min prior to sample collection), where we found that running Augur on estimates of RNA velocity provides more information than running it on raw counts.

apeleraux · 2022-11-04T10:20:36Z

Thanks for your fast answer. I understand the need for RNA velocity estimates in certain cases. We are mostly interested in longer time frames, so raw counts would be our choice. In that case, does Augur include normalization by total counts per cell, or another similar normalization optimized for single cell RNAseq data? Intuitively, it seems to me that classification between 2 conditions should perform better on normalized data, and that Augur may therefore work better with normalized data. But of course I may be wrong! Have you investigated this question, or do you know of relevant papers on this topic?

skinnider · 2022-11-15T16:33:54Z

It's important to consider that 'better classification' is not really the goal of Augur - instead we are trying to identify cell types that are showing a transcriptional response to a perturbation, and so what's really of interest are the relative differences in classification accuracy between cell types. In our initial experiments, we saw minimal changes in the relative rankings of individual cell types when normalizing gene expression (e.g. with log-TP10K). However, we did find that there was less separation between cell types when running Augur on normalized gene expression values and so we generally run Augur on untransformed counts.
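To make the distinction between ranking and separation concrete, here is a minimal Python sketch. It is not part of Augur (which is an R package). It compares two sets of per-cell-type AUCs by Spearman rank correlation and by their spread (max minus min). The function names and the AUC values are hypothetical placeholders, not results from any dataset.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all values are equal")
    return cov / (vx * vy) ** 0.5


def spearman(auc_a, auc_b):
    """Spearman correlation between two {cell_type: AUC} dicts on shared cell types."""
    shared = sorted(set(auc_a) & set(auc_b))
    ranks_a = average_ranks([auc_a[c] for c in shared])
    ranks_b = average_ranks([auc_b[c] for c in shared])
    return pearson(ranks_a, ranks_b)


def spread(auc):
    """Difference between the highest and lowest AUC."""
    return max(auc.values()) - min(auc.values())


# Placeholder AUCs (made up for illustration only).
auc_counts = {"type_A": 0.90, "type_B": 0.70, "type_C": 0.55}
auc_lognorm = {"type_A": 0.75, "type_B": 0.65, "type_C": 0.52}

rho = spearman(auc_counts, auc_lognorm)
spread_counts = spread(auc_counts)
spread_lognorm = spread(auc_lognorm)
```

With these placeholder numbers the ranking of cell types is identical while the spread is smaller for the second set, which is the pattern described above: same prioritization, less separation.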
In terms of understanding why Augur is so robust to running on untransformed counts, Extended Data Fig. 10 in the Nat. Biotechnol. paper might be useful in thinking about the kinds of scenarios that would be required for sequencing depth to be a confounding factor in the analysis.
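For reference, log-TP10K as mentioned above is commonly computed by dividing each cell's counts by that cell's total, scaling to 10,000, and taking log(1 + x). A minimal pure-Python sketch, assuming a cells-by-genes list of lists and the log1p form (the exact variant used in the experiments above is not stated in this thread); `log_tp10k` is a hypothetical helper name:

```python
import math


def log_tp10k(counts, scale=10_000):
    """Convert a cells x genes count matrix to log(1 + transcripts per 10K)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # Cell with no counts: keep zeros instead of dividing by zero.
            normalized.append([0.0 for _ in cell])
            continue
        normalized.append([math.log1p(x / total * scale) for x in cell])
    return normalized


# Toy matrix (made-up values): same composition, 10x difference in depth.
raw = [
    [1, 3, 6],
    [10, 30, 60],
]
norm = log_tp10k(raw)
```

Because each cell is divided by its own total, the two toy cells end up with the same normalized values even though their sequencing depth differs tenfold.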

apeleraux · 2022-12-09T14:41:25Z

Thanks a lot for your answer. When I was speaking of better classification, I actually meant higher accuracy of classification between unperturbed and perturbed cells. So I believe that we are on the same page. Thanks for pointing me to Fig 10 of the extended data, I will have a further look at it.

kaizen89 · 2023-07-31T10:19:01Z

@skinnider Looking at the Augur code, it seems that when the input is a Seurat object the default slot used is `data`, which corresponds to normalized data rather than the raw counts you recommend. Might it be worth changing the default behaviour?
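A quick, library-independent sanity check for which kind of matrix is being passed is to test whether every entry is a non-negative whole number. The sketch below is a heuristic only, and `looks_like_raw_counts` is a hypothetical helper, not something provided by Augur or Seurat:

```python
def looks_like_raw_counts(matrix):
    """Return True if every entry is a non-negative whole number."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in matrix
        for value in row
    )


# Made-up examples: integer counts versus log-normalized values.
raw_example = [[0, 3], [12, 1]]
lognorm_example = [[0.0, 1.386], [2.303, 0.693]]

raw_ok = looks_like_raw_counts(raw_example)
lognorm_ok = looks_like_raw_counts(lognorm_example)
```

A matrix that fails this check has almost certainly been transformed; one that passes is consistent with raw counts but is not guaranteed to be (for example, rounded values would also pass).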

mihem mentioned this issue on Mar 30, 2023: Interoperability with Seurat and comparison with traditional (clustered) DE analysis, MarioniLab/miloDE#29 (Closed)
