Single-Cell Differential Gene Expression #906

rcannood · 2024-09-29T06:10:43Z

Task Motivation

Single-cell RNA sequencing allows for unprecedented resolution in studying cellular heterogeneity and gene expression dynamics. However, analyzing data from multiple experiments is often complicated by batch effects, which can significantly impact the identification of differentially expressed genes (DEGs). Accurate DEG identification in multi-batch scRNA-seq data is crucial for understanding cell type identity, gene regulatory networks, and disease states.

This task, inspired by Nguyen et al. (2023) and Soneson & Robinson (2018), aims to benchmark DEG methods in multi-batch scRNA-seq data. By incorporating diverse datasets and performance metrics, we can provide researchers with a clear understanding of the strengths and weaknesses of various approaches, leading to more reliable and reproducible DEG analysis.

Task Description

Problem: Identify genes differentially expressed between two or more groups of cells in a multi-batch scRNA-seq dataset, while accounting for batch effects.

Input:

Count matrix: genes (rows) × cells (columns) of expression counts.
Cell metadata: Data frame with cell annotations (cell type labels, experimental conditions, batch information).

Output:

Ranked DEG list: Genes with associated statistics (p-value, fold change, effect size) indicating the magnitude and significance of differential expression, adjusted for batch effects.
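To make the expected input and output concrete, here is a minimal, stdlib-only sketch. It assumes a dense genes × cells count matrix stored as a list of lists, and per-cell metadata stored as a list of dicts with hypothetical keys "group" and "batch". The function name batch_averaged_lfc is illustrative and not part of any existing package. The sketch normalizes each cell to counts per million, computes a log2 fold change between two groups within each batch, averages over batches, and ranks genes by absolute effect. It only produces effect sizes, not p-values, so it is a placeholder for the output format rather than a DE method.

```python
import math


def cpm(counts):
    """Counts-per-million normalisation of each cell (column) of a genes x cells matrix."""
    n_genes, n_cells = len(counts), len(counts[0])
    totals = [sum(counts[g][c] for g in range(n_genes)) for c in range(n_cells)]
    return [
        [counts[g][c] / totals[c] * 1e6 if totals[c] else 0.0 for c in range(n_cells)]
        for g in range(n_genes)
    ]


def batch_averaged_lfc(counts, meta, gene_names, group_a, group_b, pseudo=1.0):
    """Rank genes by log2 fold change (group_b over group_a), computed within
    each batch and averaged over the batches that contain both groups.

    Hypothetical helper: `meta` is assumed to be a list of dicts with
    'group' and 'batch' keys, one per cell (column of `counts`).
    """
    norm = cpm(counts)
    batches = sorted({m["batch"] for m in meta})
    ranked = []
    for g, name in enumerate(gene_names):
        lfcs = []
        for batch in batches:
            in_a = [norm[g][c] for c, m in enumerate(meta)
                    if m["batch"] == batch and m["group"] == group_a]
            in_b = [norm[g][c] for c, m in enumerate(meta)
                    if m["batch"] == batch and m["group"] == group_b]
            if in_a and in_b:  # skip batches missing either group
                mean_a = sum(in_a) / len(in_a)
                mean_b = sum(in_b) / len(in_b)
                lfcs.append(math.log2((mean_b + pseudo) / (mean_a + pseudo)))
        effect = sum(lfcs) / len(lfcs) if lfcs else 0.0
        ranked.append({"gene": name, "log2fc": effect})
    # Largest absolute effect first; no p-values are computed here.
    return sorted(ranked, key=lambda r: abs(r["log2fc"]), reverse=True)
```

Taking the contrast inside each batch keeps a batch-wide shift in expression from being read as a group effect, which is the idea the covariate-model and meta-analysis approaches formalize statistically.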

Assumptions:

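Preprocessed count matrix: quality control applied, with raw counts retained alongside any normalized values, since count-based methods such as edgeR and DESeq2 expect unnormalized counts.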
Accurate cell type annotations.
Balanced study design (each batch contains cells from all conditions/groups).
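The balanced-design assumption can be checked before any method is run. A small sketch, assuming the same per-cell metadata as a list of dicts with hypothetical "group" and "batch" keys, that reports which groups each batch is missing:

```python
from collections import defaultdict


def unbalanced_batches(meta, group_key="group", batch_key="batch"):
    """Return {batch: sorted missing groups} for every batch that lacks at
    least one group; an empty dict means the design is balanced.

    The metadata keys are assumptions, not a fixed task schema.
    """
    all_groups = {m[group_key] for m in meta}
    groups_in_batch = defaultdict(set)
    for m in meta:
        groups_in_batch[m[batch_key]].add(m[group_key])
    return {
        batch: sorted(all_groups - present)
        for batch, present in groups_in_batch.items()
        if all_groups - present
    }
```

Datasets that fail this check would need either filtering or methods that tolerate confounded batches.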

Constraints:

Methods should handle high dimensionality, sparsity, and batch effects.
Methods and metrics should be computationally efficient for large datasets.

Proposed Datasets

Nguyen et al. (2023) datasets:
- Model-based simulated data (splatter): Controlled environment with varying batch effects, sequencing depths, and zero rates.
- Model-free simulated data: Real scRNA-seq data with simulated DEGs, incorporating realistic batch effects.
dyngen-generated datasets: Dynamic gene expression data with complex regulatory networks and batch effects.
Soneson & Robinson (2018) datasets (from the conquer repository): Consistently processed, analysis-ready public scRNA-seq datasets with gene- and transcript-level abundance estimates, covering both full-length and UMI-based protocols.
CELLxGENE Census datasets: Pairs of well-annotated cell types across multiple studies, providing real-world data with diverse batch effects.

Initial Methods

DE methods evaluated by Soneson & Robinson (2018):

Bulk RNA-seq methods: edgeR, DESeq2, limma (voom, trend), SAMseq
Single-cell specific methods: MAST, SCDE, monocle, scDD, BPSC, DEsingle, D3E

Types of approaches evaluated by Nguyen et al. (2023):

Naïve Methods: Standard DE analysis of pooled uncorrected data.
Covariate Models: Parametric DE analysis with a batch covariate (DESeq2, edgeR, limma, MAST).
Batch Effect Correction: DE analysis of data integrated with batch correction methods such as MNN, scVI, or Scanorama.
Meta-Analysis: Methods for combining DE results across batches (e.g., Fisher's method, fixed/random effects models).
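Fisher's method, listed above as a meta-analysis option, combines k independent per-batch p-values for a gene through X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The stdlib-only sketch below uses the closed-form chi-square survival function for even degrees of freedom. It assumes the per-batch p-values are independent and ignores the direction of the fold change, so a gene that goes up in one batch and down in another can still receive a small combined p-value.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-square(2k) under H0. For even df = 2k:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(p <= 0 or p > 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # (x/2)^i / i!, built incrementally
        total += term
    return min(1.0, math.exp(-half) * total)
```

With a single batch the function returns the input p-value unchanged, which is a quick sanity check on the formula.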

Control Methods

Positive Control: Known marker genes for each cell type.
Negative Control: Randomly permuted cell labels (maintaining batch assignments) or a random ranking of genes.
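Both negative controls are cheap to generate. A sketch under the assumption that labels and batches are parallel per-cell lists; the function names are hypothetical. Shuffling labels only among cells of the same batch keeps each batch's label composition and batch assignments intact, while a random permutation of genes gives a no-skill ranking to score against the metrics.

```python
import random
from collections import defaultdict


def permute_labels_within_batch(labels, batches, seed=None):
    """Shuffle group labels among cells of the same batch only."""
    rng = random.Random(seed)
    idx_by_batch = defaultdict(list)
    for i, batch in enumerate(batches):
        idx_by_batch[batch].append(i)
    permuted = list(labels)
    for idx in idx_by_batch.values():
        shuffled = [labels[i] for i in idx]
        rng.shuffle(shuffled)
        for i, label in zip(idx, shuffled):
            permuted[i] = label
    return permuted


def random_gene_ranking(genes, seed=None):
    """Random permutation of genes, used as a baseline DEG ranking."""
    rng = random.Random(seed)
    ranked = list(genes)
    rng.shuffle(ranked)
    return ranked
```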

Proposed Metrics

Generalized F-score (F-beta) and partial area under the precision-recall curve (pAUPR), as used in Nguyen et al. (2023).
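Both metrics compare a method's output against the set of true DEGs (known in the simulated datasets). A minimal sketch follows. The default beta = 0.5 (weighting precision above recall) and the recall cutoff max_recall = 0.5 are example values chosen here, not settings confirmed by this issue. The pAUPR uses step-wise interpolation (precision at each true positive times the recall it adds) and is left unnormalized, so its maximum equals max_recall. Other conventions, such as trapezoidal interpolation or dividing by the cutoff, would give different numbers.

```python
def f_beta(predicted, truth, beta=0.5):
    """F-beta score of a predicted DEG set against the true DEG set."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def partial_aupr(ranking, truth, max_recall=0.5):
    """Step-wise area under the precision-recall curve for recall in [0, max_recall]."""
    truth = set(truth)
    if not truth:
        raise ValueError("truth set is empty")
    tp, area, prev_recall = 0, 0.0, 0.0
    for k, gene in enumerate(ranking, start=1):
        if gene not in truth:
            continue
        tp += 1
        precision = tp / k
        recall = tp / len(truth)
        # Each true positive adds recall; weight that step by the current precision.
        capped = min(recall, max_recall)
        area += precision * (capped - prev_recall)
        prev_recall = capped
        if recall >= max_recall:
            break
    return area
```

A perfect ranking reaches the maximum pAUPR of max_recall, and placing non-DEGs ahead of true DEGs lowers it.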

References

Soneson, C., Robinson, M. Bias, robustness and scalability in single-cell differential expression analysis. Nat Methods 15, 255–261 (2018). https://doi.org/10.1038/nmeth.4612

Nguyen, H.C.T., Baik, B., Yoon, S. et al. Benchmarking integration of single-cell differential expression. Nat Commun 14, 1570 (2023). https://doi.org/10.1038/s41467-023-37126-3

rcannood added the "task" label (Add a new task) on Sep 29, 2024.

rcannood changed the title from "Single-Cell Differential Expression" to "Single-Cell Differential Gene Expression" on Sep 29, 2024.

rcannood added the "help wanted" label (Extra attention is needed) on Sep 29, 2024.
