Task Motivation
Single-cell RNA sequencing allows for unprecedented resolution in studying cellular heterogeneity and gene expression dynamics. However, analyzing data from multiple experiments is often complicated by batch effects, which can significantly impact the identification of differentially expressed genes (DEGs). Accurate DEG identification in multi-batch scRNA-seq data is crucial for understanding cell type identity, gene regulatory networks, and disease states.
This task, inspired by Nguyen et al. (2023) and Soneson & Robinson (2018), aims to benchmark DEG methods in multi-batch scRNA-seq data. By incorporating diverse datasets and performance metrics, we can provide researchers with a clear understanding of the strengths and weaknesses of various approaches, leading to more reliable and reproducible DEG analysis.
Task Description
Problem: Identify genes differentially expressed between two or more groups of cells in a multi-batch scRNA-seq dataset, while accounting for batch effects.
Input:
Count matrix: Genes (rows) x Cells (columns) with expression counts.
Cell metadata: Data frame with cell annotations (cell type labels, experimental conditions, batch information).
Output:
Ranked DEG list: Genes with associated statistics (p-value, fold change, effect size) indicating the magnitude and significance of differential expression, adjusted for batch effects.
Assumptions:
Balanced study design (each batch contains cells from all conditions/groups).
Constraints:
Methods should handle high dimensionality, sparsity, and batch effects.
Methods and metrics should be computationally efficient for large datasets.
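To make the input, output, and balanced-design assumption above concrete, here is a minimal sketch in plain Python. It is not one of the benchmarked methods: it only computes a per-batch log2 fold change averaged across batches, with no statistical test, so it yields an effect size but no p-value. The field names (`batch`, `condition`), the function names, and the toy counts are all hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict

def is_balanced(cells):
    """Return True if every batch contains cells from every condition."""
    all_conditions = {c["condition"] for c in cells}
    per_batch = defaultdict(set)
    for c in cells:
        per_batch[c["batch"]].add(c["condition"])
    return all(conds == all_conditions for conds in per_batch.values())

def stratified_log2fc(counts, cells, group_a, group_b, pseudocount=1.0):
    """Per-gene log2 fold change (group_a vs group_b), computed inside each
    batch and then averaged, so batch-level shifts do not enter the estimate.
    Batches missing either group are skipped."""
    batches = sorted({c["batch"] for c in cells})
    result = {}
    for gene, values in counts.items():
        per_batch = []
        for b in batches:
            a = [v for v, c in zip(values, cells)
                 if c["batch"] == b and c["condition"] == group_a]
            r = [v for v, c in zip(values, cells)
                 if c["batch"] == b and c["condition"] == group_b]
            if not a or not r:
                continue
            mean_a = sum(a) / len(a)
            mean_r = sum(r) / len(r)
            per_batch.append(math.log2((mean_a + pseudocount) / (mean_r + pseudocount)))
        if per_batch:
            result[gene] = sum(per_batch) / len(per_batch)
    return result

def rank_genes(lfc):
    """Rank genes by absolute effect size, largest first."""
    return sorted(lfc, key=lambda g: abs(lfc[g]), reverse=True)

# Toy cell metadata: two batches, two conditions, two cells per combination.
cells = [
    {"batch": "b1", "condition": "ctrl"}, {"batch": "b1", "condition": "ctrl"},
    {"batch": "b1", "condition": "trt"},  {"batch": "b1", "condition": "trt"},
    {"batch": "b2", "condition": "ctrl"}, {"batch": "b2", "condition": "ctrl"},
    {"batch": "b2", "condition": "trt"},  {"batch": "b2", "condition": "trt"},
]

# Toy count matrix (genes x cells), stored as gene -> counts in cell order.
counts = {
    "g_de":    [1, 1, 7, 7, 3, 3, 15, 15],  # changes with condition in both batches
    "g_batch": [1, 1, 1, 1, 7, 7, 7, 7],    # changes with batch only
}

balanced = is_balanced(cells)
lfc = stratified_log2fc(counts, cells, "trt", "ctrl")
ranked = rank_genes(lfc)
```

In this toy example the gene that differs only between batches gets an effect size of zero, while the gene that differs between conditions ranks first. A real submission would replace the averaging step with a proper statistical model and report p-values alongside the effect size.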
Proposed Datasets
Nguyen et al. (2023) datasets:
Model-based simulated data (splatter): Controlled environment with varying batch effects, sequencing depths, and zero rates.
Model-free simulated data: Real scRNA-seq data with simulated DEGs, incorporating realistic batch effects.
dyngen generated datasets: Dynamic gene expression data with complex regulatory networks and batch effects.
Soneson & Robinson (2018) datasets (from conquer repository): Consistently processed, analysis-ready public scRNA-seq datasets with abundance estimates for genes and transcripts, including both full-length and UMI protocols.
cellxgene census datasets: Pairs of well-annotated cell types across multiple studies, providing real-world data with diverse batch effects.
Initial Methods
DE methods as evaluated in Soneson & Robinson:
Types of approaches as evaluated by Nguyen et al.
Control Methods
Proposed Metrics
References
Soneson, C., Robinson, M. Bias, robustness and scalability in single-cell differential expression analysis. Nat Methods 15, 255–261 (2018). https://doi.org/10.1038/nmeth.4612
Nguyen, H.C.T., Baik, B., Yoon, S. et al. Benchmarking integration of single-cell differential expression. Nat Commun 14, 1570 (2023). https://doi.org/10.1038/s41467-023-37126-3