Single-Cell Differential Gene Expression #906

Open
rcannood opened this issue Sep 29, 2024 · 0 comments
Labels: help wanted (Extra attention is needed), task (Add a new task)


rcannood commented Sep 29, 2024

Task Motivation

Single-cell RNA sequencing (scRNA-seq) measures gene expression in individual cells, which makes it possible to study cellular heterogeneity and gene expression dynamics at a resolution that bulk sequencing cannot reach. However, analyzing data from multiple experiments is often complicated by batch effects, which can distort the identification of differentially expressed genes (DEGs). Accurate DEG identification in multi-batch scRNA-seq data is crucial for understanding cell type identity, gene regulatory networks, and disease states.

This task, inspired by Nguyen et al. (2023) and Soneson & Robinson (2018), aims to benchmark DEG methods in multi-batch scRNA-seq data. By incorporating diverse datasets and performance metrics, we can provide researchers with a clear understanding of the strengths and weaknesses of various approaches, leading to more reliable and reproducible DEG analysis.

Task Description

Problem: Identify genes differentially expressed between two or more groups of cells in a multi-batch scRNA-seq dataset, while accounting for batch effects.

Input:

  • Count matrix: Genes (rows) × Cells (columns) with expression counts.
  • Cell metadata: Data frame with cell annotations (cell type labels, experimental conditions, batch information).

Output:

  • Ranked DEG list: Genes with associated statistics (p-value, fold change, effect size) indicating the magnitude and significance of differential expression, adjusted for batch effects.
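
To make the expected output concrete, here is a minimal sketch of what a ranked DEG list could look like. The `DegResult` record, its field names, and the gene names are hypothetical, and the ranking rule (ascending p-value, ties broken by larger absolute log2 fold change) is an assumption for illustration, not a format the task has settled on.

```python
from dataclasses import dataclass


@dataclass
class DegResult:
    """One row of a DEG list (hypothetical field names)."""
    gene: str
    p_value: float
    log2_fold_change: float


def rank_degs(results):
    """Sort by ascending p-value; break ties by larger |log2 fold change|."""
    return sorted(results, key=lambda r: (r.p_value, -abs(r.log2_fold_change)))


# Toy values, not real results.
results = [
    DegResult("GeneA", 0.04, 1.2),
    DegResult("GeneB", 0.001, -2.5),
    DegResult("GeneC", 0.04, -3.0),
]
ranked = rank_degs(results)
print([r.gene for r in ranked])
```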

Assumptions:

  • Preprocessed count matrix (quality control, normalization).
  • Accurate cell type annotations.
  • Balanced study design (each batch contains cells from all conditions/groups).
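
The balanced-design assumption can be checked directly from the cell metadata before any method is run. The sketch below assumes the batch and condition annotations are available as two parallel lists with one entry per cell; the function name `unbalanced_batches` is hypothetical.

```python
from collections import defaultdict


def unbalanced_batches(batches, conditions):
    """Return {batch: [missing conditions]} for batches that lack a condition.

    `batches` and `conditions` are parallel per-cell annotations.
    An empty dict means every batch contains cells from every condition.
    """
    all_conditions = set(conditions)
    seen = defaultdict(set)
    for batch, condition in zip(batches, conditions):
        seen[batch].add(condition)
    return {
        batch: sorted(all_conditions - present)
        for batch, present in seen.items()
        if present != all_conditions
    }


# Toy metadata: batch "b3" has no treated cells.
batches = ["b1", "b1", "b2", "b2", "b3"]
conditions = ["ctrl", "treated", "ctrl", "treated", "ctrl"]
missing = unbalanced_batches(batches, conditions)
print(missing)
```

A dataset for which this returns a non-empty dict would violate the assumption, because batch and condition are then partially confounded.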

Constraints:

  • Methods should handle high dimensionality, sparsity, and batch effects.
  • Methods and metrics should be computationally efficient for large datasets.

Proposed Datasets

  • Nguyen et al. (2023) datasets:
    • Model-based simulated data (splatter): Controlled environment with varying batch effects, sequencing depths, and zero rates.
    • Model-free simulated data: Real scRNA-seq data with simulated DEGs, incorporating realistic batch effects.
  • dyngen generated datasets: Dynamic gene expression data with complex regulatory networks and batch effects.
  • Soneson & Robinson (2018) datasets (from conquer repository): Consistently processed, analysis-ready public scRNA-seq datasets with abundance estimates for genes and transcripts, including both full-length and UMI protocols.
  • cellxgene census datasets: Pairs of well-annotated cell types across multiple studies, providing real-world data with diverse batch effects.

Initial Methods

DE methods as evaluated in Soneson & Robinson (2018):

  • Bulk RNA-seq methods: edgeR, DESeq2, limma (voom, trend), SAMseq
  • Single-cell specific methods: MAST, SCDE, monocle, scDD, BPSC, DEsingle, D3E

Types of approaches as evaluated by Nguyen et al. (2023):

  • Naïve Methods: Standard DE analysis of pooled uncorrected data.
  • Covariate Models: Parametric DE analysis with a batch covariate (DESeq2, edgeR, limma, MAST).
  • Batch Effect Correction: DE analysis of batch-corrected data (MNN, scVI, Scanorama).
  • Meta-Analysis: Methods for combining DE results across batches (e.g., Fisher's method, fixed/random effects models).
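
As an illustration of the meta-analysis approach, the following is a minimal sketch of Fisher's method for combining one gene's per-batch p-values. It assumes the batches are independent, and it ignores the direction of the effect, so it is not a substitute for the fixed/random effects models. The function names are hypothetical. The chi-square survival function is written in closed form, which is valid here because Fisher's statistic always has an even number of degrees of freedom (2k for k p-values).

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine independent p-values for one gene with Fisher's method."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))


# Toy per-batch p-values for a single gene.
combined = fisher_combine([0.05, 0.05])
print(combined)
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the implementation.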

Control Methods

  • Positive Control: Known marker genes for each cell type.
  • Negative Control: Randomly permuted cell labels (maintaining batch assignments) or a random permutation of genes.
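
The label-permutation negative control could be generated as sketched below: labels are shuffled only among cells of the same batch, so each batch keeps its original label composition while any true association between labels and expression is broken. The function name and the `seed` parameter are assumptions for illustration.

```python
import random
from collections import defaultdict


def permute_labels_within_batch(labels, batches, seed=0):
    """Shuffle cell labels separately inside each batch.

    `labels` and `batches` are parallel per-cell annotations.
    Returns a new list; the inputs are not modified.
    """
    rng = random.Random(seed)
    indices_by_batch = defaultdict(list)
    for i, batch in enumerate(batches):
        indices_by_batch[batch].append(i)

    permuted = list(labels)
    for indices in indices_by_batch.values():
        shuffled = [labels[i] for i in indices]
        rng.shuffle(shuffled)
        for i, label in zip(indices, shuffled):
            permuted[i] = label
    return permuted


# Toy metadata.
labels = ["A", "A", "B", "B", "A", "B", "B", "B"]
batches = ["b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2"]
permuted = permute_labels_within_batch(labels, batches, seed=42)
print(permuted)
```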

Proposed Metrics

  • Generalized F-score and partial AUPR score (as used in Nguyen et al. 2023).
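
For reference, a generalized F-score can be read as the F-beta score computed between the set of genes a method calls as DEGs and the set of true DEGs. The sketch below follows that reading; the function name is hypothetical, and the default `beta=0.5` (which weights precision more heavily than recall) is an assumption for illustration that should be checked against the paper before use. The partial AUPR is not covered here.

```python
def f_beta(predicted, truth, beta=0.5):
    """F-beta score between predicted DEGs and true DEGs (both iterables of gene IDs)."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy gene sets: 2 of 4 calls are correct, 2 of 3 true DEGs are found.
predicted = ["g1", "g2", "g3", "g4"]
truth = ["g1", "g2", "g5"]
print(f_beta(predicted, truth, beta=1.0))
print(f_beta(predicted, truth, beta=0.5))
```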

References

Soneson, C., Robinson, M. Bias, robustness and scalability in single-cell differential expression analysis. Nat Methods 15, 255–261 (2018). https://doi.org/10.1038/nmeth.4612

Nguyen, H.C.T., Baik, B., Yoon, S. et al. Benchmarking integration of single-cell differential expression. Nat Commun 14, 1570 (2023). https://doi.org/10.1038/s41467-023-37126-3

rcannood added the "task" label on Sep 29, 2024
rcannood changed the title from "Single-Cell Differential Expression" to "Single-Cell Differential Gene Expression" on Sep 29, 2024
rcannood added the "help wanted" label on Sep 29, 2024