Commit 378c9f2

Merge pull request #139 from jgilis/pbDD
polishing pbDD vignette
2 parents d7725a2 + f66f62e commit 378c9f2

6 files changed (+75 -5 lines)

R/stagewiseDD.R (+1 -1)

@@ -20,7 +20,7 @@
 }

 #' @rdname stagewise_DS_DD
-#' @title Perform two-stage testing on DS and DD
+#' @title Perform two-stage testing on DS and DD analysis results
 #'
 #' @param res_DS a list of DS testing results as returned
 #' by \code{\link{pbDS}} or \code{\link{mmDS}}.

inst/extdata/refs.bib (+13)

@@ -267,3 +267,16 @@ @ARTICLE{Qiu2020
   year = 2020,
   language = "en"
 }
+
+@ARTICLE{Vandenberge2017,
+  title = "stageR: a general stage-wise method for controlling the gene-level
+           false discovery rate in differential expression and differential
+           transcript usage",
+  author = "Van den Berge, Koen and Soneson, Charlotte and Robinson, Mark D
+            and Clement, Lieven",
+  journal = "Genome Biology",
+  volume = 18,
+  number = 151,
+  year = 2017,
+  language = "en"
+}

man/stagewise_DS_DD.Rd (+2 -2)

(Generated file; diff not rendered.)

vignettes/analysis.Rmd (+1)

@@ -42,6 +42,7 @@ abstract: >

 ```{r echo = FALSE, message = FALSE, warning = FALSE}
 library(BiocStyle)
+library(patchwork)
 library(cowplot)
 ```

vignettes/detection.Rmd (+57 -2)

@@ -28,7 +28,7 @@ vignette: >
   %\VignetteEncoding{UTF-8}
 bibliography: "`r file.path(system.file('extdata', package='muscat'), 'refs.bib')`"
 abstract: >
-  <p> he found himself transformed in his bed into a gigantic insect...
+  <p> In this vignette, we demonstrate how `muscat` can be used to perform differential detection (DD) analyses in multi-sample, multi-group, multi-(cell-)subpopulation scRNA-seq data. Furthermore, we show how DD and differential state (DS) analysis results on the same data can be effectively combined. This vignette thus introduces a workflow that allows users to jointly assess two biological hypotheses that often contain orthogonal information, which can be expected to improve their understanding of complex biological phenomena at no extra cost.
 ---

 <style type="text/css">
@@ -69,9 +69,15 @@ library(patchwork)

 # Introduction

+Single-cell RNA-sequencing (scRNA-seq) has improved our understanding of complex biological processes by elucidating cell-level heterogeneity in gene expression. One of the key tasks in the downstream analysis of scRNA-seq data is studying differential gene expression (DE). Most DE analysis methods aim to identify genes for which the *average* expression differs between biological groups of interest, e.g., between cell types or between diseased and healthy cells. As such, most methods assess only one aspect of the gene expression distribution: the mean. However, scRNA-seq count distributions also commonly differ in other characteristics.
+
+One such characteristic is gene detection, i.e., the number of cells in which a gene is (detectably) expressed. Analogous to a DE analysis, a differential detection (DD) analysis aims to identify genes for which the *average fraction of cells in which the gene is detected* changes between groups. In @Gilis2023, we show that DD analyses contain biologically relevant information that is largely orthogonal to the information obtained from DE analyses on the same data.
+
+In this vignette, we demonstrate how `muscat` can be used to perform DD analyses in multi-sample, multi-group, multi-(cell-)subpopulation scRNA-seq data. Furthermore, we show how DD and DS analysis results on the same data can be effectively combined using a two-stage testing approach. This workflow thus allows users to jointly assess two biological hypotheses that contain largely orthogonal information, which can be expected to improve their understanding of complex biological phenomena at no extra cost.
+
 # Setup

-We will use the same data as in the differential state (DS) analyses described in an independent vignette, namely, scRNA-seq data acquired on PBMCs from 8 patients before and after IFN-$\beta$ treatment.
+We will use the same data as in the differential state (DS) analyses described in `r Biocpkg("muscat", vignette = "analysis.html")`, namely, scRNA-seq data acquired on PBMCs from 8 patients before and after IFN-$\beta$ treatment. For a more detailed description of these data and their preprocessing, we refer to `r Biocpkg("muscat", vignette = "analysis.html")`.

 ```{r load-data, message=FALSE}
 library(ExperimentHub)
@@ -95,6 +101,14 @@ table(sce$sample_id)

 ## Aggregation

+In general, `aggregateData()` aggregates the data by the `colData` variables specified with argument `by`, and returns a `SingleCellExperiment` containing pseudobulk data.
+
+To perform a pseudobulk-level analysis, measurements must be aggregated at the cluster-sample level (default `by = c("cluster_id", "sample_id")`). In this case, the returned `SingleCellExperiment` will contain one assay per cluster, where rows = genes and columns = samples. Arguments `assay` and `fun` specify the input data and summary statistic, respectively, to use for aggregation.
+
+In a differential detection (DD) analysis, the default choice of summary statistic for aggregation is `fun = "num.detected"`. This strategy can be thought of as first binarizing the gene expression values (1: expressed, 0: not expressed), and subsequently summing the binarized values of all cells belonging to the same cluster-sample level. Hence, the resulting pseudobulk-level count reflects the number of cells in a particular cluster-sample level with a non-zero expression value for that gene.
+
+In a differential state (DS) analysis, the default choice for aggregation is `fun = "sum"`, which amounts to a simple summation of the raw gene expression counts of cells belonging to the same cluster-sample level.
+
 ```{r pbs-det}
 pb_sum <- aggregateData(sce,
     assay="counts", fun="sum",
@@ -124,8 +138,49 @@ Once we have assembled the pseudobulk data, we can test for DD using `pbDD()`. B
 res_DD <- pbDD(pb_det, min_cells=0, filter="none", verbose=FALSE)
 ```

+## Handling and visualizing results
+
+Inspection, manipulation, and visualization of DD analysis results follow the same principles as for a DS analysis. For a detailed description, we refer to the DS analysis vignette, `r Biocpkg("muscat", vignette = "analysis.html")`. Below, we demonstrate some basic functionality.
+
+```{r}
+tbl <- res_DD$table[[1]]
+# one data.frame per cluster
+names(tbl)
+```
+
+```{r}
+# view results for 1st cluster
+k1 <- tbl[[1]]
+head(format(k1[, -ncol(k1)], digits = 2))
+```
+
+```{r}
+# filter FDR < 5%, abs(logFC) > 1 & sort by adj. p-value
+tbl_fil <- lapply(tbl, function(u) {
+    u <- dplyr::filter(u, p_adj.loc < 0.05, abs(logFC) > 1)
+    dplyr::arrange(u, p_adj.loc)
+})
+
+# nb. of DD genes & % of total by cluster
+n_de <- vapply(tbl_fil, nrow, numeric(1))
+p_de <- format(n_de / nrow(sce) * 100, digits = 3)
+data.frame("#DD" = n_de, "%DD" = p_de, check.names = FALSE)
+```
+
+```{r}
+library(UpSetR)
+de_gs_by_k <- purrr::map(tbl_fil, "gene")
+upset(fromList(de_gs_by_k))
+```
+
 # Stagewise analysis

+While DD analysis results may contain biologically relevant information in their own right, we show in @Gilis2023 that combining DD and DS analysis results on the same data can further improve our understanding of complex biological phenomena. In the remainder of this vignette, we show how DD and DS analysis results on the same data can be effectively combined.
+
+For this, we build on the two-stage testing paradigm proposed by @Vandenberge2017. In the first stage of this testing procedure, we identify differential genes using an omnibus test for differential detection and differential expression (DE). The null hypothesis for this test is that the gene is neither differentially detected nor differentially expressed.
+
+In the second stage, we perform post-hoc tests on the differential genes from stage one to unravel whether they are DD, DE, or both. Compared to the individual DD and DS analysis results, the two-stage approach increases statistical power and provides better type I error control.
+
 ```{r pbDS}
 res_DS <- pbDS(pb_sum, min_cells=0, filter="none", verbose=FALSE)
 ```
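The difference between the two aggregation strategies described in the Aggregation section above (`fun = "sum"` for DS, `fun = "num.detected"` for DD) can be illustrated outside of `muscat` with base R. This is a minimal sketch, not `aggregateData()` itself: the count matrix `counts` and the cluster-sample labels in `group` are hypothetical toy data, and base `rowsum()` stands in for the pseudobulk summation. Dividing the detection counts by the number of cells per group gives the per-group detection fraction mentioned in the vignette's introduction.

```r
# Hypothetical toy data: 3 genes x 6 cells of raw counts
counts <- matrix(c(0, 2, 0, 1, 0, 5,
                   3, 0, 0, 0, 4, 1,
                   0, 0, 0, 0, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
# Hypothetical cluster-sample label for each cell (one cluster, two samples)
group <- c("k1.s1", "k1.s1", "k1.s1", "k1.s2", "k1.s2", "k1.s2")

# DS-style aggregation (cf. fun = "sum"): sum raw counts within each group.
# rowsum() sums rows by group, so transpose to cells x genes and back.
pb_sum <- t(rowsum(t(counts), group))

# DD-style aggregation (cf. fun = "num.detected"): binarize first
# (1 = detected, 0 = not detected), then sum within each group.
pb_det <- t(rowsum(t((counts > 0) * 1), group))

# Fraction of cells per group in which each gene is detected
n_cells <- as.vector(table(group)[colnames(pb_det)])
frac_det <- sweep(pb_det, 2, n_cells, "/")

pb_sum
pb_det
```

Here gene1 sums to 2 and 6 counts in the two groups but is detected in only 1 and 2 of 3 cells, which is exactly the information the `num.detected` summary retains and the `sum` summary mixes with expression magnitude.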
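The two-stage logic described in the "Stagewise analysis" section can be sketched in base R on hypothetical per-gene p-values. This is not `muscat`'s `stagewise_DS_DD()` implementation: the p-values are made up, and the Simes combination used as the stage-one omnibus test, the Holm adjustment in stage two, and all variable names are assumptions chosen for illustration. Stage one BH-adjusts one omnibus p-value per gene; stage two tests DD and DS separately, only for screened genes, at the reduced level `alpha * R / m` (R screened genes out of m) used in stage-wise procedures such as stageR.

```r
# Hypothetical per-gene p-values from a DD test and a DS test
p_dd <- c(g1 = 0.0001, g2 = 0.20, g3 = 0.04,   g4 = 0.60, g5 = 0.004)
p_ds <- c(g1 = 0.02,   g2 = 0.25, g3 = 0.0002, g4 = 0.70, g5 = 0.50)
alpha <- 0.05

# Stage 1 (screening): Simes' combination of the two p-values gives one
# omnibus p-value per gene (assumed choice), then BH across genes
p_omni <- pmin(2 * pmin(p_dd, p_ds), pmax(p_dd, p_ds))
padj_omni <- p.adjust(p_omni, method = "BH")
screened <- padj_omni <= alpha

# Stage 2 (confirmation): for screened genes only, Holm-adjust the DD and DS
# p-values within the gene and compare them to the level alpha * R / m
R <- sum(screened)
m <- length(p_omni)
alpha_stage2 <- alpha * R / m

confirm <- t(vapply(names(p_omni), function(g) {
  if (!screened[[g]]) return(c(DD = FALSE, DS = FALSE))
  p.adjust(c(DD = p_dd[[g]], DS = p_ds[[g]]), method = "holm") <= alpha_stage2
}, c(DD = FALSE, DS = FALSE)))

confirm
```

With these values, genes g1, g3 and g5 pass screening (R = 3 of m = 5, so stage two uses a level of about 0.03); g1 is then confirmed for both DD and DS, g3 only for DS, and g5 only for DD. Holm is used here as a simple, conservative within-gene correction.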

vignettes/simulation.Rmd (+1)

@@ -59,6 +59,7 @@ library(purrr)
 library(scater)
 library(reshape2)
 library(patchwork)
+library(cowplot)
 library(SingleCellExperiment)
 ```
