vignettes/how-to-bench.Rmd

---
title: "Benchmarking with decoupleR Tutorial"
author:
  - name: "Daniel Dimitrov"
    affiliation:
    - Saezlab
    email: daniel.dimitrov@uni-heidelberg.de
output: 
  BiocStyle::html_document:
    self_contained: true
    toc: true
    toc_float: true
    toc_depth: 3
    code_folding: show
package: "`r pkg_ver('decoupleRBench')`"
vignette: >
  %\VignetteIndexEntry{Benchmarking with decoupleR Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r chunk_setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```


# Prerequsites
This benchmark setting builds on `decoupleR`, and more specifically the
`decouple` wrapper function. As such, it requires the decoupleR package to be 
installed and it is recommended that the user is familiar with the [basics of decoupleR](https://saezlab.github.io/decoupleR/articles/decoupleR.html#basics-1).

The benchmark pipeline requires an input tibble with user-specified settings,
benchmark data in the form of a count table and a corresponding metadata table.

```{r load_decoupler, include = TRUE}
# load dependencies
library(decoupleRBench)
library(ggplot2)
library(pheatmap)
library(dplyr)
library(tibble)

# We load the files directly from drive urls

# example expression data
bexample_url <- 'https://docs.google.com/uc?export=download&id=1S9YcfQuG2cKLhUJKQjCtxtqdAvFbm9B9'

# benchmark metadata
bmeta_url <- "https://docs.google.com/uc?export=download&id=1nkvdtR0M1CvQPvFfz5OzJ75eicgE-0wu"

# dorothea gene sets
source_url <- "https://docs.google.com/uc?export=download&id=1CRrEcLxUqIp6qKyTIpF7Wuv1dZsiqe0j"
```

## Input tibble
The input tibble serves as the description for each benchmark run and each of
its rows corresponds to a separate call of the `decouple` wrapper. This enables
the user to specify any number of 'benchmark experiments' or combinations of 
benchmark data, networks (or any other type of gene sets), and statistical 
methods that they wish to explore.

The ease of use and flexibility of this approach is demonstrated here - e.g. 
to run the pipeline with a network resource filtered for a given condition only 
requires the filtering criteria to be changed. This also applies to the
statistical methods and their settings, different set sources/networks,
and benchmark data sets.

Note that here we use only an example of the DoRoThEA benchmarking set
[(Holland et al., 2020)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1949-z).

```{r input_example}
design_row <-
    tibble(
        set_name = "dorothea", # name of the set resource
        bench_name = "dbd", # name of the benchmark data
        stats_list = list( # a list of the stats to call
            c(
                "wmean",
                "mlm",
                "gsva",
                "ora"
            )
        ),
        opts_list = list(list( # list of options for each stat method
            wmean = list(times = 100, .mor='mor', .likelihood='likelihood'),
            mlm = list(.mor='mor', .likelihood='likelihood'),
            gsva = list(verbose = FALSE, method = "gsva"),
            ora = list(n_up = 300, n_bottom = 300, n_background = 20000)
        )),
        bexpr_loc = bexample_url, # benchmark data location
        bmeta_loc = bmeta_url, # metadata location
        source_loc = source_url, # set source location
        source_col = "tf", # source name of the gene set source
        target_col = "target", # target name of the set source
        filter_col = "confidence", # column by which we wish to filter
        filter_crit = list(c("A")), # criteria by which we wish to filter
    )

# input tibble where each run/row filters according to different
# confidence level combinations from dorothea.
input_tibble <- bind_rows(
    design_row,
    design_row %>%
        mutate(filter_crit = list(c("A", "B", "C")))
)
input_tibble <- input_tibble[2,]

input_tibble %>%
    rmarkdown::paged_table()
```
**Attention** should be paid to the arguments (options) passed to each 
statistical method. In particular, if the set resource contains columns such
as weights (=`.likelihood`) or mode of regulation (=`.mor`) then the names of 
these columns should also be passed as arguments. 
In the case when these columns are absent, or not provided, arbitrary values
will be assigned.

Please refer to the `run_*` functions for more information about
statistic-specific arguments.


## Benchmark data and metadata
The benchmark data should be in the form of a tibble with count data 
(e.g. gene expression table) with the ID of each perturbation experiment as
column names and gene IDs as rows.
```{r bexpr_example}
decoupleRBench::readRDS_helper(bexample_url, TRUE) %>%
    # show only first 10 genes of first 3 experiments
    slice_head(n = 10) %>%
    select(1:3)
```

The benchmark metadata should also be in the form of a tibble containing the 
'meta' information with rows corresponding to each perturbation experiment
or condition, with columns containing information about perturbation target,
cell line, etc.
```{r bmeta_example}
readRDS_helper(bmeta_url, TRUE) %>%
    slice_head(n = 10) %>%
    # select to show only most relevant columns
    select(id, target, platform, sign)
```
Note that the Experiment ID is used as the key to join the benchmark data and
metadata.

## Set Source
Here, we define a 'Set Source' as any resource containing information about sets
of sources and targets, where a `source` (e.g.  Transcription factors (TFs) or 
Kinases in regulatory networks or GO terms in Gene Ontologies) is linked to its
`targets` (e.g. the Genes or Phosphosites  composing the set of targets for a
given source). 
It is worth pointing out that similarly to `.mor` and `.likelihood`, 
`source_col`, `target_col`, and `filter_col` can take any user-specified name.
The `filter_col` and hence source_set pre-filtering can be omitted by passing
`NA` as its name.

In this tutorial, we will use [DoRothEA](https://github.com/saezlab/dorothea) -
a gene set resource containing signed TF-target interactions.
```{r source_example}
readRDS_helper(source_url, TRUE) %>%
    slice_head(n = 10) # show only first 10 rows
```

# Benchmark run
Now that we have checked the prerequisites and we are confident that they are
formatted appropriately, we can proceed with running the decoupleR benchmark
pipeline.

## Benchmark Settings and Assumptions
Here we run the benchmark pipeline with DoRoThEA using the default settings.

Note that the benchmark pipeline currently evaluates precision and that here
we define the 'Condition positive' as the perturbed targets for each 
condition/experiment.
Then we make the assumption that 'True positives (TPs)' would get a 'score'
from each statistical method that would rank them within the top `n` scores,
where `n` = the number of experiments covered by the set resource. On the other 
hand, a 'True Negative (TN)' is defined as a TF not perturbed and not ranked within
the top `n` of TFs. 'Condition Negatives' are defined in two ways depending 
whether downsampling is performed. 
By default, Condition Negatives are defined as the set of any TF regulon not 
perturbed in the current experiment. As such, each of these non-perturbed
TF sets is appended and considered a Condition negative. This results in
Condition Negatives (equal to the number of non-perturbed TFs for all of the 
experiments) vastly outnumbering the Condition positives (equal to the number of
truly perturbed target in each experiment).
When downsampling is performed, the number of Condition Negatives is set to be 
the same as the number of Condition Positives (i.e. number of experiments covered
by the gene set).

In some cases, assuming that the perturbed target (in this case a TF) in a given
experiment would be the most deregulated one (i.e. the one with highest `score`)
is a flawed assumption due to the nature of regulation in biology. For example,
a TF correlated or downstream of the perturbed target might be more perturbed
than the experiment target itself. However, at this stage, we believe that this
is likely our best choice. Nonetheless, we are open to suggestions and implementations
of alternative source set performance methods.

```{r bench_run, echo=TRUE, results='hide', message=FALSE, warning=FALSE}
dor_run <- run_benchmark(
    .design = input_tibble, # provide input tibble
    .minsize = 5, # filter gene sets with size < 10
    .form = TRUE, # format the benchmark results
    .perform = TRUE, # evaluate benchmarking performance
    .silent = FALSE, # silently run the pipeline
    .downsample_pr = TRUE, # downsample TNs for precision-recall curve
    .downsample_roc = TRUE, # downsample TNs for ROC
    .downsample_times = 100, # downsampling iterations
    .url_bool = TRUE # whether to load from url
)
```

## Pipeline Output
The output of the pipeline is a pre-defined object with benchmark results,
results summary and plots, and the input tibble.

### Benchmarking results
The benchmarking results provide meta information about the run: set_name (name
of the set source), bench_name (name of the benchmark dataset), filter_crit 
(filter criteria), and statistic (statistical method).
`Activity` contains tibbles returned from each call of the decouple wrapper.
`roc` contains Receiver Operator Curve calculations for each decouple run.
`prc` contains Precision-Recall Curve calculations for each decouple run.
```{r bench_results}
dor_run@bench_res %>%
    rmarkdown::paged_table()
```

### Benchmarking Summary
First, we explore the `Summary table`.
It is a rather lengthy, yet highly informative table, containing information 
such as: meta information for each run, AUROC (auc), Precision-Recall Curve AUC 
(pr_auc), source_cov (number of TFs), condition_cov (number of experiments), 
number of condition negatives for the roc (roc_neg) and pr curve (pr_neg), 
statistic_time (Computational time for the statistical method), regulon_time 
(combined time for each of the statistics for each network/set_source variation).

```{r bench_summary}
dor_run@summary$summary_table %>%
    rmarkdown::paged_table()
```

### Benchmark Plots
Now we look at the different plots:

*ROC plot*
```{r bench_roc}
dor_run@summary$roc_plot
```


*Precision-Recall Curve*
```{r bench_prc}
dor_run@summary$pr_plot
```
Note that the straight lines represent random baselines for each network variant
and are proportional to the number condition positives (i.e. perturbed/predicted
experiments) against the total number of instances (positives+negatives). 


*AUROC heatmap*
```{r bench_auroc}
dor_run@summary$auroc_heat
```


*PR AUC heatmap*
```{r bench_prauc}
dor_run@summary$pr_heat
```

Please note that these plots are ggplot2 objects and are thus customisable.


# Contributing to decoupleR
We are committed to the further development of decoupleR and we hope
that with the help of the community, it would become the go-to place for
benchmarking any combination of set sources and statistical methods. As such,
we invite the community to share or implement any ideas, statistical methods,
network or gene resources, and benchmark data sets.

Are you interested in adding a new statistical method or in extending the 
benchmark environment of decoupleR? Please check out our
[contribution guide](https://saezlab.github.io/decoupleR/CONTRIBUTING.html)
and do not hesitate to contact me!


# Session Info
```{r sessionInfo, echo=FALSE}
options(width = 120)
sessioninfo::session_info()
```