Documentation #279

Merged on Apr 25, 2022 (67 commits; changes shown from 61 commits)

Commits
b8dcae4
annotate metrics master function
mumichae Nov 25, 2021
5d47c05
add parameter descriptions for metrics wrappers
mumichae Dec 1, 2021
41229fc
setup sphinx documentation
mumichae Dec 1, 2021
b40e102
include automatic module documentation
mumichae Dec 1, 2021
f4be724
update docstrings
mumichae Dec 1, 2021
f9b584c
remove myst_parser dependency
mumichae Dec 1, 2021
6d68c6e
include package code in sphinx conf.py
mumichae Dec 1, 2021
7ba0819
add README to docs
mumichae Dec 1, 2021
31747f9
install scib for docs
mumichae Dec 1, 2021
acd9793
use python 3.8 for readthedocs
mumichae Dec 1, 2021
6e638fc
relative install
mumichae Dec 1, 2021
95ef15e
fix path to conf.py
mumichae Dec 1, 2021
13d49c0
include deprecated to dependencies
mumichae Dec 1, 2021
5ac9d32
deprecate to deprecated
mumichae Dec 1, 2021
191a790
add r-base
mumichae Dec 2, 2021
ef3b7fa
ignore build output
mumichae Dec 16, 2021
1df0dfb
update README links
mumichae Mar 10, 2022
9ae7309
integrated README.md directly with myst-parser
mumichae Mar 10, 2022
aaf5213
params -> param
mumichae Mar 10, 2022
4f34d00
updated silhouette score docs
mumichae Mar 10, 2022
f1746ce
solved merge conflict
mumichae Mar 10, 2022
c482e0f
changed index depths
mumichae Mar 10, 2022
72cd0f6
include extra requirements for readthedocs
mumichae Mar 11, 2022
a5df8e2
fixed typo
mumichae Mar 11, 2022
b370db0
switch to automodapi
mumichae Mar 21, 2022
9288b01
remove and ignore autogenerated docs output
mumichae Mar 21, 2022
bb1c5c3
fixed module scope
mumichae Mar 21, 2022
ac33138
use github link for figure instead of relative one
mumichae Mar 25, 2022
cf8550b
add docstring to metrics package
mumichae Mar 25, 2022
ddb956a
fixed main docstring of preprocessing and automodapi directive
mumichae Mar 25, 2022
9049bb0
added docstrings to integration methods
mumichae Mar 25, 2022
1b4ef96
ignore functions in automodapi
mumichae Mar 29, 2022
7e31b1d
updated preprocessing docstrings
mumichae Apr 13, 2022
9708e46
renamed saveSeurat
mumichae Apr 13, 2022
5a4e1c8
rearrange lisi code
mumichae Apr 13, 2022
fb7d2c9
added structure to Package overview
mumichae Apr 13, 2022
958f13b
updated metrics docstrings
mumichae Apr 13, 2022
692235b
installation instructions via PyPI
mumichae Apr 13, 2022
844b84c
camelcase to snakecase
mumichae Apr 13, 2022
46e0cb5
include links to functions in master metrics function
mumichae Apr 19, 2022
c8fcf63
Update metrics wrapper documentation
mumichae Apr 21, 2022
7146e1c
clean up function usage code
mumichae Apr 21, 2022
1195b3a
merge master
mumichae Apr 21, 2022
d7bd12b
updated HVG overlap header
mumichae Apr 21, 2022
0df3995
update publication links
mumichae Apr 21, 2022
4cff787
moved usage documentation from README to docs
mumichae Apr 21, 2022
1a27f9f
include documation link and cleanup
mumichae Apr 21, 2022
9d19b9d
New page per module
mumichae Apr 21, 2022
628e920
include github link for rtd and fix version retrieval in rtd
mumichae Apr 21, 2022
b14ac68
reference metrics master function for kwargs
mumichae Apr 21, 2022
668d9b0
include links and integration method versions
mumichae Apr 21, 2022
4b0b3ff
fix publication link for metrics module
mumichae Apr 22, 2022
f0960af
improved silhouette score documentation
mumichae Apr 22, 2022
eb6798c
include pcr related functions and better documentation
mumichae Apr 22, 2022
a3d3bd9
use "variance contribution" as output for pcr functions
mumichae Apr 22, 2022
70b19c7
improved graph connectivity docstring
mumichae Apr 22, 2022
124c465
fixed formatting
mumichae Apr 22, 2022
0de4220
fix pcr_comparison accessor
mumichae Apr 22, 2022
15941be
improved LISI docstrings
mumichae Apr 22, 2022
6d8cad4
updated kBET docstring
mumichae Apr 22, 2022
a82cad8
move module docstring content to docs/
mumichae Apr 22, 2022
a685b7c
Fix broken metrics links
mumichae Apr 25, 2022
f129b9c
moved metrics overview table from metrics page to README
mumichae Apr 25, 2022
cbf2102
add more description to metrics overview
mumichae Apr 25, 2022
efc2134
reference documentation page for metrics overview
mumichae Apr 25, 2022
61827a9
fix typo
mumichae Apr 25, 2022
b1ba0ed
using short version of function links
mumichae Apr 25, 2022
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
build/
scratch/

**.h5ad
*test_output*

31 changes: 31 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,31 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-20.04
  tools:
    python: "3.8"
  apt_packages:
    - r-base

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/source/conf.py
  fail_on_warning: false

# If using Sphinx, optionally build your docs in additional formats such as PDF
# formats:
# - pdf

# Optionally declare the Python requirements required to build your docs
python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs
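
The `python.install` entry above asks Read the Docs to install the repository itself together with its `docs` extra. As a rough illustration only, the entry corresponds to a pip requirement of the form `.[docs]`. The helper name `pip_args` is hypothetical, and the additional flags Read the Docs passes to pip are not shown in this diff and are left out here.

```python
def pip_args(entry):
    """Translate one `python.install` entry into pip arguments (illustrative only)."""
    if entry.get("method") != "pip":
        raise ValueError("this sketch only handles method: pip")
    target = entry["path"]
    extras = entry.get("extra_requirements", [])
    if extras:
        # Extras are appended in square brackets, e.g. ".[docs]"
        target += "[" + ",".join(extras) + "]"
    return ["python", "-m", "pip", "install", target]


# The same values as the install entry in .readthedocs.yaml
entry = {"method": "pip", "path": ".", "extra_requirements": ["docs"]}
print(" ".join(pip_args(entry)))  # python -m pip install .[docs]
```

Running the printed command from the repository root should give a local environment close to the one used for the documentation build.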
158 changes: 39 additions & 119 deletions README.md
@@ -1,156 +1,76 @@
# Benchmarking atlas-level data integration in single-cell genomics

This repository contains the code for our benchmarking study for data integration tools.
In [our study](https://www.biorxiv.org/content/10.1101/2020.05.22.111161v1), we benchmark 16
methods ([see here](##Tools)) with 4 combinations of preprocessing steps leading to 68 methods combinations on 85
batches of gene expression and chromatin accessibility data.
This repository contains the code for the `scib` package used in our benchmarking study for data integration tools.
In [our study](https://doi.org/10.1038/s41592-021-01336-8), we benchmark 16 methods (see Tools) with 4 combinations of
preprocessing steps, leading to 68 method combinations on 85 batches of gene expression and chromatin accessibility data.

![Workflow](./figure.png)
![Workflow](https://raw.githubusercontent.com/theislab/scib/main/figure.png)

## Resources

+ On our [website](https://theislab.github.io/scib-reproducibility) we visualise the results of the study.

+ The git repository of the [`scib` package](https://github.com/theislab/scib) and its [documentation](https://scib.readthedocs.io/).
+ The reusable pipeline we used in the study can be found in the
separate [scib pipeline](https://github.com/theislab/scib-pipeline.git) repository. It is reproducible and automates
the computation of preprocessing combinations, integration methods and benchmarking metrics.

+ On our [website](https://theislab.github.io/scib-reproducibility) we visualise the results of the study.
+ For reproducibility and visualisation we have a dedicated
repository: [scib-reproducibility](https://github.com/theislab/scib-reproducibility).

### Please cite:

**Benchmarking atlas-level data integration in single-cell genomics.**
MD Luecken, M Büttner, K Chaichoompu, A Danese, M Interlandi, MF Mueller, DC Strobl, L Zappia, M Dugas, M Colomé-Tatché,
FJ Theis bioRxiv 2020.05.22.111161; doi: https://doi.org/10.1101/2020.05.22.111161_

## Package: `scib`
Luecken, M.D., Büttner, M., Chaichoompu, K. et al. Benchmarking atlas-level data integration in single-cell genomics.
Nat Methods 19, 41–50 (2022). [https://doi.org/10.1038/s41592-021-01336-8](https://doi.org/10.1038/s41592-021-01336-8)

We created the python package called `scib` that uses `scanpy` to streamline the integration of single-cell datasets and
evaluate the results. For evaluating the integration quality it provides a number of metrics.
## Package: scib

### Requirements
We created the python package called `scib` that uses `scanpy` to streamline the integration of single-cell datasets
and evaluate the results.
The package contains several modules for preprocessing an ``anndata`` object, running integration methods and
evaluating the results using a number of metrics.
For preprocessing, ``scib.preprocessing`` (or ``scib.pp``) contains functions for normalising, scaling or batch-aware
selection of highly variable genes.
Functions for the integration methods are in ``scib.integration`` or for short ``scib.ig`` and metrics are under
``scib.metrics`` (or ``scib.me``).
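
The short names above are described as aliases of the full submodules. A minimal sketch for checking this in a given environment follows. The helper `check_aliases` is a hypothetical name, and the sketch returns `None` when `scib` is not installed or cannot be imported, so it runs either way.

```python
import importlib
import importlib.util

# Short aliases named in the README, mapped to the full submodule names.
ALIASES = {"pp": "preprocessing", "ig": "integration", "me": "metrics"}


def check_aliases(package="scib"):
    """Return {alias: bool} if the package imports, else None."""
    if importlib.util.find_spec(package) is None:
        return None  # package not installed in this environment
    try:
        pkg = importlib.import_module(package)
    except Exception:
        return None  # installed but not importable here
    # An alias passes only if it is the very same module object as the full name.
    return {
        short: getattr(pkg, short, None) is getattr(pkg, full, object())
        for short, full in ALIASES.items()
    }


print(check_aliases())
```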

+ Linux or UNIX system
+ Python >= 3.7
+ 3.6 <= R <= 4.0

We recommend working with environments such as Conda or virtualenv, so that python and R dependencies are in one place.
Please also check out [scib pipeline](https://github.com/theislab/scib-pipeline.git) for ready-to-use environments.
Alternatively, manually install the package on your system using pip, described in the next section.

### Installation

The `scib` python package is in the folder scib. You can simply install it from the root of this repository using
The `scib` python package is available on [PyPI](https://pypi.org/) and can be installed through

```
pip install .
pip install scib
```

Alternatively, you can also install the package directly from GitHub via

```
pip install git+https://github.com/theislab/scib.git
```

Additionally, in order to run the R package `kBET`, you need to install it through R.

```R
devtools::install_github('theislab/kBET')
```
Import `scib` in python:

> **Note:** By default dependencies for integration methods are not installed due to dependency clashes.
> In order to use integration methods, see the next section

### Installing additional packages

This package contains code for running integration methods as well as for evaluating their output. However, due to
dependency clashes, `scib` is only installed with the packages needed for the metrics. In order to use the integration
wrapper functions, we recommend to work with different environments for different methods, each with their own
installation of `scib`. You can install optional Python dependencies via pip as follows:

```
pip install .[bbknn] # using BBKNN
pip install .[scanorama] # using Scanorama
pip install .[bbknn,scanorama] # Multiple methods in one go
```python
import scib
```
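
To confirm that the PyPI installation succeeded without importing the package and its dependencies, the installed distribution metadata can be queried. This is a small sketch using only the standard library (Python 3.8 or later); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name="scib"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("scib is not installed" if version is None else f"scib {version} is installed")
```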

The `setup.cfg` for a full list of Python dependencies. For a comprehensive list of supported integration methods,
including R packages, check out the `Tools`.

## Usage

The package contains several modules for the different steps of the integration and benchmarking pipeline. Functions for
the integration methods are in `scib.integration` or for short `scib.ig`. The methods can be called using

```py
scib.integration.<method>(adata, batch=<batch_key>)
```

where `<method>` is the name of the integration method and `<batch_key>` is the name of the batch column in `adata.obs`.
For example, in order to run Scanorama, on a dataset with batch key 'batch' call

```py
scib.integration.scanorama(adata, batch='batch')
```

> **Warning:** the following notation is deprecated.
> ```
> scib.integration.run<method>(adata, batch=<batch_key>)
> ```
> Please use the snake case naming without the `run` prefix.

Some integration methods (`scgen`, `scanvi`) also use cell type labels as input. For these, you need to additionally provide
the corresponding label column.

```py
scgen(adata, batch=<batch_key>, cell_type=<cell_type>)
scanvi(adata, batch=<batch_key>, labels=<cell_type>)
```

`scib.preprocessing` (or `scib.pp`) contains functions for normalising, scaling or selecting highly variable genes per batch
The metrics are under `scib.metrics` (or `scib.me`).

## Metrics

For a detailed description of the metrics implemented in this package, please see
the [manuscript](https://www.biorxiv.org/content/10.1101/2020.05.22.111161v2).
our [publication](https://doi.org/10.1038/s41592-021-01336-8).

### Batch removal metrics include:

- Principal component regression `pcr_comparison()`
- Batch ASW `silhouette()`
- K-nearest neighbour batch effect `kBET()`
- Graph connectivity `graph_connectivity()`
- Graph iLISI `lisi_graph()`
- Principal component regression `scib.metrics.pcr_comparison()`
- Batch ASW `scib.metrics.silhouette_batch()`
- K-nearest neighbour batch effect `scib.metrics.kBET()`
- Graph connectivity `scib.metrics.graph_connectivity()`
- Graph iLISI `scib.metrics.ilisi_graph()`

### Biological conservation metrics include:

- Normalised mutual information `nmi()`
- Adjusted Rand Index `ari()`
- Cell type ASW `silhouette_batch()`
- Isolated label score F1 `isolated_labels()`
- Isolated label score ASW `isolated_labels()`
- Cell cycle conservation `cell_cycle()`
- Highly variable gene conservation `hvg_overlap()`
- Trajectory conservation `trajectory_conservation()`
- Graph cLISI `lisi_graph()`

### Metrics Wrapper Functions
We provide wrapper functions to run multiple metrics in one function call.
The `scib.metrics.metrics()` function returns a `pandas.Dataframe` of all metrics specified as parameters.

```py
scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)
```

Furthermore, `scib.metrics.metrics()` is wrapped by convenience functions that only select certain metrics:

+ `scib.me.metrics_fast()` only computes metrics that require little preprocessing
+ `scib.me.metrics_slim()` includes all functions of `scib.me.metrics_fast()` and adds clustering-based metrics
+ `scib.me.metrics_all()` includes all metrics
- Normalised mutual information `scib.metrics.nmi()`
- Adjusted Rand Index `scib.metrics.ari()`
- Cell type ASW `scib.metrics.silhouette()`
- Isolated label score F1 `scib.metrics.isolated_labels()`
- Isolated label score ASW `scib.metrics.isolated_labels()`
- Cell cycle conservation `scib.metrics.cell_cycle()`
- Highly variable gene conservation `scib.metrics.hvg_overlap()`
- Trajectory conservation `scib.metrics.trajectory_conservation()`
- Graph cLISI `scib.metrics.clisi_graph()`
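
Several function names changed in this pull request (for example `lisi_graph()` became `ilisi_graph()` and `clisi_graph()`), so it can help to check which of the listed names an installed version actually exposes. The sketch below is an assumption-laden illustration: `missing_metrics` is a hypothetical helper, and a stand-in namespace replaces `scib.metrics` so the example runs without the package.

```python
from types import SimpleNamespace

# Function names as listed in the updated README.
BATCH_REMOVAL = [
    "pcr_comparison", "silhouette_batch", "kBET", "graph_connectivity", "ilisi_graph",
]
BIO_CONSERVATION = [
    "nmi", "ari", "silhouette", "isolated_labels", "cell_cycle",
    "hvg_overlap", "trajectory_conservation", "clisi_graph",
]


def missing_metrics(module, names):
    """Return the listed names that `module` does not expose as callables."""
    return [name for name in names if not callable(getattr(module, name, None))]


# Stand-in for `scib.metrics` so the sketch runs without scib installed.
fake_metrics = SimpleNamespace(nmi=lambda *a, **k: 0.0, ari=lambda *a, **k: 0.0)
print(missing_metrics(fake_metrics, ["nmi", "ari", "kBET"]))  # ['kBET']
```

With `scib` installed, passing `scib.metrics` instead of the stand-in reports any listed function that the installed version lacks.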

## Tools
## Integration Tools

Tools that are compared include:

@@ -169,4 +89,4 @@ Tools that are compared include:
- [scVI](https://github.com/YosefLab/scVI) 0.6.7
- [Seurat v3](https://github.com/satijalab/seurat) 3.2.0 CCA (default) and RPCA
- [TrVae](https://github.com/theislab/trvae) 0.0.1
- [TrVaep](https://github.com/theislab/trvaep) 0.1.0
- [TrVaep](https://github.com/theislab/trvaep) 0.1.0
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
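
The catch-all rule forwards any target name to Sphinx's "make mode". As a sketch of what `make html` expands to with the default variables in this Makefile, the following builds the same argument list in Python. The function name `sphinx_command` is hypothetical, and the `$(O)` shortcut is folded into `sphinxopts` for simplicity.

```python
import shlex


def sphinx_command(target, sourcedir="source", builddir="build", sphinxopts=""):
    """Build the argument list the catch-all Makefile rule runs for `make <target>`."""
    cmd = ["sphinx-build", "-M", target, sourcedir, builddir]
    cmd += shlex.split(sphinxopts)  # extra options, e.g. "-W" to turn warnings into errors
    return cmd


print(shlex.join(sphinx_command("html")))  # sphinx-build -M html source build
```

The resulting list can be passed to `subprocess.run(..., cwd="docs")` on systems where `make` is not available.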
1 change: 1 addition & 0 deletions docs/source/.gitignore
@@ -0,0 +1 @@
api/
64 changes: 64 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,64 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
import scib

sys.path.insert(0, os.path.abspath('../..'))

# -- Project information -----------------------------------------------------

project = 'scib'
copyright = '2021, Malte D. Luecken, Maren Buettner, Daniel C. Strobl, Michaela F. Mueller'
author = 'Malte D. Luecken, Maren Buettner, Daniel C. Strobl, Michaela F. Mueller'
github_url = 'https://github.com/theislab/scib'

# The full version, including alpha/beta/rc tags
release = scib.__version__

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.duration',
    'sphinx.ext.doctest',
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.intersphinx',
    'sphinx_automodapi.automodapi',
    'sphinx_automodapi.smart_resolver',
    'myst_parser'
]
numpydoc_show_class_members = False

# Add any paths that contain templates here, relative to this directory.
# templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ['_static']
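
The path setup in this `conf.py` uses `os.path.abspath('../..')`, which resolves relative to the current working directory. A possible alternative, not part of this pull request, derives the repository root from the location of `conf.py` itself so the result does not depend on where the build is started. The helper name `repo_root` is hypothetical.

```python
from pathlib import Path


def repo_root(conf_file):
    """Return the directory two levels above the folder containing conf.py."""
    # parents[0] is docs/source, parents[1] is docs, parents[2] is the repository root
    return Path(conf_file).resolve().parents[2]


# Inside conf.py one would pass __file__; a literal path is used here for illustration.
print(repo_root("docs/source/conf.py"))
```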
27 changes: 27 additions & 0 deletions docs/source/index.rst
@@ -0,0 +1,27 @@
.. scib documentation master file, created by
   sphinx-quickstart on Wed Dec 1 14:50:06 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Single-cell integration benchmark scib
======================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   installation
   scib_preprocessing
   scib_integration
   scib_metrics

.. include:: ../../README.md
   :parser: myst_parser.sphinx_


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
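
Each entry in the ``toctree`` directive is expected to name a source document next to ``index.rst``. A minimal sketch for listing those entries, for instance to check that the corresponding files exist before a build, could look as follows. The function ``toctree_entries`` is a hypothetical helper and handles only the simple layout used in this file.

```python
def toctree_entries(rst_text):
    """Collect document names listed in the first `.. toctree::` block."""
    entries, inside = [], False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. toctree::"):
            inside = True
            continue
        if not inside:
            continue
        if stripped and not line[0].isspace():
            break  # an unindented line ends the directive body
        if stripped and not stripped.startswith(":"):
            entries.append(stripped)  # skip options such as :maxdepth:
    return entries


INDEX = """\
.. toctree::
   :maxdepth: 2
   :caption: Contents:

   installation
   scib_preprocessing
   scib_integration
   scib_metrics

.. include:: ../../README.md
"""
print(toctree_entries(INDEX))
```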