DoubletDetection is a Python3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices.
Install from PyPI
pip install doubletdetection
Install from source
git clone https://github.com/JonathanShor/DoubletDetection.git
cd DoubletDetection
pip3 install .
If you are using pipenv as your virtual environment, it may have trouble installing from setup.py because of our custom PhenoGraph requirement.
If so, try the following in the cloned repo:
pipenv run pip3 install .
To run basic doublet classification:
import doubletdetection
clf = doubletdetection.BoostClassifier()
# raw_counts is a cells by genes count matrix
labels = clf.fit(raw_counts).predict()
# higher means more likely to be doublet
scores = clf.doublet_score()
- raw_counts is a scRNA-seq count matrix (cells by genes) and is array-like.
- labels is a 1-dimensional numpy ndarray in which 1 marks a detected doublet, 0 a singlet, and np.nan an ambiguous cell.
- scores is a 1-dimensional numpy ndarray giving a score for how likely each cell is to be a doublet. The labels are derived from these scores.
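Because labels mixes 0, 1 and np.nan, counting calls needs NaN-aware handling. The following is a minimal sketch that assumes labels and scores are the NumPy arrays returned by predict() and doublet_score(); the helper name summarize_calls is hypothetical and not part of DoubletDetection.

```python
import numpy as np


def summarize_calls(labels, scores, top_k=5):
    """Hypothetical helper: count doublet/singlet/ambiguous calls and rank cells by score."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Comparisons against NaN evaluate to False, so ambiguous cells
    # are excluded from both the doublet and singlet counts.
    n_doublets = int(np.sum(labels == 1))
    n_singlets = int(np.sum(labels == 0))
    n_ambiguous = int(np.sum(np.isnan(labels)))

    # Higher score means more likely to be a doublet, so sort descending.
    top_cells = np.argsort(scores)[::-1][:top_k]

    # Boolean mask of confidently called singlets, e.g. for raw_counts[singlet_mask].
    singlet_mask = labels == 0

    summary = {
        "doublets": n_doublets,
        "singlets": n_singlets,
        "ambiguous": n_ambiguous,
    }
    return summary, top_cells, singlet_mask
```

Treating ambiguous cells separately, rather than folding them into singlets, keeps the choice of whether to discard them explicit.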
The classifier works best when:
- There are several cell types present in the data
- It is applied individually to each run in an aggregated count matrix
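Doublets form when two cells are captured together within a single run, so cells from different runs cannot form a doublet with each other. One way to follow the per-run advice is to fit a fresh classifier on each run's rows and stitch the labels back together. This is a minimal sketch, not part of the package: classify_per_run and run_ids (one run identifier per cell) are hypothetical, and counts is assumed to support boolean row indexing (for example, a NumPy array).

```python
import numpy as np


def classify_per_run(counts, run_ids, make_classifier):
    """Hypothetical helper: run a separate doublet classifier on each run.

    make_classifier is any zero-argument factory returning an object with
    fit(X) -> self and predict() -> 1-D labels, e.g. doubletdetection.BoostClassifier.
    """
    run_ids = np.asarray(run_ids)
    # Start with NaN so any cell that is never classified stays "ambiguous".
    labels = np.full(counts.shape[0], np.nan)
    for run in np.unique(run_ids):
        mask = run_ids == run
        # A fresh classifier per run, so synthetic doublets are only
        # built from cells that could actually co-occur in one droplet.
        clf = make_classifier()
        labels[mask] = clf.fit(counts[mask]).predict()
    return labels
```

Usage would look like labels = classify_per_run(raw_counts, run_ids, doubletdetection.BoostClassifier), with default classifier settings for every run.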
In v2.5 we added a new experimental clustering method (scanpy's Louvain clustering) that is much faster than PhenoGraph. We are still validating results from this new clustering method. Please see the notebook below for an example of using this feature.
See our tutorial for an example on 10k PBMCs from 10x Genomics.
Data can be downloaded from the 10x website.
Gayoso, Adam, Shor, Jonathan, Carr, Ambrose J., Sharma, Roshan, Pe'er, Dana (2020, December 18). DoubletDetection (Version v3.0). Zenodo. http://doi.org/10.5281/zenodo.2678041
We also thank the participants of the 1st Human Cell Atlas Jamboree, Chun J. Ye for providing data useful in developing this method, and Itsik Pe'er for providing guidance in early development as part of the Computational genomics class at Columbia University.
This project is licensed under the terms of the MIT license.