CPython module for fast calculation of k-nearest-neighbor (KNN) graphs in high-dimensional vector spaces, using Pearson correlation distance and locality-sensitive hashing (LSH).
The current application is the analysis of single-cell RNA-Seq data. The module is the result of a collaboration between Fabio Zanini (now @UNSW) and Paolo Carnevali @ Chan Zuckerberg Initiative, who owns the algorithm code, which is also released under the MIT license:
https://github.com/chanzuckerberg/ExpressionMatrix2
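To give a feel for how LSH can approximate Pearson correlation, here is a minimal sketch of one common scheme, random-hyperplane hashing. It is an illustration under stated assumptions, not the code this module runs: the function names `hyperplane_signatures` and `estimated_correlation` are made up for this example, and whether lshknn uses exactly this scheme internally is an assumption.

```python
import numpy as np


def hyperplane_signatures(data, m, seed=0):
    """Return an (n_samples, m) boolean array of LSH signatures.

    data has features as rows and samples as columns.
    Hypothetical helper for illustration only, not part of lshknn.
    """
    rng = np.random.default_rng(seed)
    # Center and scale every sample (column). After this step, the Pearson
    # correlation of two samples equals the cosine of the angle between them.
    # Note: a constant sample has zero norm and would divide by zero here.
    x = data - data.mean(axis=0, keepdims=True)
    x = x / np.linalg.norm(x, axis=0, keepdims=True)
    # Each random hyperplane contributes one bit: which side the sample is on.
    planes = rng.standard_normal((m, data.shape[0]))
    return (planes @ x > 0).T


def estimated_correlation(sig_a, sig_b):
    """Estimate the correlation of two samples from their signatures."""
    # The fraction of differing bits estimates angle / pi.
    mismatch = np.mean(sig_a != sig_b)
    return np.cos(np.pi * mismatch)


data = np.array(
    [[1, 0, 1, 0],
     [0, 1, 0, 1]],
    dtype=np.float64)
sig = hyperplane_signatures(data, m=64)
print(estimated_correlation(sig[0], sig[2]))  # identical samples
print(estimated_correlation(sig[0], sig[1]))  # anticorrelated samples
```

Comparing short bit signatures is much cheaper than correlating full expression vectors, which is the general reason LSH helps in high-dimensional spaces. More bits give a tighter estimate at a higher memory and compute cost.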
- A CPU with SSE 4.2 instructions (chances are you have it)
- A C++ 11/14 compiler, e.g. gcc 4.8 or later
- Python 3.4+ or 2.7
- eigen 3
- numpy
- setuptools
- pkg-config (the system tool)
- pkgconfig (the Python package that wraps pkg-config)
- pybind11
(you may need superuser privileges)
pip install lshknn
For the development version:
git clone https://github.com/iosonofabio/lshknn.git
cd lshknn
python setup.py install
import numpy as np
import lshknn
# Make mock data
# 2 features (rows), 4 samples (columns)
data = np.array(
    [[1, 0, 1, 0],
     [0, 1, 0, 1]],
    dtype=np.float64)
# Instantiate class
c = lshknn.Lshknn(
    data=data,
    k=1,
    threshold=0.2,
    m=10,
    slice_length=4)
# Compute the KNN graph by calling the instance
knn, similarity, n_neighbors = c()
# Check result
assert (knn == [[2], [3], [0], [1]]).all()
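For small inputs, the expected neighbors can be cross-checked with an exact, brute-force computation. The sketch below uses only numpy and does not call lshknn. The helper name `brute_force_knn` is hypothetical, and the sketch ignores the `threshold` parameter, so it is a sanity check under those assumptions rather than a drop-in replacement.

```python
import numpy as np


def brute_force_knn(data, k):
    """Exact k nearest neighbors by Pearson correlation.

    data has features as rows and samples as columns.
    Returns an (n_samples, k) array of neighbor indices.
    Hypothetical helper for illustration only, not part of lshknn.
    """
    # np.corrcoef treats rows as variables, so transpose to correlate samples.
    corr = np.corrcoef(data.T)
    # Exclude each sample from its own neighbor list.
    np.fill_diagonal(corr, -np.inf)
    # Sort by decreasing correlation and keep the first k indices.
    order = np.argsort(-corr, axis=1, kind="stable")
    return order[:, :k]


data = np.array(
    [[1, 0, 1, 0],
     [0, 1, 0, 1]],
    dtype=np.float64)
exact = brute_force_knn(data, k=1)
assert (exact == [[2], [3], [0], [1]]).all()
```

This computes the full samples-by-samples correlation matrix, so memory and time grow quadratically with the number of samples. That cost is what an LSH-based approach is meant to avoid on large datasets.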