Skip to content

C++ implementation of TreeRep algorithm from Sonathalia & Gilbert "Tree! I am no Tree! I am a Low Dimensional Hyperbolic Embedding"

Notifications You must be signed in to change notification settings

meiji163/treerep

Repository files navigation

Treerep

C++ implementation of Sonathalia & Gilbert "Tree! I am no Tree! I am a Low Dimensional Hyperbolic Embedding," 1 with a thin python 3 wrapper.

Treerep takes an N-point metric (matrix of pairwise distances) and computes a weighted tree that approximates it. You can use this tree directly, or embed it in hyperbolic space2. If the original metric is hyperbolic (roughly "tree-like") then Treerep produces an embedding with low distortion. Recent research3 4 5 shows hyperbolic embeddings outperform Euclidean embeddings for many types of hierarchical data.

Install

The C++11 source can be used with no external dependencies. The python API requires pybind11, which can be downloaded by cloning the submodule

$ git clone --recursive https://github.com/meiji163/treerep.git

then compile with cmake

$ cd treerep && mkdir build && cd build
$ cmake ..
$ make
$ make install #if you want to install to your python lib

Usage

To save space, the N-point metric is represented as a length N*(N-1)/2 array (see scipy pdist). Input can be a python list or numpy array.

Here is Sarich's "immunological distance" example from Sonthalia & Gilbert's paper.

from trep import *
from trep_utils import load_mat
metric, N = load_mat("../data/immune.mtx")
print(metric)
# array([ 32.,  48.,  51.,  50.,  48.,  98., 148.,  26.,  34.,  29.,  33.,
#        84., 136.,  42.,  44.,  44.,  92., 152.,  44.,  38.,  86., 142.,
#        42.,  89., 142.,  90., 142., 148.])

tree, weights = treerep(metric,N)
print(weights) 
# {(0, 10): 24.0, (1, 10): 8.0, (2, 11): 19.25, (3, 9): 18.25, (4, 12): 21.25, 
#  (5, 8): 17.5, (6, 9): 67.75, (7, 13): 121.9375, (8, 9): 2.25, (8, 13): 1.0625, 
#  (10, 11): 1.75, (11, 12): 2.0, (12, 13): 1.6875}

the tree looks like this (edges not to scale)

sarich tree

Treerep is a randomized algorithm, so you can produce multiple trees and choose the best.

Todo

  • multithread sorting
  • efficient all pairs shortest path for trees
  • distortion/ MAP

References

About

C++ implementation of TreeRep algorithm from Sonathalia & Gilbert "Tree! I am no Tree! I am a Low Dimensional Hyperbolic Embedding"

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published