This repository has been archived by the owner on Aug 21, 2020. It is now read-only.

Distributed approximate nearest neighbors search

marekgalovic/tau

⚠️ Archived

Please have a look at AnnDB. AnnDB is a distributed approximate nearest neighbors search database that also supports online data modification.

TAU

TAU is a distributed approximate nearest neighbors search library written in Go. Data is organized into datasets that are composed of partitions (individual files). Partitions are distributed across the nodes in the cluster and can be replicated for speed and availability.

Index

TAU currently supports these index types:

  • HNSW constructs a Hierarchical Navigable Small World graph that separates vertex connections into different layers of the graph. The lower the layer, the larger the average degree of its vertices. Implemented according to this paper.
  • BTree constructs a binary search tree using random projections. When building the tree, a random pair of points is sampled and the hyperplane equidistant from these two points is computed. Samples are then split based on the sign of their distance to the hyperplane. This process repeats on each side until the number of candidate samples falls below the leaf size threshold.
  • Voronoi constructs a search tree using the K-Means++ algorithm. At each level of the tree, samples are split into k clusters, and splitting repeats until the number of candidate samples falls below the leaf size threshold.
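
The BTree split step described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not TAU's actual code: the names `Hyperplane`, `EquidistantHyperplane`, `Partition` and `RandomSplit` are hypothetical, points are plain `[]float64` slices, and Euclidean distance is assumed.

```go
package main

import "math/rand"

// Hyperplane is the set of points x with Normal·x = Offset.
// (Hypothetical type for illustration; not taken from TAU.)
type Hyperplane struct {
	Normal []float64
	Offset float64
}

// EquidistantHyperplane returns the hyperplane whose points are equally far
// (in Euclidean distance) from a and b: its normal is a-b and it passes
// through the midpoint (a+b)/2.
func EquidistantHyperplane(a, b []float64) Hyperplane {
	normal := make([]float64, len(a))
	offset := 0.0
	for i := range a {
		normal[i] = a[i] - b[i]
		offset += normal[i] * (a[i] + b[i]) / 2
	}
	return Hyperplane{Normal: normal, Offset: offset}
}

// Side returns a value with the same sign as the signed distance from x to
// the hyperplane. It is not normalized, because only the sign is needed.
func (h Hyperplane) Side(x []float64) float64 {
	dot := 0.0
	for i := range x {
		dot += h.Normal[i] * x[i]
	}
	return dot - h.Offset
}

// Partition splits points by the sign of their distance to h.
// Points lying exactly on the hyperplane go to the right side.
func Partition(points [][]float64, h Hyperplane) (left, right [][]float64) {
	for _, p := range points {
		if h.Side(p) > 0 {
			left = append(left, p)
		} else {
			right = append(right, p)
		}
	}
	return left, right
}

// RandomSplit samples two distinct indices and splits points by the
// hyperplane equidistant from the two sampled points. It assumes
// len(points) >= 2; if the two sampled points happen to be identical,
// every point ends up on the right side.
func RandomSplit(points [][]float64, rng *rand.Rand) (left, right [][]float64) {
	i := rng.Intn(len(points))
	j := rng.Intn(len(points) - 1)
	if j >= i {
		j++ // shift past i so that the two indices differ
	}
	return Partition(points, EquidistantHyperplane(points[i], points[j]))
}
```

A tree builder would call `RandomSplit` recursively on each side until a side holds fewer points than the leaf size threshold. Because the two sampled points always land on opposite sides (when they differ), neither side of a split is empty.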

Cluster

Although it's possible to run TAU on a single machine, TAU can leverage multiple machines to speed up search and dataset index building. All nodes are treated equally and there is no concept of a "master". To agree on which nodes own which partitions, TAU uses rendezvous (highest random weight) hashing; ZooKeeper is used for node discovery and dataset management.
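
To illustrate how rendezvous hashing lets every node reach the same ownership decision without a master, here is a minimal Go sketch. It is an assumption-laden illustration rather than TAU's implementation: the `score` and `Owners` functions, the FNV-1a hash, and the string identifiers for nodes and partitions are all hypothetical choices.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// score hashes a (node, partition) pair to a pseudo-random weight.
// FNV-1a is an assumption made for this sketch; any well-mixed hash works.
func score(node, partition string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(node))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
	h.Write([]byte(partition))
	return h.Sum64()
}

// Owners returns the nodes that should hold the partition: the `replicas`
// nodes with the highest weight for it. Every node that knows the same
// member list computes the same answer, regardless of list order.
func Owners(nodes []string, partition string, replicas int) []string {
	ranked := append([]string(nil), nodes...) // copy; leave the input untouched
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := score(ranked[i], partition), score(ranked[j], partition)
		if si != sj {
			return si > sj // highest weight first
		}
		return ranked[i] < ranked[j] // deterministic tie-break
	})
	if replicas < 0 {
		replicas = 0
	}
	if replicas > len(ranked) {
		replicas = len(ranked)
	}
	return ranked[:replicas]
}
```

For example, `Owners([]string{"node-a", "node-b", "node-c"}, "dataset/part-0", 2)` returns the two nodes ranked highest for that partition. When a node leaves the cluster, only the partitions it owned are reassigned; the relative ranking of the remaining nodes for every other partition is unchanged.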
