Improve node search over k-buckets (getClosestLocalNode) #212

@joshuakarp

Specification

To optimise reading out the closest nodes to a target node, we need to apply some improvements.

The getClosestNode function needs to take a nodeId and limit as parameters. The nodeId is the node we're calculating the distance relative to. The limit is how many nodes we wish to return. The limit defaults to the nodeBucketLimit as per the Kademlia spec.

We need to avoid reading out all of the buckets and iterating over empty buckets. This can be achieved by using a readStream over the nodeGraphBucketsDb level. This level contains a sub-level for each bucket, and each sub-level contains the nodeId:nodeInfo key:value pairs. Using the nodeGraphBucketsDb level, we can iterate over every stored node in bucket order in a single pass.

Note that when setting gt or lt on the stream we need to start from the desired bucket. In this case the starting point is the bucket 'above' the desired starting bucket: the key we want to start from takes the form of a Buffer with <prefix><higherBucketId><prefix>. Iterating over keys less than this gives us the target bucket plus all lower buckets; keys above it belong to the higher buckets.
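To illustrate the range split described above, here is a minimal in-memory sketch. It does not use the real database or the real key encoding: the `bucketKey` helper, the `!` separator, and the zero-padded string layout are all assumptions made for illustration, standing in for the `<prefix><higherBucketId><prefix>` Buffer. The only property it relies on is that keys sort lexicographically in bucket order.

```typescript
// Hypothetical key layout for illustration only: "<bucketKey>!<nodeId>".
// The bucket index is zero-padded so that lexicographic order matches numeric order.
function bucketKey(bucketIndex: number): string {
  return ('000' + bucketIndex).slice(-3);
}

// Split a lexicographically sorted key list at the bucket 'above' the target bucket.
// Keys below the bound cover the target bucket and all lower buckets (the `lt` side);
// keys at or above the bound cover all higher buckets (the `gte` side).
function splitAtBucket(
  sortedKeys: string[],
  targetBucket: number,
): { lower: string[]; higher: string[] } {
  const bound = bucketKey(targetBucket + 1);
  return {
    lower: sortedKeys.filter((key) => key < bound),
    higher: sortedKeys.filter((key) => key >= bound),
  };
}
```

With a real ordered key-value store the two halves would presumably come from two range reads (one bounded above by the key, one bounded below by it) rather than from filtering an array.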

When we run out of lower buckets, we need to iterate over the higher buckets from where we started. If we reach the limit while iterating over the nodes, we need to read the whole of the last bucket we were reading. Since nodes are not ordered by distance within a bucket, we need whole buckets to ensure we obtain the closest nodes.

The resulting list is sorted by distance using nodesUtils.bucketSortByDistance and the list is truncated down to the provided limit.
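The sort-and-truncate step can be sketched as follows. This is not the actual `nodesUtils.bucketSortByDistance` implementation; `compareDistance` and `sortAndTruncate` are hypothetical names, and node IDs are assumed to be equal-length byte arrays compared by XOR distance (the Kademlia metric), most significant byte first.

```typescript
// Assumption: a node ID is a fixed-length byte array.
type NodeId = Uint8Array;

// Compare the XOR distances of `a` and `b` to `target`, byte by byte from the
// most significant byte. Returns a negative number if `a` is closer.
function compareDistance(target: NodeId, a: NodeId, b: NodeId): number {
  for (let i = 0; i < target.length; i++) {
    const da = a[i] ^ target[i];
    const db = b[i] ^ target[i];
    if (da !== db) return da < db ? -1 : 1;
  }
  return 0;
}

// Sort the collected [nodeId, nodeInfo] pairs by distance to the target and
// keep only the closest `limit` entries. The input array is not mutated.
function sortAndTruncate<T>(
  target: NodeId,
  nodes: Array<[NodeId, T]>,
  limit: number,
): Array<[NodeId, T]> {
  return [...nodes]
    .sort(([a], [b]) => compareDistance(target, a, b))
    .slice(0, limit);
}
```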

As implemented

We iterate over the nodes directly across the buckets. The nodes are read out in the following order:

  1. all nodes within the target bucket N
  2. nodes in order of bucket 0 to bucket N-1
  3. nodes in order of bucket N+1 to 255.

When we reach our specified limit, we read the rest of the last bucket we were reading and add it to the list. We then sort all of the nodes, truncate the list back down to the limit, and return it.
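The read order and stopping rule above can be sketched with an in-memory model. Everything here is a simplification for illustration: buckets are held in a `Map` rather than read from a stream, node IDs are small integers, distance is plain integer XOR, and `bucketReadOrder` and `closestNodes` are hypothetical names. Because whole buckets are appended at a time, the "read the rest of the last bucket" rule is satisfied automatically.

```typescript
// Assumption: node IDs are small non-negative integers and each bucket is a plain array.
type Bucket = number[];

// Bucket visiting order: target bucket N, then 0..N-1, then N+1..bucketCount-1.
function bucketReadOrder(n: number, bucketCount: number = 256): number[] {
  const order = [n];
  for (let i = 0; i < n; i++) order.push(i);
  for (let i = n + 1; i < bucketCount; i++) order.push(i);
  return order;
}

// Collect whole buckets in read order until at least `limit` nodes are held,
// then sort by XOR distance to the target and truncate to `limit`.
function closestNodes(
  buckets: Map<number, Bucket>,
  target: number,
  targetBucket: number,
  limit: number,
): number[] {
  const collected: number[] = [];
  for (const index of bucketReadOrder(targetBucket)) {
    if (collected.length >= limit) break;
    const bucket = buckets.get(index);
    if (bucket === undefined) continue; // empty buckets are simply absent
    collected.push(...bucket);
  }
  return collected
    .sort((a, b) => (a ^ target) - (b ^ target))
    .slice(0, limit);
}
```

A stream-based version would not look up every bucket index; it would only ever see the nodes that are actually stored, which is the point of the change described in this issue.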

Additional context

Tasks

  • 1. Update the getClosestNodes implementation to use a readStream to iterate over each node sequentially across all of the buckets.
  • 2. If we run out of 'closer' buckets, iterate over the higher buckets.
  • 3. Buckets are not ordered by distance, so if we read any node from a bucket into the list we need to read the whole bucket.
  • 4. The returned list needs to be sorted by distance and truncated down to the limit.
