Description
Specification
To optimise reading out the closest nodes to a target node, we need to apply some improvements.
The getClosestNode function needs to take a nodeId and a limit as parameters. The nodeId is the node we're calculating the distance relative to. The limit is how many nodes we wish to return. The limit defaults to the nodeBucketLimit, as per the Kademlia spec.
We need to avoid reading out all of the buckets and iterating over empty buckets. This can be achieved by using a readStream over the nodeGraphBucketsDb level. This level contains sub levels for each bucket. Each sub level contains the nodeId:nodeInfo key:value pairs. Using the nodeGraphBucketsDb level we can iterate over each stored node in bucket order all at once. Note that when setting the gt or lt on the stream we need to start from the desired bucket. In this case the starting point is the bucket 'above' the desired starting bucket. The key we want to start from takes the form of a Buffer with <prefix><higherBucketId><prefix>. Iterating less than this gives us the target bucket plus all lower buckets. Above this are all of the higher buckets.
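The range-bound idea above can be sketched with an in-memory stand-in for the sorted store. Everything here is an assumption for illustration: the key layout (a fixed-width bucket index, a `!` separator, then the node id), the helper names `bucketKey`, `bucketStart` and `scan`, and the sample data. The real nodeGraphBucketsDb key encoding may differ.

```typescript
// Hypothetical key layout: "<3-digit bucket index>!<nodeId>".
// Fixed-width indices make lexicographic key order match numeric bucket order.
type Entry = { key: string; value: string };

function pad3(n: number): string {
  return ('000' + n.toString()).slice(-3);
}

function bucketKey(bucketIndex: number, nodeId: string): string {
  return pad3(bucketIndex) + '!' + nodeId;
}

// Smallest possible key of a bucket: sorts before every node stored in it.
function bucketStart(bucketIndex: number): string {
  return pad3(bucketIndex) + '!';
}

// Stand-in for a sorted range scan with `lt` / `gte` bounds.
// Only stored keys are visited, so empty buckets cost nothing.
function scan(entries: Entry[], opts: { lt?: string; gte?: string }): Entry[] {
  return entries
    .filter((e) => opts.lt === undefined || e.key < opts.lt)
    .filter((e) => opts.gte === undefined || e.key >= opts.gte)
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const store: Entry[] = [
  { key: bucketKey(9, 'd'), value: 'info-d' },
  { key: bucketKey(5, 'c'), value: 'info-c' },
  { key: bucketKey(2, 'a'), value: 'info-a' },
  { key: bucketKey(5, 'b'), value: 'info-b' },
];

const targetBucket = 5;
// The bound is the start of the bucket 'above' the target bucket.
const bound = bucketStart(targetBucket + 1);
const targetAndLower = scan(store, { lt: bound }); // buckets 2 and 5
const higher = scan(store, { gte: bound }); // bucket 9
```

With a single bound, one scan covers the target bucket and everything below it, and a second scan covers everything above, without ever touching buckets 0, 1, 3, 4 or 6 to 8, which hold no nodes in this sample.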
When we run out of lower buckets we need to iterate over the higher buckets from where we started. If we reach the limit while iterating over the nodes, we still need to read the whole of the last bucket we were reading. Since nodes are out of order within a bucket, we need whole buckets to ensure we obtain the closest nodes.
The resulting list is sorted by distance using nodesUtils.bucketSortByDistance and the list is truncated down to the provided limit.
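A minimal sketch of the collect, sort and truncate steps described above, using plain in-memory buckets rather than the database. The names `compareDistance` and `closestFromBuckets`, the `Record`-based bucket map and the one-byte node ids are assumptions for illustration; the comparator is a stand-in for nodesUtils.bucketSortByDistance, not its actual implementation.

```typescript
type NodeId = Uint8Array;

// Compare two ids by XOR distance to the target, most significant byte first.
function compareDistance(target: NodeId, a: NodeId, b: NodeId): number {
  for (let i = 0; i < target.length; i++) {
    const da = a[i] ^ target[i];
    const db = b[i] ^ target[i];
    if (da !== db) return da - db;
  }
  return 0;
}

function closestFromBuckets(
  target: NodeId,
  buckets: Record<number, NodeId[]>,
  visitOrder: number[],
  limit: number,
): NodeId[] {
  const collected: NodeId[] = [];
  for (let i = 0; i < visitOrder.length; i++) {
    // The limit is only checked between buckets, so the last bucket
    // that was started is always read in full.
    if (collected.length >= limit) break;
    const bucket = buckets[visitOrder[i]];
    if (bucket === undefined) continue;
    for (let j = 0; j < bucket.length; j++) collected.push(bucket[j]);
  }
  collected.sort((a, b) => compareDistance(target, a, b));
  return collected.slice(0, limit);
}

// One-byte ids keep the example readable.
const id = (n: number): NodeId => new Uint8Array([n]);
const sampleBuckets: Record<number, NodeId[]> = {
  0: [id(4)],
  1: [id(7), id(6)],
  3: [id(13), id(9)],
};
const closest = closestFromBuckets(id(5), sampleBuckets, [0, 1, 2, 3], 2);
const closestIds = closest.map((n) => n[0]);
```

Here three nodes are collected (one from bucket 0, then the whole of bucket 1) before the limit check stops the loop, and the sort plus truncation leaves the two nodes with the smallest XOR distance to the target.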
As implemented
We iterate over the nodes directly across the buckets. The nodes are read out in the following order:
- all nodes within the target bucket N
- nodes in order of bucket 0 to bucket N-1
- nodes in order of bucket N+1 to 255.
When we reach our specified limit, we finish reading the whole of the bucket we are currently in and add it to the list. We then sort all of the nodes, truncate the list back down to the limit, and return it.
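The read order listed above can be written down as a small helper. The function name `bucketVisitOrder` and the `bucketCount` parameter are hypothetical; the default of 256 simply reflects the bucket range 0 to 255 mentioned in the list.

```typescript
// Returns bucket indices in the order: N, then 0..N-1, then N+1..bucketCount-1.
function bucketVisitOrder(targetBucket: number, bucketCount: number = 256): number[] {
  const order: number[] = [targetBucket];
  for (let i = 0; i < targetBucket; i++) order.push(i);
  for (let i = targetBucket + 1; i < bucketCount; i++) order.push(i);
  return order;
}

// Small bucket count so the order is easy to read.
const visitOrder = bucketVisitOrder(3, 6);
```

Every bucket index appears exactly once, so a caller that stops early on the limit never revisits a bucket.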
Additional context
- discussion thread https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/195#note_612883452 (text above copied from a previous comment of mine)
- Improve node search over k-buckets (getClosestLocalNode) #212 (comment)
Tasks
- 1. Update the getClosestNodes implementation to use a readStream to iterate over each node sequentially across all of the buckets.
- 2. If we run out of 'closer' buckets, we iterate over the higher buckets.
- 3. Buckets are not ordered by distance, so if we read any node from a bucket into the list we need to get the whole bucket.
- 4. The returned list needs to be sorted by distance and truncated down to the limit.