Save bandwidth by avoiding superfluous Nodes Requests to peers already on the Close List #511

Merged
merged 1 commit into from
Apr 29, 2017

Conversation

zugz

@zugz zugz commented Mar 12, 2017

Currently, when we receive a response to a getnodes request, for each node in
the response we check if it could be added to the close list, and if so we send
the node a getnodes request; if it replies, we add it to the close list.

We do the same for each friends list, but for them we first check whether the
node is already on the friends list, and don't send a getnodes request if it
is.

This change adds a corresponding check for the close list, such that we only
send a getnodes request if the node can be added to the close list and is not
already on the close list.

The original behaviour could, and as far as I can see should typically be expected
to, lead to getnodes loops resulting in lots of unnecessary traffic.
This can happen as follows: suppose A, B, and C are all close to each other in
the DHT, and all know each other, and A sends B a getnodes request searching
for A (as happens at least every 60 seconds). B's response will likely include
C. It's quite likely that the part of A's close list (the "k-bucket") into
which C falls will not be full, i.e. that A does not know 8 nodes at that
distance. Then even though A already knows C, when A receives B's response A
will send a getnodes to C. Then C's response is likely to include B, in which
case (if that bucket is also not full) A will send another getnodes to B. So we
obtain a tight loop, with the only delays being network delays and the delay
between calls to do_DHT().

I don't see that there can be any downside to adding this check, but I'm not
entirely confident that I couldn't be missing something subtle, so please do
review carefully.
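As a rough sketch of the shape of the added check (not the literal toxcore code): the helpers add_to_close() and is_pk_in_close_list() are the ones named elsewhere in this PR, but the signatures, the "simulate" flag, and the return conventions used here are assumptions for illustration.

    #include <stdbool.h>
    #include <stdint.h>
    #include "DHT.h" /* assumes toxcore's internal DHT declarations */

    static bool want_to_ping_for_close_list(DHT *dht, const uint8_t *public_key,
                                            IP_Port ip_port)
    {
        /* Assumed convention for this sketch: add_to_close() in "simulate" mode
         * reports whether the node would fit into one of our k-buckets. */
        if (!add_to_close(dht, public_key, ip_port, /* simulate */ true)) {
            return false;
        }

        /* New in this change: if we already track the node, don't send it another
         * getnodes request; this is what breaks the A/B/C loop described above. */
        if (is_pk_in_close_list(dht, public_key, ip_port)) {
            return false;
        }

        return true;
    }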



@zugz
Author

zugz commented Mar 12, 2017

I did some primitive testing, and I'm sad to say that this patch doesn't seem to lead to any immediately discernible reduction in bandwidth. Whether this is because the scenario I sketched isn't actually occurring, or just because it's being drowned out by all the other sources of bandwidth use, I don't know.

@robinlinden
Member

Reviewed 1 of 1 files at r1.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


toxcore/DHT.c, line 1025 at r1 (raw file):

    if (index > LCLIENT_LENGTH) {
        index = LCLIENT_LENGTH - 1;

So LCLIENT_LENGTH is okay, but you set index to LCLIENT_LENGTH - 1? Did you mean to use >= in the if-check or is this all intentional?
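For what it's worth, the fix applied in the later revision (per the "fix index bounds check" commit note further down) presumably amounts to clamping with >=, along these lines:

    /* Sketch of the corrected clamp: with LCLIENT_LENGTH buckets, valid indices
     * run from 0 to LCLIENT_LENGTH - 1, so the comparison must be >= rather than >. */
    if (index >= LCLIENT_LENGTH) {
        index = LCLIENT_LENGTH - 1;
    }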



@nurupo
Member

nurupo commented Mar 23, 2017

Is the "close list" equivalent of the k-bucket list in Kademlia? If so, the behaviour you are complaining about might actually be the desired behaviour and you fixing it would make DHT less reliable. In Kademlia, each k-bucket contains at most k nodes and is sorted by the most recently seen time. You update the k-bucket list every time you receive any DHT reply from any node, putting that node on the top (or bottom, depends on how you implemented it) of the list, since it's the most recently seen node in that bucket. Of course, if the node is already somewhere in the k-bucket, you remove it before adoing it to the top, as you don't want duplicates in the k-bucket. The property of k-buckets being always sorted by the most recently seen time is key to Kademlia DHT's reliability, because once a k-bucket is full and you add a new node to it, the least recently seen node is removed from the k-bucket, i.e. the bottom most node is removed. This makes it so that k-bucket contains k nodes which are very likely to be online. If you change the code in such a way that you don't put a recently seen node at the top of the k-bucket if it's already in it, you violate the property of the k-bucket being sorted by the most recently seen time, which allows for the exact opposite behaviour to happen -- the bottom most node might in fact be one of the most recently seen nodes, which should be on the top of the k-bucket instead, and since the bottom most node is removed when you try to insert a new node in a full k-bucket, because the algorithm assumes that the bottom most node is the least recently seen node, it would unknowingly remove the most recently node.

That said, I'm only familiar with Kademlia DHT, not Tox DHT, but Tox DHT is supposedly based on Kademlia DHT so this is likely to still apply.
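For readers unfamiliar with the rule being described, a self-contained illustrative sketch of the textbook Kademlia k-bucket update follows; this is not toxcore code, and the names, bucket size, and key size are made up for the example.

    #include <stdint.h>
    #include <string.h>

    #define K 8        /* bucket capacity ("k" in Kademlia) */
    #define PK_SIZE 32 /* size of a node's public key */

    typedef struct {
        uint8_t entries[K][PK_SIZE]; /* entries[0] is the least recently seen */
        unsigned int count;
    } KBucket;

    /* Called whenever any DHT reply is received from a node in this bucket. */
    static void kbucket_update(KBucket *bucket, const uint8_t *public_key)
    {
        /* If the node is already present, remove it so it can be re-inserted at
         * the most-recently-seen end (no duplicates in the bucket). */
        for (unsigned int i = 0; i < bucket->count; ++i) {
            if (memcmp(bucket->entries[i], public_key, PK_SIZE) == 0) {
                if (i + 1 < bucket->count) {
                    memmove(bucket->entries[i], bucket->entries[i + 1],
                            (size_t)(bucket->count - i - 1) * PK_SIZE);
                }
                --bucket->count;
                break;
            }
        }

        /* If the bucket is full, evict the least recently seen node (index 0). */
        if (bucket->count == K) {
            memmove(bucket->entries[0], bucket->entries[1],
                    (size_t)(K - 1) * PK_SIZE);
            --bucket->count;
        }

        /* Append the node as the most recently seen entry. */
        memcpy(bucket->entries[bucket->count], public_key, PK_SIZE);
        ++bucket->count;
    }

(In the original Kademlia paper the least recently seen node is pinged first and only evicted if it fails to respond; the simplified version above follows the description in this comment.)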

@zugz
Author

zugz commented Mar 23, 2017 via email

@zugz
Author

zugz commented Mar 23, 2017 via email

@robinlinden
Member

:lgtm_strong:


Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.



@iphydf iphydf modified the milestone: v0.1.8 Mar 26, 2017
@iphydf
Member

iphydf commented Mar 26, 2017

Please enable the checkbox "Allow edits from maintainers." on the bottom right.

@zugz zugz force-pushed the checkInCloselist branch 2 times, most recently from e7c2f48 to b8ac109 Compare April 1, 2017 12:30
@zugz zugz changed the title check if already in close list in ping_node_from_getnodes_ok() Save bandwith by avoiding sending superfluous Nodes Requests to peers already on the Close List Apr 1, 2017
@iphydf iphydf changed the title Save bandwith by avoiding sending superfluous Nodes Requests to peers already on the Close List Save bandwidth by avoiding superfluous Nodes Requests to peers already on the Close List Apr 1, 2017
@iphydf
Member

iphydf commented Apr 1, 2017

@nurupo are you happy with this PR?

@GrayHatter

let me take a look too

@GrayHatter

I did some primitive testing, and I'm sad to say that this patch doesn't seem to lead to any immediately discernible reduction in bandwidth. Whether this is because the scenario I sketched isn't actually occurring, or just because it's being drowned out by all the other sources of bandwidth use, I don't know.

@JFreegman has a feature branch with packet count/size tracking; did you, or can you, use that to see if it's changing the packet counts in other ways?


Review status: 0 of 1 files reviewed at latest revision, all discussions resolved, all commit checks successful.



@GrayHatter

That said, I'm only familiar with Kademlia DHT, not Tox DHT, but Tox DHT is supposedly based on Kademlia DHT so this is likely to still apply.

From my understanding, that's not how Tox sorts its close list (unless I misunderstand something): Tox ONLY replaces nodes that have timed out.
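To make the contrast concrete, a rough sketch of a "replace only timed-out nodes" policy might look like the following; this is not the actual toxcore code, and the struct, field names, and timeout value are assumptions for illustration.

    #include <stdint.h>
    #include <string.h>

    #define PK_SIZE 32
    #define STALE_TIMEOUT 122 /* seconds; placeholder, not the real toxcore constant */

    typedef struct {
        uint8_t public_key[PK_SIZE];
        uint64_t last_seen; /* time of the last packet heard from this node */
    } CloseEntry;

    /* Reuse a slot only if its current occupant has timed out; a list full of
     * fresh nodes is left untouched, unlike the LRU eviction sketched above. */
    static int replace_timed_out(CloseEntry *list, unsigned int length,
                                 const uint8_t *public_key, uint64_t now)
    {
        for (unsigned int i = 0; i < length; ++i) {
            if (now - list[i].last_seen > STALE_TIMEOUT) {
                memcpy(list[i].public_key, public_key, PK_SIZE);
                list[i].last_seen = now;
                return 0; /* replaced a stale entry */
            }
        }

        return -1; /* every slot is still fresh; the new node is not added */
    }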



@GrayHatter

:lgtm: afaict, @zugz is correct about everything. All this pull does is prevent the DHT from adding nodes that already exist in the close list to the to_bootstrap list, which IIRC would only then attempt to add them to the close list again.

If they're already in the close list, we honestly don't need to interact with them at all*; assuming other, closer nodes send us getnodes requests, we don't need to ask them for close nodes anymore*.

*many restrictions apply, and I'm not advocating we actually do this.


Review status: 0 of 1 files reviewed at latest revision, all discussions resolved, all commit checks successful.



@iphydf
Member

iphydf commented Apr 9, 2017

@irungentoo would you like to take a look at this to make sure it makes sense?

@irungentoo

This looks fine.

@iphydf iphydf modified the milestones: v0.1.8, v0.1.9 Apr 12, 2017
@zugz
Author

zugz commented Apr 28, 2017

I did a little imprecise testing, and I'm observing roughly a quartering of
the number of getnodes requests sent during normal DHT operation once this
patch is applied (from about 8 per second to about 2). So this should indeed
help with our bandwidth problem.

fix index bounds check in add_to_close() and is_pk_in_close_list()

add TODO to write test for bug which is fixed by this commit
@robinlinden robinlinden merged commit 2474f43 into TokTok:master Apr 29, 2017