[Fix] Harden the router's resolver #3540
niklaslong
left a comment
A nice tightening up of peer tracking! Did a first pass and the current changeset looks good 👍
@ljedrz Do we need to apply the same changes from
@howardwu Not necessarily; my recommendation would be to first introduce these changes and then perform a new analysis of the logs, looking for protocol violation false positives and potential connection stability issues. These changes will make the picture much clearer.
@ljedrz @niklaslong Which logs are you talking about? Were you able to reproduce the issue yourself?
@joske I was analyzing the logs of one of the Canarynet clients before and after these changes.
Could you share those logs?
I recall seeing the errors Lukasz mentioned very often. I suggest just running your own local Canary client, as he suggests; if you don't pass any peers, you should connect to the bootstrap nodes, which will connect you to others.
kaimast
left a comment
Left a few comments. Sorry if they are too nitpicky...
Could you resolve the conflicts with staging, as well? Hopefully we can get this merged in the coming days.
b673d7b to 67b9d81
Rebased (there were no changes to the original commits), and applied the review comments.
The devnet CI job failed, but I can't reproduce it locally, and I can't restart it either. Update: it was a deadlock, which I avoided locally because I was starting the nodes more slowly. Fixed.
… connected peers" This reverts commit 3a5ceed.
While investigating a potential issue with some trusted peers being periodically dropped, I've noticed a lot of instances of `Unable to resolve the (...) address` in the log extracts from different networks. I believe most of them are triggered unnecessarily, but we need to be sure, and this PR aims to address this. The proposed changes are as follows:

- … `inbound` method is "fed" from a lower-level queue which isn't aware of the address resolver, so the entries that fail to resolve there are basically guaranteed to be post-disconnect "stragglers" and may be ignored (instead of triggering potentially many redundant disconnect attempts, which result in further resolver-related warnings)
- … `Message::Disconnect`, we shouldn't report it as a protocol violation; this is mostly a cleanup of one or two misleading logs

Update: the PR was rebased due to a conflict, and while the commit hashes have changed, neither their contents nor their order have.