-
Notifications
You must be signed in to change notification settings - Fork 955
Increase idle connection timeout #6604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@jxs - Is probably not going to be awake early enough to approve this one, but I think it's not going to destroy the world if for some reason this turns out to be a mistake. We can revert after the RC |
|
I'll just roll it into the RC branch without merging it |
Squashed commit of the following: commit 2a8dae5 Author: Age Manning <[email protected]> Date: Mon Nov 25 10:31:52 2024 +1100 Increase idle connection timeout
jxs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a conversation with @dennis-tra some weeks ago, the data seemed inconclusive as AFAIR we were connected to Prysm clients, is there new data that backs this decision?
btw I am not sure if with_idle_connection_timeout() affects negotiation, confirm here
|
Yeah. The discussion I had was just that through experimentation they identified that this is the cause for connection drops. Changing this value is expected to resolve it. I've not seen specific data on it, but I also didn't see much downside to having a 1 second buffer, just in case it does help. Curious to know if there is some downside here (we shouldn't have idle connections so long as gossipsub is negotiated). |
|
I guess. perhaps we maintain connections with crawlers and other clients that don't support gossipsub more easily with this PR. Potentially, in a future PR we should only allow a percentage of our peer count to not support gossipsub. Last I checked there was a decent amount, like 10-20% of peers that didn't support it. |
Squashed commit of the following: commit 2a8dae5 Author: Age Manning <[email protected]> Date: Mon Nov 25 10:31:52 2024 +1100 Increase idle connection timeout
|
Hi everyone, yeah I talked to @AgeManning at DevCon about this issue. I also mentioned that we had a chat @jxs and that the data was inconclusive. @jxs you mentioned that Lighthouse is indeed connected to Prysm peers but the crucial thing is in which direction these connections point. I presume most connections are pointing outwards from Lighthouse to Prysm peers. However, let me emphasize that this is just a hypothesis. I'm basing this on our observations from interacting with other rust-libp2p peers and a study of the relevant code. In go-libp2p there's a brief moment in time when connecting to a peer before control is handed back to the user where all four conditions are true from your referenced code snippet. In go-libp2p the relevant part is here (which is called from the user-facing Connect call here) where we wait until the |
|
@dennis-tra - Cool. Wondering if we can confirm this before we merge this into our next release. Is it possible for you to use this PR and see if it resolves the issues you have been seeing? If they persist, then perhaps the problem lies elsewhere |
Squashed commit of the following: commit 2a8dae5 Author: Age Manning <[email protected]> Date: Mon Nov 25 10:31:52 2024 +1100 Increase idle connection timeout
Squashed commit of the following: commit 2a8dae5 Author: Age Manning <[email protected]> Date: Mon Nov 25 10:31:52 2024 +1100 Increase idle connection timeout
jxs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dennis-tra! Thanks for your explanation, meanwhile I also found libp2p/rust-libp2p#4967 which I wasn't aware and makes sense to be merged, so I think we can leave it to 10 second here while a rust-libp2p release with 10 seconds as the default isn't released
Co-authored-by: João Oliveira <[email protected]>
|
Great news, thanks @jxs! 10s sounds good. |
|
@mergify queue |
🛑 The pull request has been removed from the queue
|
|
@mergify dequeue |
✅ The pull request has been removed from the queue
|
|
@mergify queue |
🛑 The pull request has been removed from the queue
|
|
@mergify requeue |
✅ This pull request will be re-embarked automaticallyDetailsThe followup |
🛑 The pull request has been removed from the queue
|
|
@mergify dequeue |
✅ The pull request has been removed from the queue
|
|
@mergify requeue |
✅ This pull request will be re-embarked automaticallyDetailsThe followup |
🛑 The pull request has been removed from the queue
|
|
@mergify requeue |
✅ This pull request will be re-embarked automaticallyDetailsThe followup |
✅ The pull request has been merged automaticallyDetailsThe pull request has been merged automatically at 770d677 |
Issue Addressed
It was reported that when Prysm connects to us, there is occasionally a small time in negotiating protocols before we set-up gossipsub that we have an idle connection. In this case Lighthouse immediately drops the connection.
This results in us preventing Prysm from connecting to us some fraction of the time. This PR increases the timeout from 0 seconds to 1 second to help correct this.