Skip to content

Conversation

@AgeManning
Copy link
Member

Issue Addressed

It was reported that when Prysm connects to us, there is occasionally a small time in negotiating protocols before we set-up gossipsub that we have an idle connection. In this case Lighthouse immediately drops the connection.

This results in us preventing Prysm from connecting to us some fraction of the time. This PR increases the timeout from 0 seconds to 1 second to help correct this.

@AgeManning AgeManning added Networking v6.0.0 New major release for hierarchical state diffs labels Nov 24, 2024
@AgeManning
Copy link
Member Author

@jxs - Is probably not going to be awake early enough to approve this one, but I think it's not going to destroy the world if for some reason this turns out to be a mistake. We can revert after the RC

@michaelsproul
Copy link
Member

I'll just roll it into the RC branch without merging it

michaelsproul added a commit that referenced this pull request Nov 25, 2024
Squashed commit of the following:

commit 2a8dae5
Author: Age Manning <[email protected]>
Date:   Mon Nov 25 10:31:52 2024 +1100

    Increase idle connection timeout
@michaelsproul michaelsproul mentioned this pull request Nov 25, 2024
Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a conversation with @dennis-tra some weeks ago, the data seemed inconclusive as AFAIR we were connected to Prysm clients, is there new data that backs this decision?

btw I am not sure if with_idle_connection_timeout() affects negotiation, confirm here

@AgeManning
Copy link
Member Author

Yeah. The discussion I had was just that through experimentation they identified that this is the cause for connection drops. Changing this value is expected to resolve it.

I've not seen specific data on it, but I also didn't see much downside to having a 1 second buffer, just in case it does help.

Curious to know if there is some downside here (we shouldn't have idle connections so long as gossipsub is negotiated).

@AgeManning
Copy link
Member Author

I guess. perhaps we maintain connections with crawlers and other clients that don't support gossipsub more easily with this PR.

Potentially, in a future PR we should only allow a percentage of our peer count to not support gossipsub. Last I checked there was a decent amount, like 10-20% of peers that didn't support it.

michaelsproul added a commit that referenced this pull request Nov 25, 2024
Squashed commit of the following:

commit 2a8dae5
Author: Age Manning <[email protected]>
Date:   Mon Nov 25 10:31:52 2024 +1100

    Increase idle connection timeout
@dennis-tra
Copy link

Hi everyone,

yeah I talked to @AgeManning at DevCon about this issue. I also mentioned that we had a chat @jxs and that the data was inconclusive.

@jxs you mentioned that Lighthouse is indeed connected to Prysm peers but the crucial thing is in which direction these connections point. I presume most connections are pointing outwards from Lighthouse to Prysm peers. However, let me emphasize that this is just a hypothesis. I'm basing this on our observations from interacting with other rust-libp2p peers and a study of the relevant code.

In go-libp2p there's a brief moment in time when connecting to a peer before control is handed back to the user where all four conditions are true from your referenced code snippet. In go-libp2p the relevant part is here (which is called from the user-facing Connect call here) where we wait until the identify exchange has completed before we do anything else (like opening a new stream).

@AgeManning
Copy link
Member Author

@dennis-tra - Cool. Wondering if we can confirm this before we merge this into our next release.

Is it possible for you to use this PR and see if it resolves the issues you have been seeing?

If they persist, then perhaps the problem lies elsewhere

michaelsproul added a commit that referenced this pull request Nov 27, 2024
Squashed commit of the following:

commit 2a8dae5
Author: Age Manning <[email protected]>
Date:   Mon Nov 25 10:31:52 2024 +1100

    Increase idle connection timeout
michaelsproul added a commit that referenced this pull request Nov 28, 2024
Squashed commit of the following:

commit 2a8dae5
Author: Age Manning <[email protected]>
Date:   Mon Nov 25 10:31:52 2024 +1100

    Increase idle connection timeout
Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dennis-tra! Thanks for your explanation, meanwhile I also found libp2p/rust-libp2p#4967 which I wasn't aware and makes sense to be merged, so I think we can leave it to 10 second here while a rust-libp2p release with 10 seconds as the default isn't released

@dennis-tra
Copy link

Great news, thanks @jxs! 10s sounds good.

@michaelsproul michaelsproul added the ready-for-merge This PR is ready to merge. label Dec 1, 2024
@michaelsproul
Copy link
Member

@mergify queue

@mergify
Copy link

mergify bot commented Dec 1, 2024

queue

🛑 The pull request has been removed from the queue default

Details

Pull request #6604 has been dequeued by a dequeue command.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

mergify bot added a commit that referenced this pull request Dec 1, 2024
@michaelsproul
Copy link
Member

@mergify dequeue

@mergify
Copy link

mergify bot commented Dec 1, 2024

dequeue

✅ The pull request has been removed from the queue default

@michaelsproul
Copy link
Member

@mergify queue

@mergify
Copy link

mergify bot commented Dec 1, 2024

queue

🛑 The pull request has been removed from the queue default

Details

Pull request #6604 has been dequeued by a dequeue command.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@michaelsproul
Copy link
Member

@mergify requeue

@mergify
Copy link

mergify bot commented Dec 1, 2024

requeue

✅ This pull request will be re-embarked automatically

Details

The followup queue command will be automatically executed to re-embark the pull request

@mergify
Copy link

mergify bot commented Dec 1, 2024

queue

🛑 The pull request has been removed from the queue default

Details

Pull request #6604 has been dequeued by a dequeue command.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

mergify bot added a commit that referenced this pull request Dec 1, 2024
@michaelsproul
Copy link
Member

@mergify dequeue

@mergify
Copy link

mergify bot commented Dec 2, 2024

dequeue

✅ The pull request has been removed from the queue default

@michaelsproul
Copy link
Member

@mergify requeue

@mergify
Copy link

mergify bot commented Dec 2, 2024

requeue

✅ This pull request will be re-embarked automatically

Details

The followup queue command will be automatically executed to re-embark the pull request

@mergify
Copy link

mergify bot commented Dec 2, 2024

queue

🛑 The pull request has been removed from the queue default

Details

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

mergify bot added a commit that referenced this pull request Dec 2, 2024
@michaelsproul
Copy link
Member

@mergify requeue

@mergify
Copy link

mergify bot commented Dec 2, 2024

requeue

✅ This pull request will be re-embarked automatically

Details

The followup queue command will be automatically executed to re-embark the pull request

@mergify
Copy link

mergify bot commented Dec 2, 2024

queue

✅ The pull request has been merged automatically

Details

The pull request has been merged automatically at 770d677

mergify bot added a commit that referenced this pull request Dec 2, 2024
@mergify mergify bot merged commit 770d677 into sigp:unstable Dec 2, 2024
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Networking ready-for-merge This PR is ready to merge. v6.0.0 New major release for hierarchical state diffs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants