socket hang up (ECONNRESET) - Web3js #27859
Comments
Bump on this, as Solend often sees this error.
Bump also: Hubble and Kamino have oracle staleness issues due to this.
Love it. I'll dig into this this week.
K, here's what I think I've learned from this excellent article on tuning keep-alive.
I believe the solutions to be as follows:
Let's discuss over at #29130.
@steveluscher Switchboard is seeing better performance with this version of web3.js. Thanks for getting this fixed. Will report back if anything changes.
Rad. What exactly does "better" look like in your case, @gallynaut?
We monitor event loop health for our oracles. With the ECONNRESET issue, the oracles would be blocked for anywhere from 1s to 2min, which caused some feeds to go stale. With this patch, we no longer see the event-loop-blocked warnings.
Yaaaas. This is great news. @gallynaut, can you check out this discussion from another team that's having some success with this patch? I'm curious to know how your setup is structured, and what the keep-alive timeouts are set to at every step in the network (the client is now 19s, your load balancer is ???, and presumably your RPC endpoint is the official Solana RPC, which is set to 20s).
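The question above boils down to one rule: on every hop, the side that pools an idle keep-alive connection should close it before the side it is connected to does; otherwise it may reuse a socket the other end has already torn down and get an RST back. The sketch below is a hypothetical helper (not part of web3.js) that checks each client/server link in a request path. The 19s and 20s figures come from the comment above; the load balancer numbers are made-up placeholders standing in for the "???", and the 1s safety margin is an assumption.

```typescript
// Hypothetical helper, not part of web3.js: flags keep-alive links where the
// client side may hold an idle socket longer than the server side does.
interface KeepAliveLink {
  client: string; // side that pools idle connections (e.g. the Node.js agent)
  server: string; // side that accepts them (e.g. the load balancer)
  clientIdleMs: number; // client closes an idle pooled socket after this long
  serverIdleMs: number; // server closes or resets an idle connection after this long
}

// A link is safe when the client gives up on an idle socket at least
// `marginMs` before the server does, so the client never reuses a socket
// the server has already closed.
function unsafeLinks(links: KeepAliveLink[], marginMs = 1_000): KeepAliveLink[] {
  return links.filter((link) => link.clientIdleMs + marginMs > link.serverIdleMs);
}

// Client talking straight to an RPC node with a 20s idle timeout.
const direct: KeepAliveLink[] = [
  { client: "web3.js agent", server: "RPC node", clientIdleMs: 19_000, serverIdleMs: 20_000 },
];

// Placeholder load balancer values, chosen only to illustrate a
// misconfigured hop: here the LB drops idle clients after 10s.
const viaLoadBalancer: KeepAliveLink[] = [
  { client: "web3.js agent", server: "load balancer", clientIdleMs: 19_000, serverIdleMs: 10_000 },
  { client: "load balancer", server: "RPC node", clientIdleMs: 15_000, serverIdleMs: 20_000 },
];

for (const link of unsafeLinks(viaLoadBalancer)) {
  console.log(
    `${link.client} -> ${link.server}: client idle ${link.clientIdleMs} ms, server idle ${link.serverIdleMs} ms`,
  );
}
```

With these placeholder numbers only the client-to-load-balancer link is flagged: a 19s client timeout is safe against a 20s RPC node but not against a load balancer that gives up after 10s.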
Problem
We at Triton One have seen many developers hit an error like this when using web3.js:
I've been collecting tcpdump captures from our servers, and in the vast majority of cases this is caused by an RST packet sent by our HAProxy load balancer, which abruptly closes the connection on the client side; see this screenshot as an example.
IP: 204.16.246.170 (Load Balancer managed by Triton)
IP: 18.237.101.162 (Client)
Searching stackoverflow.com suggests this is a common issue across many Node.js applications. While the client can simply ignore it and reconnect, this raises complaints from our customers, who expect to extract maximum read performance from our servers.
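Ignoring the error and reconnecting amounts to retrying the request on a different socket. A minimal sketch, assuming the failing call is an idempotent read: `retryOnReset` and `isConnectionReset` are hypothetical helpers, not web3.js APIs, and checking both `err.code` and `err.cause.code` is an assumption about how the underlying HTTP client surfaces the socket error (some set the code on the error itself, others wrap the socket error in `cause`).

```typescript
// Hypothetical helpers, not part of web3.js. Only use for idempotent reads:
// a reset can arrive after the server has already processed the request.
type MaybeCoded = { code?: unknown; cause?: { code?: unknown } } | null | undefined;

function isConnectionReset(err: unknown): boolean {
  const e = err as MaybeCoded;
  // Assumption: the reset code may sit on the error or on its `cause`.
  return e?.code === "ECONNRESET" || e?.cause?.code === "ECONNRESET";
}

async function retryOnReset<T>(op: () => Promise<T>, maxRetries = 2): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Rethrow anything that is not a reset, or once retries are exhausted.
      if (!isConnectionReset(err) || attempt >= maxRetries) throw err;
      // Node's http.Agent drops a destroyed socket from its pool, so the next
      // attempt runs on a different (possibly newly opened) connection.
    }
  }
}
```

For example, `retryOnReset(() => connection.getSlot())`, where `connection` is an existing web3.js `Connection`. This hides the error from callers but still pays for the reset and the extra round trip, which is why it does not address the performance complaint on its own.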
Proposed Solution
Here are a few proposed solutions:
1) Remove HTTP keep-alive functionality completely from web3.js, so it closes sockets as soon as the client gets a response.
2) Enforce client-side timeouts in the HTTP keep-alive settings. For example, https://github.com/solana-labs/solana-web3.js/blob/master/src/agent-manager.ts#L13 could be set to the following (30s), or something shorter:
{keepAlive: true, maxSockets: 25, timeout: 30000};

Note: 2) likely won't solve the issue completely, but it could reduce the error rate. The errors don't happen at a fixed interval; it varies for every application. Some customers see it every X minutes, while others see it every few seconds. So here I'm proposing that the client close the socket before the LB does, to avoid the connection being abruptly reset on the client side.
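The two options map roughly onto Node.js `https.Agent` settings as sketched below. This shows agent settings only, not the actual code in agent-manager.ts, and it leaves out how an agent is handed to web3.js. It assumes a Node.js version whose agent destroys a pooled socket when its `timeout` fires while the socket is idle (current releases do), and that the load balancer's own idle timeout is longer than the value chosen here.

```typescript
import * as https from "node:https";

// Option 1: no keep-alive. Each request gets its own socket, which the agent
// destroys once the response is done, so no idle pooled socket is ever left
// for the load balancer to reset.
const noKeepAliveAgent = new https.Agent({ keepAlive: false });

// Option 2: keep-alive with a client-side idle timeout. `timeout` sets the
// socket timeout; when it fires on a socket sitting idle in the pool, the
// agent destroys that socket. Assumption: the LB's idle timeout is longer
// than CLIENT_IDLE_TIMEOUT_MS, so the client always closes first.
const CLIENT_IDLE_TIMEOUT_MS = 30_000; // value proposed in option 2; shorter is safer
const keepAliveAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 25,
  timeout: CLIENT_IDLE_TIMEOUT_MS,
});
```

Option 1 trades the resets for a TCP and TLS handshake on every request; option 2 keeps connection reuse but only helps if the timeout sits below every idle timeout further along the path.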