Increase IO_TIMEOUT to allow nodes on high-latency connections to sync #3109
Commit d3dbafa "Use blocking IO in P2P to reduce CPU load" (merged into v2.1.0) introduced the constant IO_TIMEOUT, setting it to 1 second. On nodes with high-latency connections, this short timeout causes the txhashset archive download during step 2 of the IBD process to invariably fail before it completes. Since there's no mechanism for resuming a failed download, this means the node gets stuck at this stage and never syncs. Increasing IO_TIMEOUT to 10 seconds solves the issue on my node; others might suggest a more optimal value for the constant.
Does this change affect shutdown times?
Doesn't seem to. Haven't noticed any difference in my node's behavior.
Yes, it would increase shutdown time. We also need to figure out why the download fails: a timeout is not an exceptional situation, and it doesn't stop us from reading from a peer. Perhaps we read part of the data, hit the timeout, and drop the buffer. I'll check it. Increasing the timeout would be a short-term fix.
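To illustrate the partial-read hypothesis above, here is a minimal sketch of a read loop that keeps the bytes it has already received when a timeout fires, rather than discarding them. This is not the project's actual code: `read_exact_with_retry`, the `deadline` parameter, and the `FlakyReader` test double are hypothetical names introduced for illustration. It relies on the fact that a read timeout on a Rust `TcpStream` surfaces as `ErrorKind::WouldBlock` on Unix (the EAGAIN case) and `ErrorKind::TimedOut` on Windows.

```rust
use std::io::{self, ErrorKind, Read};
use std::time::{Duration, Instant};

/// Hypothetical helper: fills `buf` completely, treating a read timeout as
/// "no data yet" instead of a fatal error. Bytes already received stay in
/// `buf`, so a timeout in the middle of a large transfer loses nothing.
/// Intended for a blocking socket with a read timeout set; on a
/// non-blocking socket this loop would spin.
fn read_exact_with_retry<R: Read>(
    reader: &mut R,
    buf: &mut [u8],
    deadline: Duration, // assumed overall limit for the whole read
) -> io::Result<()> {
    let start = Instant::now();
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            // Zero bytes means the peer closed the connection.
            Ok(0) => {
                return Err(io::Error::new(ErrorKind::UnexpectedEof, "peer closed connection"))
            }
            Ok(n) => filled += n,
            // Timeout (EAGAIN on Unix) or signal: keep progress and retry.
            Err(ref e)
                if matches!(
                    e.kind(),
                    ErrorKind::WouldBlock | ErrorKind::TimedOut | ErrorKind::Interrupted
                ) =>
            {
                if start.elapsed() > deadline {
                    return Err(io::Error::new(ErrorKind::TimedOut, "overall deadline exceeded"));
                }
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Hypothetical test double: fails with `WouldBlock` before every chunk,
/// then yields at most two bytes, imitating a slow, high-latency peer.
struct FlakyReader {
    data: Vec<u8>,
    pos: usize,
    stall: bool,
}

impl Read for FlakyReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        self.stall = !self.stall;
        if self.stall {
            return Err(io::Error::new(ErrorKind::WouldBlock, "simulated EAGAIN"));
        }
        let n = out.len().min(2).min(self.data.len() - self.pos);
        out[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```

With a loop of this shape, the per-read timeout only bounds how long a thread waits before it can check for shutdown, while the separate overall deadline decides when a transfer has really failed.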
Agree this is a short-term fix. But it does solve the problem and doesn't seem to cause any serious issues of its own. The downloads were repeatedly failing partway through with EAGAIN. Here's a snippet from the log file:
Suggestion: the fix could be made more fine-grained by using the longer timeout only for the txhashset download. Other connections thus wouldn't be affected.
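A rough sketch of what that finer-grained approach might look like, assuming the timeout is applied per socket through the standard `TcpStream::set_read_timeout` and `set_write_timeout` calls. The `Transfer` enum, `ARCHIVE_IO_TIMEOUT`, `timeout_for`, and `apply_timeout` are hypothetical names, not part of the codebase; the 1-second and 10-second values are the ones mentioned in this thread.

```rust
use std::io;
use std::net::TcpStream;
use std::time::Duration;

/// Default timeout for ordinary P2P traffic (the 1 second discussed above).
const IO_TIMEOUT: Duration = Duration::from_secs(1);
/// Assumed longer timeout used only while the txhashset archive downloads.
const ARCHIVE_IO_TIMEOUT: Duration = Duration::from_secs(10);

/// Hypothetical classification of what a connection is currently doing.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Transfer {
    Message,
    TxHashSetArchive,
}

/// Picks the timeout for the current kind of transfer.
fn timeout_for(kind: Transfer) -> Duration {
    match kind {
        Transfer::Message => IO_TIMEOUT,
        Transfer::TxHashSetArchive => ARCHIVE_IO_TIMEOUT,
    }
}

/// Applies the chosen timeout to a stream before the transfer starts.
/// The caller would switch back to `Transfer::Message` afterwards.
#[allow(dead_code)]
fn apply_timeout(stream: &TcpStream, kind: Transfer) -> io::Result<()> {
    let timeout = timeout_for(kind);
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))
}
```

Only the connection that is downloading the archive would see the longer timeout, so regular message handling keeps its current behavior.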
Interesting. And I was wrong about the longer shutdown. It would have been the case with the original PR, but we changed the behavior since then: now we just close the remaining threads after a short period of time.
I've been unable to observe any difference in the patched node's behavior. Aside from the fact that it now syncs :-)