Fix TcpListener to not stop from accepting further connections on transient accept errors#7970
Conversation
TcpListener: Improve handling of Accept errorsTcpListener to not stop from accepting further connections on transient accept errors
Aaronontheweb
left a comment
There was a problem hiding this comment.
Overall looks fine but I left some comments - I'll need to pull up the source and look at the TCP error codes before approving though
| break; | ||
|
|
||
| // Fatal errors - the listener socket itself is broken | ||
| case SocketError.OperationAborted: |
There was a problem hiding this comment.
What's the difference between OperationAborted and ConnectionAborted?
There was a problem hiding this comment.
Reason I'm asking is that this might be a transient error - I'd want to know the systems-level explanation of the code first though.
There was a problem hiding this comment.
I'm really not an expert on low-level socket code, but my understanding is that OperationAborted maps to WSA_OPERATION_ABORTED, which indicates that the listener socket (not the incoming connection socket) was closed/disposed/interrupted.
There was a problem hiding this comment.
Ok, that's great to know - I'll pull down the branch and check it out locally but if what you said is true then yes, this is correct.
| break; | ||
|
|
||
| case SocketError.ConnectionReset: | ||
| case SocketError.ConnectionAborted: |
|
@Aaronontheweb Just a quick update: we're still seeing this happen about once a week in our production environments. We did deploy a hotfix that restarts the actor when we see this happen, so we're mostly shielded from the impact (all the connections are dropped temporarily but we have a recovery mechanism for that), but it's quite possible that many Akka-using apps don't have this and could be impacted. |
|
@petrikero I've talked about this with @Arkatufus this morning and we'll do a new v1.5 release with this fix today. |
I greatly appreciate you handling this so quickly (given the holiday season)! |
…ransient accept errors (akkadotnet#7970) * Minimal fix to handle SocketError.ConnectionAborted in TcpListener. * Handle more transient error cases. Handle known fatal errors explicitly. * Join code paths for known and unknown fatal errors. --------- Co-authored-by: Aaron Stannard <aaron@petabridge.com>
…ransient accept errors (#7970) * Minimal fix to handle SocketError.ConnectionAborted in TcpListener. * Handle more transient error cases. Handle known fatal errors explicitly. * Join code paths for known and unknown fatal errors. --------- Co-authored-by: Aaron Stannard <aaron@petabridge.com>
Fixes #7969
Changes
TcpListener.HandleAccept(). This fixes the issue of theTcpListenerstopping accepting any further incoming connections in case of a transient error during accept.TcpListener.HandleAccept().Disclaimer: I'm not an expert in the domain got help from AI to craft this PR so a thorough review would be appreciated.
Checklist
For significant changes, please ensure that the following have been completed (delete if not relevant):
Additional notes
TcpListeneractor and re-binding on fatal failures, but any active TCP connections would still be terminated if the issue occurs.TcpListenerstops on a fatal error, theTcp.Unbounddoes not seem to be sent. I.e., this code in theTcpEchoService.Serversample does not trigger:Receive<Tcp.Unbound>(_ => _stopCommander?.Tell("Done"));Regarding whether the unknown errors should be treated as fatal or transient:
TcpListenercan end up in an infinite loop of retrying what is really a fatal error. This prevents the surrounding app from seeing the issue and reacting to it.