fix: report Tcp.CommandFailed when a scheduled connect retry throws (#8195) #8215
Merged
Aaronontheweb merged 1 commit on May 17, 2026
…kkadotnet#8195)

TcpOutgoingConnection scheduled its connect retry as a raw Action on the HashedWheelTimer scheduler thread. When Socket.ConnectAsync threw inside that callback (PlatformNotSupportedException on Linux when reusing a socket after a failed connection attempt), the exception was logged and swallowed by the scheduler. The commander never received Tcp.Connected or Tcp.CommandFailed and stayed stuck permanently.

The retry now runs inside the actor's message loop: it is scheduled as a RetryConnect self-message via IWithTimers, and the ConnectAsync call is wrapped in ReportConnectFailure, so any exception is surfaced to the commander as Tcp.CommandFailed and the connection actor stops. Using a timer also cancels the pending retry automatically when the actor stops.

Forward-port of akkadotnet#8214 (merged to v1.5).
Summary
Forward-port of #8214 (merged to v1.5) to dev. The #8132 Akka.IO transport rewrite changed the transport layer but left the outgoing-connection state machine intact, so the bug from #8195 is present on dev as well.

On Linux, a dropped TCP connection could leave the commander/user actor permanently stuck: it never received Tcp.Connected or Tcp.CommandFailed, and the only recovery was a process restart.

Root cause
When a connect attempt fails, TcpOutgoingConnection.Connecting schedules a retry. That retry was scheduled as a raw Action via Context.System.Scheduler.Advanced.ScheduleOnce(...), so it ran on the HashedWheelTimer scheduler thread, outside the actor's message loop, and called Socket.ConnectAsync directly.

When that call threw (PlatformNotSupportedException on Linux when reusing a socket after a failed connect attempt), the exception propagated into HashedWheelTimerScheduler.Bucket.Execute, which logs and swallows it. Because the exception never re-entered the actor, the existing ReportConnectFailure → Stop path never ran, so Tcp.CommandFailed was never delivered.

Fix
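For illustration, the difference between the old and the new scheduling style can be sketched with a simplified actor. This is a minimal sketch and not the code in this PR: RetryingConnector, ConnectFailed, the connect delegate (standing in for Socket.ConnectAsync) and the 100 ms delay are hypothetical, and the real actor replies with Tcp.CommandFailed after a failed attempt rather than scheduling from PreStart. It assumes the public Akka.NET IWithTimers and ReceiveActor APIs.

```csharp
using System;
using Akka.Actor;

// Hypothetical reply type for this sketch; the real actor sends Tcp.CommandFailed.
public sealed class ConnectFailed
{
    public ConnectFailed(Exception cause) { Cause = cause; }
    public Exception Cause { get; }
}

// Hypothetical, simplified stand-in for an outgoing-connection actor.
public sealed class RetryingConnector : ReceiveActor, IWithTimers
{
    // Self-message that triggers the retry inside the actor's message loop.
    private sealed class RetryConnect
    {
        public static readonly RetryConnect Instance = new RetryConnect();
        private RetryConnect() { }
    }

    private readonly IActorRef _commander;
    private readonly Action _connect; // assumption: stands in for Socket.ConnectAsync, may throw

    // Set by Akka.NET because the actor implements IWithTimers.
    public ITimerScheduler Timers { get; set; }

    public RetryingConnector(IActorRef commander, Action connect)
    {
        _commander = commander;
        _connect = connect;

        // The retry is handled like any other message, so a throw is caught
        // by ReportConnectFailure instead of being lost on a scheduler thread.
        Receive<RetryConnect>(_ => ReportConnectFailure(_connect));
    }

    protected override void PreStart()
    {
        // Old style (problematic): the callback runs on the scheduler thread,
        // so an exception thrown by _connect never reaches this actor.
        // Context.System.Scheduler.Advanced.ScheduleOnce(delay, () => _connect());

        // New style: a single-shot timer delivers RetryConnect to Self and is
        // canceled automatically if the actor stops first.
        Timers.StartSingleTimer("retry-connect", RetryConnect.Instance,
            TimeSpan.FromMilliseconds(100));
    }

    private void ReportConnectFailure(Action attempt)
    {
        try
        {
            attempt();
        }
        catch (Exception ex)
        {
            // Surface the failure to the commander, then stop this actor.
            _commander.Tell(new ConnectFailed(ex));
            Context.Stop(Self);
        }
    }
}
```

When creating this actor with Props.Create(() => new RetryingConnector(...)), declare the throwing delegate in a local variable first, because C# expression trees cannot contain a throw expression.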
The retry now runs inside the actor's message loop (identical approach to #8214):

- TcpOutgoingConnection implements IWithTimers (consistent with TcpListener in the same module).
- The retry is scheduled as a RetryConnect self-message via Timers.StartSingleTimer.
- The Receive<RetryConnect> handler performs the Socket.ConnectAsync call wrapped in the existing ReportConnectFailure, so any exception is surfaced to the commander as Tcp.CommandFailed and the connection actor stops cleanly.

This also removes a latent bug: the old raw action could run Socket.ConnectAsync on an already-disposed socket if the actor stopped before the scheduled callback fired. With IWithTimers, the pending timer is canceled automatically when the actor stops.

The dev change is slightly smaller than #8214 because dev no longer has the DNS IPv4/IPv6 fallback path (Connecting has a single retry case).

Testing
Added Should_report_CommandFailed_when_outgoing_connection_is_refused to TcpIntegrationSpec: a behavioral guard asserting that a refused outgoing connection always ends with Tcp.CommandFailed (the actor must never hang).

Note: #8214 used a deterministic cross-platform regression test built on the DNS IPv4→IPv6 fallback retry path. dev removed that fallback path, and PlatformNotSupportedException is architecture-specific (does not reproduce on x64), so that exact test cannot be ported here. This behavioral test catches the regression on the affected platform (arm64 Linux) and guards the "never hang" contract everywhere; on x64 it passes regardless, consistent with x64 not exhibiting the bug.

Full Akka.Tests.IO suite (52 tests) passes locally.
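
A "never hang" guard of the kind described above might look roughly like the following. This is a hedged sketch, not the test added in this PR: the class name, the helper, and the 10-second timeout are hypothetical, and it assumes the Akka.NET 1.5 public API (Sys.Tcp(), Tcp.Connect, Tcp.CommandFailed) together with the Akka.TestKit.Xunit2 package, which may differ on dev.

```csharp
using System;
using System.Net;
using Akka.Actor;
using Akka.IO;
using Akka.TestKit.Xunit2;
using Xunit;

// Hypothetical spec; the PR's real test lives in TcpIntegrationSpec.
public class RefusedConnectSketchSpec : TestKit
{
    // Bind to port 0 to get a free port, then release it so nothing listens there.
    // There is a small race if another process grabs the port in between.
    private static int FindUnusedLoopbackPort()
    {
        var probe = new System.Net.Sockets.TcpListener(IPAddress.Loopback, 0);
        probe.Start();
        var port = ((IPEndPoint)probe.LocalEndpoint).Port;
        probe.Stop();
        return port;
    }

    [Fact]
    public void Refused_outgoing_connection_should_end_with_CommandFailed()
    {
        var endpoint = new IPEndPoint(IPAddress.Loopback, FindUnusedLoopbackPort());

        // TestActor plays the commander: it must receive a terminal reply.
        Sys.Tcp().Tell(new Tcp.Connect(endpoint), TestActor);

        // Fails with a timeout if the connection actor hangs without replying.
        var failed = ExpectMsg<Tcp.CommandFailed>(TimeSpan.FromSeconds(10));
        Assert.IsType<Tcp.Connect>(failed.Cmd);
    }
}
```

The assertion is deliberately behavioral: it does not depend on which exception the platform raises, only on the commander receiving Tcp.CommandFailed within the timeout.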