Sockets.Unix race between receive completion and cancellation? #115217

@tmds

Description

I'm doing some testing of https://github.com/tmds/Tmds.Ssh/ and I occasionally get an unexpected runtime crash:

Fatal error. Internal CLR error. (0x80131506)
   at System.Runtime.EH.DispatchEx(System.Runtime.StackFrameIterator ByRef, ExInfo ByRef)
   at System.Runtime.EH.RhThrowEx(System.Object, ExInfo ByRef)
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at System.Threading.CancellationToken.ThrowIfCancellationRequested()
   at System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.ThrowException(System.Net.Sockets.SocketError, System.Threading.CancellationToken)
   at System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16)
   at Tmds.Ssh.StreamSshConnection+<ReceiveAsync>d__21.MoveNext()
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Int32, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext(System.Threading.Thread)
   at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1[[System.Boolean, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].SetResult(Boolean)
   at System.Net.Sockets.SocketAsyncEventArgs.TransferCompletionCallbackCore(Int32, System.Memory`1<Byte>, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

Based on the stack trace, I think this is caused by a race between a receive operation on the socket completing successfully (TransferCompletionCallbackCore at the bottom of the stack) and that same receive operation also completing due to cancellation (CancellationToken.ThrowOperationCanceledException at the top of the stack).
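A stress loop along the following lines might be one way to try to provoke such a race. It is an untested sketch, not a confirmed reproducer: the loopback setup, the iteration count, and the variable names are all assumptions made for illustration, and it only uses the public Socket API rather than anything from Tmds.Ssh.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stress loop: not confirmed to trigger the crash.
using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
listener.Listen();

using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
await client.ConnectAsync(listener.LocalEndPoint!);
using var server = await listener.AcceptAsync();

byte[] receiveBuffer = new byte[1024];
byte[] payload = new byte[1];
int completed = 0, canceled = 0;

for (int i = 0; i < 100_000; i++) // iteration count is arbitrary
{
    using var cts = new CancellationTokenSource();

    // Start a cancellable receive on the client side.
    ValueTask<int> receive = client.ReceiveAsync(receiveBuffer.AsMemory(), SocketFlags.None, cts.Token);

    // Make data arrive and cancel the token at roughly the same time,
    // so that completion and cancellation can overlap.
    Task send = Task.Run(() => server.Send(payload));
    Task cancel = Task.Run(() => cts.Cancel());

    try
    {
        await receive;
        completed++;
    }
    catch (OperationCanceledException)
    {
        // Any byte that was sent stays buffered and is read by a later iteration.
        canceled++;
    }

    await Task.WhenAll(send, cancel);
}

Console.WriteLine($"completed: {completed}, canceled: {canceled}");
```

If the hypothesis holds, a run like this would be expected to either finish with a mix of completed and canceled receives or hit the fatal error above. Whether loopback timing is tight enough to expose the problem is unknown.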

To support that hypothesis, I changed Tmds.Ssh's receive code to cancel through Task.WaitAsync rather than cancelling the socket operation. When I make this change, the crashes no longer occur.

     private async ValueTask<int> ReceiveAsync(CancellationToken ct)
     {
         var memory = _receiveBuffer.AllocGetMemory(Constants.PreferredBufferSize);
-        int received = await _stream.ReadAsync(memory, ct).ConfigureAwait(false);
+        int received;
+        Task<int> receiveTask = _stream.ReadAsync(memory).AsTask();
+        try
+        {
+            received = await receiveTask.WaitAsync(ct).ConfigureAwait(false);
+        }
+        catch
+        {
+            (_stream as System.Net.Sockets.NetworkStream)!.Socket.Dispose();
+
+            await receiveTask.ConfigureAwait(false);
+
+            throw;
+        }
+
         _receiveBuffer.AppendAlloced(received);
         return received;
     }

I saw this issue on a setup I created specifically for my tests (with cloud VMs). I don't have an easy reproducer atm.

I have plenty of things on my plate for the next week or two. After that, I should have some time to look into this further and provide additional information and do some debugging.

cc @karelz @antonfirsov @stephentoub
