Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dispose server connection after client has been disposed #56720

Conversation

alnikola
Copy link
Contributor

@alnikola alnikola commented Aug 2, 2021

There is a race condition in Http2Connection in processing of GOAWAY frame between disposing the client's SslStream (_stream field) together with shutting down Http2Connection and sending RST_STREAM for aborted Http2Streams. It could be that a stream put an outgoing RST_STREAM on _writeChannel.Writer queue, but then _writeChannle.Writer has got disposed before sending that RST_STREAM frame.

The above behavior can lead to test code gets hang if it calls Http2LoopbackConnection.ShutdownIgnoringErrorsAsync directly or as part of a Dispose call because this method first sends GOAWAY frame to client and then waits for any frame from client which can never arrive due to that race condition.

This PR changes the call order, so Http2LoopbackConnection is called after the HttpClient has been disposed.

Fixes #44183

@ghost
Copy link

ghost commented Aug 2, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

There is a race condition in Http2Connection in processing of GOAWAY frame between disposing _writeChannel.Writer together with Http2Connection and sending RST_STREAM for aborted Http2Streams. It could be that a stream put an outgoing RST_STREAM on _writeChannel.Writer queue, but then _writeChannle.Writer has got disposed before sending that RST_STREAM frame.

The above behavior can lead to test code gets hang if it calls Http2LoopbackConnection.ShutdownIgnoringErrorsAsync directly or as part of a Dispose call because this method first sends GOAWAY frame to client and then waits for any frame from client which can never arrive due to that race condition.

This PR changes the call order, so Http2LoopbackConnection is called after the HttpClient has been disposed.

Fixes #44183

Author: alnikola
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@alnikola alnikola added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 2, 2021
@alnikola alnikola requested a review from a team August 2, 2021 14:43
@alnikola alnikola removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 2, 2021
@alnikola alnikola marked this pull request as ready for review August 2, 2021 14:43
@antonfirsov
Copy link
Member

antonfirsov commented Aug 3, 2021

but then _writeChannle.Writer has got disposed before sending that RST_STREAM frame

Are you sure about this? Channel<T> and ChannelWriter<T> is not IDisposable. What I see in Http2Connection.Dispose() is _writeChannel.Writer.Complete(), which AFAIK only initiates graceful shutdown, so _writeChannel.Reader.WaitToReadAsync() will probably still read the RST_STREAM inside ProcessOutgoingFramesAsync(). Not an expert on channels though.

@alnikola
Copy link
Contributor Author

alnikola commented Aug 3, 2021

Are you sure about this? Channel and ChannelWriter is not IDisposable

Sorry, it's my mistake. You are right that it's a completion not disposal.

_writeChannel.Reader.WaitToReadAsync() will probably still read the RST_STREAM inside ProcessOutgoingFramesAsync()

It's not guaranteed in case of GOAWAY processing because Http2Stream only puts RST_STREAM on the outgoing queue, but doesn't wait for it to be sent. Once it has been put onto, Http2Stream will call Complete on itself which will remove it from the connection's list.

In more details, the GOAWAY processing logic is as follows:

  1. Http2Connection reads next frame and sees that's a GOAWAY
  2. It call ProcessGoAwayFrame on itself
  3. This method calls Shutdown() -> SignalShutdownWaiter() which unblock the waiter by setting result. The waiter here is a thread in HttpConnectionPool.AddHttp2ConnectionAsync() waiting on shutdownTask
  4. Next, ProcessGoAwayFrame calls OnReset on each Http2Stream owned by the connection
  5. The following happens in parallel
Http2Connection / Http2Stream HttpConnectionPool / Http2Connection
Http2Stream.OnReset calls Http2Stream.Cancel It enters InvalidateHttp2Connection() where it calls Http2Connection.Dispose()
Cancel() calls Http2Stream.SendReset() which calls Http2Connection.SendRstStreamAsync, but this method only puts a RST_STREAM frame on _writerChannel's queue Dispose() sets _disposed = true and checks if _streamsInUse == 0
Execution returns from SendReset() back to Cancel() which then calls Http2Stream.Complete() It sees _streamsInUse > 0, so simply exits Dispose()
Complete() calls Http2Connection.RemoveStream(this) which removes the stream from _httpStreams collection and calls ReleaseStream()
ReleaseStream() decrements _streamsInUse and checks if it's equal to 0 now
ReleaseStream() sees that _streamsInUse == 0 and _disposed == true, so it calls FinalTeardown()
FinalTeardown() disposes SslStream by calling _stream.Dispose() Http2Connection.ProcessOutgoingFramesAsync() calls _writeChannel.Reader.TryRead and gets that RST_STREAM frame, then it calls writeEntry.InvokeWriteAction
writeEntry.InvokeWriteAction calls a delegate from Http2Connection.SendRstStreamAsync()
SendRstStreamAsync() tries to write into SslStream, but the stream is already disposed, so it throws an ObjectDisposedException
The Http2LoopbackServer never receives that RST_STREAM frame

@antonfirsov
Copy link
Member

Thanks for the great summary! The race condition seems to be very general and independent from the UDS transport and Windows. Any ideas why is the failure Windows only and why don't we see it in other tests following a similar pattern?

@alnikola
Copy link
Contributor Author

alnikola commented Aug 3, 2021

I think it has something to do with threads scheduling. There might be a Windows-specific behavior in the thread pool which exposes this race.

@antonfirsov
Copy link
Member

and then waits for any frame from client which can never arrive due to that race condition.

I think ShutdownIgnoringErrorsAsync and the underlying WaitForConnectionShutdownAsync does not / should not wait on the receival of arbitrary control frames like RST_STREAM. ReadFrameAsync should return null on a zero byte read or an IOException in case the client connection has been shut down. We should understand why doesn't that happen with the UDS stream.

@geoffkizer
Copy link
Contributor

A couple thoughts:

First, I think we have an issue here where we should not be sending RST_STREAM in this case.

The OnReset logic is used for receiving both GOAWAY and RST_STREAM. The spec is very clear that when receiving a RST_STREAM, we must not send a RST_STREAM back:

To avoid looping, an endpoint MUST NOT send a RST_STREAM in response to a RST_STREAM frame.

https://httpwg.org/specs/rfc7540.html#StreamErrorHandler

After receiving a RST_STREAM on a stream, the receiver MUST NOT send additional frames for that stream, with the exception of PRIORITY.

https://httpwg.org/specs/rfc7540.html#RST_STREAM

There doesn't seem to be any similar language for GOAWAY, but since we use the same code to handle both it doesn't really matter.

I thought we had a test for this -- i.e. that we don't send RST_STREAM after receiving it -- but I can't find it now. If not we should add one.

@geoffkizer
Copy link
Contributor

Another thought, re this:

FinalTeardown() disposes SslStream by calling _stream.Dispose()

FinalTeardown will also complete the writer queue, i.e. call _writeChannel.Writer.Complete. This will eventually cause the write queue to drain.

It might be better to defer disposing the underlying stream until the write queue is completely drained. I don't think it should matter, since at this point all requests are completed -- but at the very least it would avoid some unnecessary exceptions.

@alnikola
Copy link
Contributor Author

alnikola commented Aug 3, 2021

First, I think we have an issue here where we should not be sending RST_STREAM in this case.

@geoffkizer Do you suggest to change that logic now? I'm afraid that it can be too big change for this release phase.

@geoffkizer
Copy link
Contributor

Do you suggest to change that logic now? I'm afraid that it can be too big change for this release phase.

We should certainly consider it. We are breaking the RFC here. Any server would be within their rights to fail the entire connection when we do this. I don't know how likely that is to happen in practice, but I know we would 100% service this if a customer did hit it.

I assume this is a regression from 5.0, right? Do we know how we regressed this?

@alnikola
Copy link
Contributor Author

alnikola commented Aug 3, 2021

I assume this is a regression from 5.0, right? Do we know how we regressed this?

Just to clarify, stopping sending RST_STREAM in OnReset won't fix this test because due to some reason it cannot get a zero-byte read on the connection close.

If you are asking whether sending RST_STREAM in OnReset is a regression from 5.0, then I don't know currently. Need to check that code.

@alnikola
Copy link
Contributor Author

alnikola commented Aug 3, 2021

I think ShutdownIgnoringErrorsAsync and the underlying WaitForConnectionShutdownAsync does not / should not wait on the receival of arbitrary control frames like RST_STREAM. ReadFrameAsync should return null on a zero byte read or an IOException in case the client connection has been shut down. We should understand why doesn't that happen with the UDS stream.

@antonfirsov Now, I do agree with you. Since, we discovered that RST_STREAM shouldn't be sent here at all, so the only way the server can learn that connection has been closed is via zero-byte read.

@alnikola
Copy link
Contributor Author

alnikola commented Aug 3, 2021

Triage: We decided to keep the test disabled for now.

@alnikola alnikola closed this Aug 3, 2021
@alnikola alnikola deleted the alnikola/44183-fix-unix-domain-sock-test-win branch August 4, 2021 11:29
@alnikola
Copy link
Contributor Author

alnikola commented Aug 4, 2021

Submitted #56830

@karelz karelz added this to the 6.0.0 milestone Aug 17, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Http2.ConnectCallback_UseUnixDomainSocket_Success failing on Windows
4 participants