SocketsHttpHandler HTTP/2 pings are broken in multiple scenarios

HTTP/2 pings are used to keep connections from being completely idle, and to detect dead ones.
With long-lived requests that have no implied timeout (e.g. gRPC streaming), they are the only way to eventually detect and abort unresponsive servers.

The way our logic works today, [we'll only run heartbeat logic on HTTP/2 connections that are in the `_availableHttp2Connections` list](https://github.com/dotnet/runtime/blob/6bbf3520b7232e19c2d1c4d59631a318199f847d/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.Http2.cs#L568-L581).

This means that pings are broken in the following scenarios:
1. The connection reached its stream limit and was [temporarily removed from the available list](https://github.com/dotnet/runtime/blob/6bbf3520b7232e19c2d1c4d59631a318199f847d/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.Http2.cs#L109-L120).
2. We received a GOAWAY frame indicating that existing requests will be processed. [The first thing we do when receiving the frame is to mark the connection as shut down, and remove it from the available pool.](https://github.com/dotnet/runtime/blob/6bbf3520b7232e19c2d1c4d59631a318199f847d/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Connection.cs#L1080-L1094)
3. The `SocketsHttpHandler` instance was disposed. Today any existing requests on the instance will continue as normal, but we also [stop the heartbeat timer](https://github.com/dotnet/runtime/blob/6bbf3520b7232e19c2d1c4d59631a318199f847d/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPoolManager.cs#L451-L454), which breaks pings. This case could be considered user error to a degree, though.


The second issue was hit by a user who reported it on Discord (after a lot of back and forth to figure out why things weren't working as expected). They use long-lived gRPC streams with HTTP/2 keep-alive pings and ping timeouts configured. When the backend service was restarting, scaling down, failing, etc., they would see a GOAWAY frame, sometimes eventually followed by the request failing because the connection observed an EOF.
But sometimes they would receive a GOAWAY frame and then nothing would happen, with all requests appearing stuck forever. This is the case where we might not notice that a TCP connection is dead until we try writing to it or enforce a read timeout, which is precisely what HTTP/2 pings are supposed to do.
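
For context, a client configured along the lines described above might look roughly like the sketch below. The specific durations and the choice of ping policy are assumptions for illustration, not the reporter's actual settings.

```csharp
using System;
using System.Net.Http;

// Illustrative values only (assumed, not taken from the report).
var handler = new SocketsHttpHandler
{
    // Send a keep-alive PING once the connection has been quiet for this long.
    KeepAlivePingDelay = TimeSpan.FromSeconds(30),

    // Abort the connection if the PING is not acknowledged within this window.
    KeepAlivePingTimeout = TimeSpan.FromSeconds(10),

    // Only ping while there are active requests on the connection.
    KeepAlivePingPolicy = HttpKeepAlivePingPolicy.WithActiveRequests,
};

using var client = new HttpClient(handler);
```

With settings like these, one would expect a dead connection carrying a long-lived stream to be torn down after roughly the delay plus the timeout. In the scenarios listed above, the heartbeat never runs for the affected connection, so that teardown does not happen.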

Example traces

<img width="1760" height="813" alt="Image" src="https://github.com/user-attachments/assets/69c13fee-9789-44f6-969c-c793a2325781" />

SocketsHttpHandler HTTP/2 pings are broken in multiple scenarios #120179
