Skip to content

Conversation

jgold2-stripe
Copy link

@jgold2-stripe jgold2-stripe commented Aug 21, 2025

Fixes: #8425

This PR adds a call to net.Conn.SetWriteDeadline, as discussed in #8425 (comment). Additionally, it updates the previous call to SetReadDeadline to log any non-nil error value (this doesn't affect behavior but proved helpful in some earlier debugging).

RELEASE NOTES:

  • client: Set a read deadline when closing a transport to prevent it from blocking indefinitely on a broken connection.

@jgold2-stripe jgold2-stripe changed the title Keepalive transport/http2_client.go: Invoke net.Conn.SetWriteDeadline in Close Aug 22, 2025
Copy link

codecov bot commented Aug 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.94%. Comparing base (9186ebd) to head (b9e53e2).
⚠️ Report is 41 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8534      +/-   ##
==========================================
- Coverage   82.47%   81.94%   -0.54%     
==========================================
  Files         413      412       -1     
  Lines       40513    40468      -45     
==========================================
- Hits        33415    33162     -253     
- Misses       5743     5920     +177     
- Partials     1355     1386      +31     
Files with missing lines Coverage Δ
internal/transport/http2_client.go 92.36% <100.00%> (+0.24%) ⬆️

... and 253 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@arjan-bal arjan-bal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test to verify the fix and catch regressions? You can refer to the test for the write deadline:

// hangingConn is a net.Conn wrapper for testing, simulating hanging connections
// after a GOAWAY frame is sent, of which Write operations pause until explicitly
// signaled or a timeout occurs.
type hangingConn struct {
net.Conn
hangConn chan struct{}
startHanging *atomic.Bool
}
func (hc *hangingConn) Write(b []byte) (n int, err error) {
n, err = hc.Conn.Write(b)
if hc.startHanging.Load() {
<-hc.hangConn
}
return n, err
}
// Tests the scenario where a client transport is closed and writing of the
// GOAWAY frame as part of the close does not complete because of a network
// hang. The test verifies that the client transport is closed without waiting
// for too long.
func (s) TestClientCloseReturnsEarlyWhenGoAwayWriteHangs(t *testing.T) {
// Override timer for writing GOAWAY to 0 so that the connection write
// always times out. It is equivalent of real network hang when conn
// write for goaway doesn't finish in specified deadline
origGoAwayLoopyTimeout := goAwayLoopyWriterTimeout
goAwayLoopyWriterTimeout = time.Millisecond
defer func() {
goAwayLoopyWriterTimeout = origGoAwayLoopyTimeout
}()
// Create the server set up.
connectCtx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
server := setUpServerOnly(t, 0, &ServerConfig{}, normal)
defer server.stop()
addr := resolver.Address{Addr: "localhost:" + server.port}
isGreetingDone := &atomic.Bool{}
hangConn := make(chan struct{})
defer close(hangConn)
dialer := func(_ context.Context, addr string) (net.Conn, error) {
conn, err := net.Dial("tcp", addr)
if err != nil {
return nil, err
}
return &hangingConn{Conn: conn, hangConn: hangConn, startHanging: isGreetingDone}, nil
}
copts := ConnectOptions{Dialer: dialer}
copts.ChannelzParent = channelzSubChannel(t)
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
// Create client transport with custom dialer
ct, connErr := NewHTTP2Client(connectCtx, ctx, addr, copts, func(GoAwayReason) {})
if connErr != nil {
t.Fatalf("failed to create transport: %v", connErr)
}
if _, err := ct.NewStream(ctx, &CallHdr{}); err != nil {
t.Fatalf("Failed to open stream: %v", err)
}
isGreetingDone.Store(true)
ct.Close(errors.New("manually closed by client"))
}

@arjan-bal arjan-bal changed the title transport/http2_client.go: Invoke net.Conn.SetWriteDeadline in Close transport: Invoke net.Conn.SetWriteDeadline in http2_client.Close Aug 22, 2025
@arjan-bal arjan-bal added this to the 1.76 Release milestone Aug 22, 2025
@jgold2-stripe
Copy link
Author

PTAL @arjan-bal

@jgold2-stripe jgold2-stripe removed their assignment Aug 22, 2025
@arjan-bal arjan-bal self-assigned this Aug 25, 2025
@@ -989,7 +989,14 @@ func (t *http2Client) closeStream(s *ClientStream, err error, rst bool, rstCode
// only once on a transport. Once it is called, the transport should not be
// accessed anymore.
func (t *http2Client) Close(err error) {
t.conn.SetWriteDeadline(time.Now().Add(time.Second * 10))
if err := t.conn.SetWriteDeadline(time.Now().Add(time.Second * 10)); err != nil {
logger.Warningf("Failed to set write deadline when closing connection: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logs are too chatty. They're appearing almost every time a connection is closed, e.g:

tlogger.go:126: WARNING http2_client.go:993 [transport] Failed to set write deadline when closing connection: set tcp 127.0.0.1:59286: use of closed network connection  (t=+590.186033ms)

I don't think we should log here because we are calling conn.Close in another goroutine and expecting these calls to fail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped the logging.

Copy link
Contributor

@arjan-bal arjan-bal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Adding a second reviewer.

@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Aug 28, 2025
@arjan-bal arjan-bal requested a review from dfawley August 28, 2025 05:46
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the change. A couple small things to potentially change.

Comment on lines 992 to +995
t.conn.SetWriteDeadline(time.Now().Add(time.Second * 10))
// For background on the deadline value chosen here, see
// https://github.com/grpc/grpc-go/issues/8425#issuecomment-3057938248 .
t.conn.SetReadDeadline(time.Now().Add(time.Second))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better/simpler to just use SetDeadline with the same 1 second deadline, if possible.

co := ConnectOptions{
Dialer: dialer,
}
server, client, cancel := setUpWithOptions(t, 0, &ServerConfig{}, normal, co)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to set read/write deadlines during handshaking (we currently only block on the context here, but our server does set deadlines here), then this test would be invalid. I think we should either confirm that the values are still false before calling Close, or clear them unconditionally then.

Copy link

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Sep 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transport close after keepalive failure blocks on unresponsive reader
4 participants