Skip to content

GracefulStop hangs forever if the client doesn't the close underlying tcp connection.  #5930

@aureliar8

Description

@aureliar8

What version of gRPC are you using?

Go client & server: google.golang.org/grpc v1.51.0
Java client:

<grpc.version>1.51.1</grpc.version>
<protobuf.version>3.21.7</protobuf.version>
<protoc.version>3.21.7</protoc.version>

What version of Go are you using (go version)?

go1.19.1

What operating system (Linux, Windows, …) and version?

Ubuntu 22.10 (Linux 5.14)

What did you do?

First of all I'm not sure whether this something I misunderstood, a documentation issue or a bug.

When using a basic java client & the default go server configuration, if I call server.GracefulStop() during an RPC, the server never stops and hangs. This is not the case if I use a go client, or if GracefulStop is called while there is no active RPC.

I made a small repository to reproduce the case [[https://github.com/aureliar8/grpc-restart]].

What did you expect to see?

I expect the server to shutdown after the last in-flight RPCs finish.

What did you see instead?

The last in-flight RPC finishes but the server doesn't stop.

Additional details

The documentation for server.GracefulStop() reads:

// GracefulStop stops the gRPC server gracefully. It stops the server from
// accepting new connections and RPCs and blocks until all the pending RPCs are
// finished.

However it actually wait until all (tcp) connections are closed:

grpc-go/server.go

Lines 1861 to 1863 in ce56cef

for len(s.conns) != 0 {
s.cv.Wait()
}

I suppose that tcp connections are meant to be closed after GracefulStop is initiated and once in-flight RPCs finish.

This is what the go grpc client implementation does. But the java client doesn't (at least not with the default configuration). So the go server gets stuck because there is still one connection opened and thus never return from GracefulStop().

What's weird is that there is a mechanism to close the tcp connection from the go server if there are no active RPC. But it is only checked when the second GOAWAY frame is send (so quite early in the GracefulStop timeline)

if len(t.activeStreams) == 0 {
retErr = errors.New("second GOAWAY written and no active streams left to process")
}

(Specifically returning an error from the outgowingGoAwayHandler will make loopy return and close the underlying connection)

The question is:

  • Who's responsible for closing the underlying connection after GracefulStop has been requested and the last RPC finishes ?
  • If it's the client, why is there some specific code that checks that the number of in-flight RPC is zero when sending the second GOAWAY frame. (And this also means that the bug is in the java grpc client implementation)
  • If it's the server, there is no mechanism to do so if there are still in-flight RPCs after the second GOAWAY frame is send.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions