-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unavailable error when MaxConnectionAge and MaxConnectionAgeGrace is enabled on the server #2767
Comments
Can you turn on logging for the client and server? For go client: For C++ server: try |
ping @wjywbs - could you gather the debugging info requested and do you have a minimal reproducible test case? |
Sure. I'll try to create a minimized demo. |
I tried with a Go server and Go client (which makes one connection to the server and repeatedly makes RPCs on it). But, when the |
I think this is what is happening: Since And in the client in At this point, I don't know enough to say whether this is a bug or is WAI :-) |
If |
Yes, I set both age and grace to 60s and each rpc finishes within a few seconds, but the unavailable errors appear a lot of times. |
@wjywbs do you have a minimal reproducible test case we can use to replicate the problem you are seeing? |
hi @dfawley @wjywbs, i set MaxConnectionAge for long live grpc stream server recently, then a lots of grpc log "transport is closing", so i found this issue and write a test case to replicate the problem. you may need use -count 100 because sometimes it test pass sometimes it test fail. // 1. run case 1
// go test -run "TestKeepaliveMaxAge/case1-Fatal-Unavailable-transport is closing" -count 100
// result: test fail
//
package testcase1
import (
"context"
"google.golang.org/grpc"
"google.golang.org/grpc/examples/route_guide/routeguide"
"google.golang.org/grpc/grpclog"
"google.golang.org/grpc/keepalive"
"io"
"log"
"net"
"net/http"
"net/http/pprof"
"os"
"testing"
"time"
)
type routeGuidService struct {
}
func (*routeGuidService) GetFeature(ctx context.Context, p *routeguide.Point) (*routeguide.Feature, error) {
return &routeguide.Feature{}, nil
}
func (*routeGuidService) ListFeatures(p *routeguide.Rectangle, s routeguide.RouteGuide_ListFeaturesServer) error {
return nil
}
func (*routeGuidService) RecordRoute(s routeguide.RouteGuide_RecordRouteServer) error {
return nil
}
func (*routeGuidService) RouteChat(s routeguide.RouteGuide_RouteChatServer) error {
go func() {
err := s.Send(&routeguide.RouteNote{Message: "server send"})
if err != nil {
log.Printf("failed to send:%s", err)
}
}()
for {
_, err := s.Recv()
if err != nil {
if err == io.EOF {
return nil
}
log.Printf("failed to recv:%s", err)
return err
}
}
}
// go test -run "TestKeepaliveMaxAge/case1-Fatal-Unavailable-transport is closing" -count 100
func TestKeepaliveMaxAge(t *testing.T) {
grpclog.SetLoggerV2(grpclog.NewLoggerV2WithVerbosity(os.Stderr, os.Stderr, os.Stderr, 10))
keepAliveMaxConnectionAge := time.Second * 5
// 1. star server
s := grpc.NewServer(
grpc.KeepaliveParams(keepalive.ServerParameters{
MaxConnectionAge: keepAliveMaxConnectionAge,
}),
)
routeguide.RegisterRouteGuideServer(s, &routeGuidService{})
const addr = "127.0.0.1:8000"
const pprofAddr = "127.0.0.1:8001"
go func() {
_ = pprof.Index
err := http.ListenAndServe(pprofAddr, nil)
if err != nil && err != http.ErrServerClosed {
return
}
}()
ln, err := net.Listen("tcp", addr)
if err != nil {
t.Fatal(err)
return
}
go func() {
err := s.Serve(ln)
if err != nil {
t.Error(err)
return
}
}()
defer func() {
s.GracefulStop()
}()
// wait server start up
time.Sleep(time.Millisecond * 100)
// 2. start client
cc, err := grpc.DialContext(context.Background(), addr, grpc.WithInsecure())
if err != nil {
t.Fatal(err)
return
}
// go test -run "TestKeepaliveMaxAge/case1-Fatal-Unavailable-transport is closing" -count 100
t.Run("case1-Fatal-Unavailable-transport is closing", func(t *testing.T) {
timer := time.NewTimer(keepAliveMaxConnectionAge + time.Second)
for {
_, err := routeguide.NewRouteGuideClient(cc).GetFeature(context.Background(), &routeguide.Point{})
if err != nil {
// unexpected: its Fatal
// must use grpc.FailFast(false), it's ok
t.Fatal(err)
return
}
select {
case <-timer.C:
return
default:
}
}
})
} |
update. if i use grpc.FailFast(false) or grpc.WithDefaultCallOptions(grpc.FailFast(false)), it's will work fine. |
We suspect this may be fixed by #2857. Can you try with this change and let us know if it resolves your problems? |
I tested v1.22.0, and there were still many "rpc error: code = Unavailable desc = the connection is draining". With grpc.WaitForReady(true), there were less errors and the descriptions were "the connection is draining" and "transport is closing". |
It seems this might be a problem with how we handle the race between choosing a transport and the transport getting the GOAWAY from the server. I'll take a look next week. |
Thanks for your fix @dfawley. I tested the fix by sending rpcs for 16 minutes to servers with 60s max connection age and grace. There were 4 occurrences of "rpc error: code = Unavailable desc = the connection is draining", which were much fewer. But this also means that I need to check for this error and retry if a few requests can't be dropped. I didn't test grpc.WaitForReady(true) this time. |
What version of gRPC are you using?
v1.18.0/v1.19.1
What version of Go are you using (
go version
)?1.11
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
I tried to enable MaxConnectionAge on a C++ server and call from a Go client. When the connection age was reached, the rpcs in the Go client failed with the unavailable error and the "transport is closing" message.
What did you expect to see?
No unavailable errors.
What did you see instead?
I test with both age and grace time set to one minute, and the Go client reported lots of errors. Each rpc takes a few seconds to complete, within the one minute grace period.
Reference: grpc/grpc#12295 (comment)
The text was updated successfully, but these errors were encountered: