transport: Fix deadlock in client keepalive. #1460
Conversation
Hey @tsuna, thanks for looking into the deadlock and coming up with a solution.

When we initialize the transport, we make this channel non-writable by writing some data on it, so that later, when the keepalive routine realizes it must go dormant, it makes the channel writable by reading from it. Notice that at this point a lock must be acquired, since the condition to go dormant depends on the number of active streams. I suggest we make awakenKeepalive non-writable by writing data on it again in operateHeader.
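For context, the flag-channel setup being described could look roughly like this. This is a minimal, self-contained sketch with assumed names, not the actual transport code:

```go
package main

import "fmt"

func main() {
	// awakenKeepalive acts as a one-slot flag: it is kept full ("non-writable")
	// except while the keepalive routine is dormant and waiting to be woken.
	awakenKeepalive := make(chan struct{}, 1)
	awakenKeepalive <- struct{}{} // filled at transport initialization

	// While the channel is full, a non-blocking send falls through to default,
	// so no wake-up signal can be recorded while keepalive is active.
	select {
	case awakenKeepalive <- struct{}{}:
		fmt.Println("sent wake-up signal (keepalive was dormant)")
	default:
		fmt.Println("channel full: keepalive is active, nothing to do")
	}
}
```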
Just to be sure I understand your suggestion, you're saying that right after this code:

```go
if len(t.activeStreams) == 1 {
	select {
	case t.awakenKeepalive <- struct{}{}:
		t.framer.writePing(false, false, [8]byte{})
```

we need to add:

```go
		t.awakenKeepalive <- struct{}{}
```

If yes, then this appears to work, although it looks weird. Definitely something that's going to deserve a comment in the code. I'm not sure why you want to keep holding the lock there, though. Either way, I'm happy as long as this hole is plugged.
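Put together, the suggested change would read roughly as follows. This is a sketch assembled from the snippets quoted in this thread; the `default` case and the comments are added here for illustration and are not the actual source:

```go
if len(t.activeStreams) == 1 {
	// This is the first active stream; wake the keepalive goroutine if it
	// went dormant.
	select {
	case t.awakenKeepalive <- struct{}{}:
		t.framer.writePing(false, false, [8]byte{})
		// Refill the channel so it becomes non-writable again. This send
		// completes only after the keepalive goroutine has consumed the
		// wake-up signal above, and the channel then stays full until
		// keepalive next goes dormant.
		t.awakenKeepalive <- struct{}{}
	default:
	}
}
```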
@tsuna Yes, you understood that right, and I'm glad it solves the problem. Your concern is valid that this code is quite complicated with the locks and channels. However, holding that lock while making awakenKeepalive writable again is necessary. If we released the lock before reading off the channel, the NewStream code might get executed in between: it would try to write on the channel, but the write would be skipped since the channel isn't writable yet. Later, when the keepalive routine starts executing again and expects to read from awakenKeepalive, it won't be able to, since it missed that one opportunity. Holding the lock ensures that if keepalive sees the number of active streams to be 0, it can safely go dormant by making awakenKeepalive writable.
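To spell that out, the dormancy path on the keepalive side looks roughly like this. Field names such as shutdownChan are assumptions for this simplified sketch, not verbatim source:

```go
t.mu.Lock()
if len(t.activeStreams) < 1 && !t.kp.PermitWithoutStream {
	// Make awakenKeepalive writable while still holding t.mu. Because
	// NewStream also takes t.mu before checking activeStreams and sending on
	// the channel, it either ran before this point (then activeStreams >= 1
	// and we never get here) or it runs after (then the channel is already
	// writable and its wake-up signal cannot be lost).
	<-t.awakenKeepalive
	t.mu.Unlock()
	// Dormant: wait to be woken by a new stream or shut down.
	select {
	case <-t.awakenKeepalive:
		// A stream was created and a ping was sent; resume keepalives.
	case <-t.shutdownChan:
		return
	}
} else {
	t.mu.Unlock()
}
```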
You're right, good catch.
When gRPC keepalives are enabled (which isn't the case by default at this time) and PermitWithoutStream is false (the default), the client can deadlock when transitioning between having no active stream and having one active stream. Subsequent attempts to create a new stream or to close the client will hang on the transport's mutex, while the keepalive goroutine is waiting indefinitely on a channel while holding the transport's mutex. This fixes grpc#1459.
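To illustrate the failure mode in isolation, here is a minimal, self-contained sketch (hypothetical names, not the transport code) of a goroutine blocking on a channel receive while it holds a mutex, which hangs every other goroutine that needs that mutex:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	awaken := make(chan struct{}) // nobody ever sends on this

	// "keepalive" goroutine: takes the lock, then waits on the channel
	// while still holding it.
	go func() {
		mu.Lock()
		<-awaken // blocks forever, and mu is never released
		mu.Unlock()
	}()

	time.Sleep(100 * time.Millisecond)

	// "NewStream" / "Close" path: needs the same lock, so it hangs too.
	done := make(chan struct{})
	go func() {
		mu.Lock()
		mu.Unlock()
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("acquired the lock (no deadlock)")
	case <-time.After(time.Second):
		fmt.Println("deadlocked: the lock holder is stuck on a channel receive")
	}
}
```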
I amended my commit, PTAL. Already have a CLA on file with Google.
@tsuna Thanks for taking care of this. Looks good.