add logging and a max wait time to vtgate shutdown drain#3560
sougou merged 3 commits into vitessio:master
Conversation
Wait no more than 30 seconds for active vtgate mysql connections to drain, and emit a log every 2 seconds while waiting with the count of connections that we're waiting for.
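The behavior described above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code from this PR: `waitForDrain` and its parameters are hypothetical names, and the standard `log` package stands in for whatever logger vtgate uses. It polls a busy-connection counter, logs the count at a fixed interval, and gives up once the maximum wait has elapsed.

```go
package main

import (
	"log"
	"sync/atomic"
	"time"
)

// waitForDrain (hypothetical helper) polls the busy counter until it reaches
// zero or maxWait elapses. While waiting it logs the remaining count every
// logEvery. It reports whether the counter was zero when it returned.
func waitForDrain(busy *int32, maxWait, logEvery, poll time.Duration) bool {
	start := time.Now()
	reported := start
	for time.Since(start) < maxWait {
		n := atomic.LoadInt32(busy)
		if n == 0 {
			return true
		}
		if time.Since(reported) >= logEvery {
			log.Printf("still waiting for client connections to be idle (%d active)", n)
			reported = time.Now()
		}
		time.Sleep(poll)
	}
	// Deadline reached: take one last look at the counter.
	return atomic.LoadInt32(busy) == 0
}

func main() {
	var busy int32 = 2

	// Simulate two connections finishing their work shortly after shutdown starts.
	go func() {
		time.Sleep(30 * time.Millisecond)
		atomic.AddInt32(&busy, -1)
		time.Sleep(30 * time.Millisecond)
		atomic.AddInt32(&busy, -1)
	}()

	// 30s cap and 2s log interval mirror the numbers in the PR description.
	ok := waitForDrain(&busy, 30*time.Second, 2*time.Second, 10*time.Millisecond)
	log.Printf("drained cleanly: %v", ok)
}
```

Passing the durations in as parameters keeps the sketch easy to exercise with short timeouts; the 30 second and 2 second values are the only numbers taken from the description.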
go/vt/vtgate/plugin_mysql_server.go
Outdated
log.Infof("Waiting for all client connections to be idle (%d active)...", atomic.LoadInt32(&busyConnections))
start := time.Now()
reported := start
for time.Since(start) < 30*time.Second {
There is a global timeout for OnTermSync() hooks controlled by -onterm_timeout (defaults to 10 seconds). Can we depend on that instead?
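To make the concern concrete: if every shutdown hook already runs under one global deadline, a longer wait inside a single hook can never run to completion. The sketch below is a generic illustration of a global hook timeout, not the servenv implementation; `runHooksWithTimeout` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runHooksWithTimeout (hypothetical helper) runs every hook concurrently and
// reports whether all of them finished before the timeout expired. Hooks that
// are still running at the deadline are simply abandoned.
func runHooksWithTimeout(hooks []func(), timeout time.Duration) bool {
	var wg sync.WaitGroup
	for _, h := range hooks {
		wg.Add(1)
		go func(h func()) {
			defer wg.Done()
			h()
		}(h)
	}

	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	fast := func() {}
	slow := func() { time.Sleep(200 * time.Millisecond) }

	// All hooks finish well inside the deadline.
	fmt.Println(runHooksWithTimeout([]func(){fast}, time.Second))

	// The slow hook outlives the deadline, so the caller stops waiting for it.
	fmt.Println(runHooksWithTimeout([]func(){fast, slow}, 50*time.Millisecond))
}
```

Under this model, a hook that loops for up to 30 seconds is cut off by a 10 second global timeout, which is why a single source of truth for the deadline is attractive.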
Thanks for the PR and all the comments @demmer! Apologies for not getting back to you sooner -- work has been super busy and I haven't had enough time to keep up with you and sougou. This PR LGTM once we square away my question about the

Re: #3518 -- To be completely honest, I didn't feel ready for it to be merged in the state that it was in. In addition to wanting to write tests, I'm not convinced that the functionality we ended up with will be reliable enough for HubSpot. We have many mysql-heavy apps communicating with Vitess through a small number of vtgates, and while it's likely that each individual connection will go idle within a reasonable length of time, it's not likely that we'll observe all connections to be idle at the same exact time. This will cause us to exhaust the OnTermSync timeout and then all connections will be closed regardless of whether or not they were idle, which is unacceptable.

This is why I liked the original approach better: proactively closing connections that we know are OK to be closed so that we minimize the ones that are closed in the middle of a query / transaction when we run out of time. Does that make sense?

Can you tell me a bit about the environment you're running Vitess in where functionality as it currently exists in
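The difference between the two approaches can be sketched as below. This is only an illustration of the idea of closing connections individually as they go idle; it is not the code from #3518, and `tracker`, `conn`, and the method names are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a client connection.
type conn struct {
	id     int
	busy   bool
	closed bool
}

// tracker (hypothetical) records every open connection so that shutdown can
// act on each one individually instead of waiting for all to be idle at once.
type tracker struct {
	mu       sync.Mutex
	conns    map[int]*conn
	draining bool
}

func newTracker() *tracker { return &tracker{conns: map[int]*conn{}} }

func (t *tracker) add(id int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.conns[id] = &conn{id: id}
}

// closeLocked marks a connection closed and forgets it. Caller holds t.mu.
func (t *tracker) closeLocked(c *conn) {
	c.closed = true
	delete(t.conns, c.id)
}

// setBusy marks the start or end of a query. A connection that goes idle
// while the tracker is draining is closed immediately.
func (t *tracker) setBusy(id int, busy bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	c, ok := t.conns[id]
	if !ok {
		return
	}
	c.busy = busy
	if !busy && t.draining {
		t.closeLocked(c)
	}
}

// startDrain closes every connection that is idle right now and returns the
// number of busy connections still open.
func (t *tracker) startDrain() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.draining = true
	for _, c := range t.conns {
		if !c.busy {
			t.closeLocked(c)
		}
	}
	return len(t.conns)
}

// forceClose closes whatever is left at the deadline and returns how many
// connections were interrupted.
func (t *tracker) forceClose() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	n := len(t.conns)
	for _, c := range t.conns {
		t.closeLocked(c)
	}
	return n
}

func main() {
	tr := newTracker()
	for id := 1; id <= 3; id++ {
		tr.add(id)
	}
	tr.setBusy(1, true)
	tr.setBusy(2, true)

	fmt.Println("busy after drain start:", tr.startDrain())
	tr.setBusy(1, false) // connection 1 finishes its query and is closed
	fmt.Println("interrupted at deadline:", tr.forceClose())
}
```

With this shape, only connections that are mid-query at the deadline get interrupted; the cost is the per-connection bookkeeping in the map.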
In Slack's environment we tend to have very short-lived connections from the app to the vtgate, so once the listener is closed and no new connections are accepted, the existing ones should generally complete in a short period of time, which is why waiting for an idle period even when connections are open seemed fine to me.

Furthermore, before a clean vtgate shutdown/restart we first remove it from our service discovery and wait for the app servers to stop sending connections to it, so that all in all the vtgate should be totally idle before we try to shut it down.

However, to address your concerns, one approach could be to set a

That seems to me to be a lower overhead approach of accomplishing the same goal without the complexity / risk of tracking every active connection in a map.