add logging and a max wait time to vtgate shutdown drain by demmer · Pull Request #3560 · vitessio/vitess

demmer · 2018-01-17T14:51:36Z

Wait no more than 30 seconds for active vtgate mysql connections to
drain, and emit a log every 2 seconds while waiting with the count
of connections that we're waiting for.
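
For readers who want to see the shape of the behavior described above, here is a minimal sketch of a bounded drain loop. It is not the code from this PR: `waitForDrain`, its parameters, and the poll interval are hypothetical, and only the `busyConnections` counter name comes from the diff under review.

```go
package main

import (
	"log"
	"sync/atomic"
	"time"
)

// busyConnections counts connections that are currently running a query.
// The name mirrors the diff in this PR; everything else is illustrative.
var busyConnections int32

// waitForDrain (hypothetical helper) polls until no connection is busy or
// maxWait elapses, logging the remaining count every logEvery. It reports
// whether the drain completed before the deadline.
func waitForDrain(maxWait, logEvery, pollEvery time.Duration) bool {
	start := time.Now()
	reported := start
	for time.Since(start) < maxWait {
		n := atomic.LoadInt32(&busyConnections)
		if n == 0 {
			return true
		}
		if time.Since(reported) >= logEvery {
			log.Printf("Still waiting for client connections to be idle (%d active)...", n)
			reported = time.Now()
		}
		time.Sleep(pollEvery)
	}
	// One last check in case the final connection finished during the sleep.
	return atomic.LoadInt32(&busyConnections) == 0
}

func main() {
	atomic.StoreInt32(&busyConnections, 1)
	// Simulate the last in-flight query finishing after 50ms.
	go func() {
		time.Sleep(50 * time.Millisecond)
		atomic.AddInt32(&busyConnections, -1)
	}()
	drained := waitForDrain(30*time.Second, 2*time.Second, 10*time.Millisecond)
	log.Printf("drained: %v", drained)
}
```

The 30 second and 2 second values in `main` match the numbers in the description; the 10 millisecond poll interval is an arbitrary choice for the sketch.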

demmer · 2018-01-17T14:52:58Z

@tpetr / @sougou this is some of what I mentioned on #3518.

I'd still feel better if we had a test for this behavior, but at least now there's some insight into what's happening during the shutdown.

tpetr · 2018-01-18T00:39:42Z

go/vt/vtgate/plugin_mysql_server.go

+		log.Infof("Waiting for all client connections to be idle (%d active)...", atomic.LoadInt32(&busyConnections))
+		start := time.Now()
+		reported := start
+		for time.Since(start) < 30*time.Second {


There is a global timeout for OnTermSync() hooks controlled by -onterm_timeout (defaults to 10 seconds). Can we depend on that instead?
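
To illustrate the point about relying on an outer deadline, the sketch below runs a set of shutdown hooks under one shared timeout. It is an assumption-laden illustration, not the servenv implementation: `runHooksWithTimeout` and the hook bodies are hypothetical. With an outer limit of 10 seconds, an inner 30 second limit inside a hook would never be reached, which is the redundancy the question raises.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runHooksWithTimeout (hypothetical) runs every hook concurrently and waits
// for all of them, but gives up after timeout. It reports whether all hooks
// finished in time. Hooks that are still running are not interrupted.
func runHooksWithTimeout(hooks []func(), timeout time.Duration) bool {
	var wg sync.WaitGroup
	for _, h := range hooks {
		wg.Add(1)
		go func(h func()) {
			defer wg.Done()
			h()
		}(h)
	}

	// Close done once every hook has returned.
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-done:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	quick := func() { time.Sleep(10 * time.Millisecond) }
	slow := func() { time.Sleep(200 * time.Millisecond) }

	fmt.Println("quick hook finished in time:", runHooksWithTimeout([]func(){quick}, time.Second))
	fmt.Println("slow hook finished in time:", runHooksWithTimeout([]func(){slow}, 50*time.Millisecond))
}
```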

tpetr · 2018-01-18T01:14:42Z

Thanks for the PR and all the comments @demmer! Apologies for not getting back to you sooner -- work has been super busy and I haven't had enough time to keep up with you and sougou. This PR LGTM once we square away my question about the onTermSyncHooks timeout.

Re: #3518 -- To be completely honest, I didn't feel it was ready to be merged in the state that it was in. In addition to wanting to write tests, I'm not convinced that the functionality we ended up with will be reliable enough for HubSpot. We have many mysql-heavy apps communicating with Vitess through a small number of vtgates, and while each individual connection will likely go idle within a reasonable length of time, it's unlikely that we'll observe all connections idle at the same moment. That would cause us to exhaust the OnTermSync timeout, after which all connections are closed regardless of whether they were idle, which is unacceptable.

This is why I liked the original approach better: proactively closing connections that we know are OK to close, so that we minimize the ones closed in the middle of a query / transaction when we run out of time. Does that make sense? Can you tell me a bit about the environment you're running Vitess in, where the functionality as it currently exists in master is preferred? I'm happy to open another PR if we're cool with revisiting my original approach.
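
The "proactively close idle connections" idea can be sketched roughly as follows. This is only an illustration of the approach as described in the comment above, not the code from #3518: the `tracker` type and its methods are hypothetical, and closing a connection is modeled as removing it from a map rather than calling `Close` on a real `net.Conn`.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical registry of client connections.
type tracker struct {
	mu           sync.Mutex
	busy         map[int]bool // connection id -> currently running a query?
	shuttingDown bool
}

func newTracker() *tracker {
	return &tracker{busy: make(map[int]bool)}
}

// add registers a new idle connection.
func (t *tracker) add(id int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.busy[id] = false
}

// setBusy records a state change. Once shutdown has begun, a connection
// that goes idle is closed immediately instead of waiting for the others.
func (t *tracker) setBusy(id int, busy bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.busy[id]; !ok {
		return // already closed
	}
	if !busy && t.shuttingDown {
		delete(t.busy, id)
		return
	}
	t.busy[id] = busy
}

// beginShutdown closes every idle connection and returns how many busy
// connections are still open.
func (t *tracker) beginShutdown() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.shuttingDown = true
	for id, b := range t.busy {
		if !b {
			delete(t.busy, id)
		}
	}
	return len(t.busy)
}

// open reports how many connections remain open.
func (t *tracker) open() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.busy)
}

func main() {
	t := newTracker()
	t.add(1)
	t.add(2)
	t.setBusy(2, true) // connection 2 is in the middle of a query

	fmt.Println("open after shutdown starts:", t.beginShutdown())
	t.setBusy(2, false) // its query finishes, so it is closed right away
	fmt.Println("open after last query ends:", t.open())
}
```

With this shape, only connections that are still busy when the OnTermSync timeout expires would be cut off mid-query; no moment of global idleness is required.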

demmer · 2018-01-23T23:14:31Z

In Slack's environment we tend to have very short-lived connections from the app to the vtgate, so once the listener is closed and no new connections are accepted, the existing ones should generally complete in a short period of time, which is why waiting for an idle period even when connections are open seemed fine to me. Furthermore, before a clean vtgate shutdown/restart we first remove it from our service discovery and wait for the app servers to stop sending connections to it, so that all in all the vtgate should be totally idle before we try to shut it down.

However, to address your concerns, one approach could be to set a shuttingDown barrier that the query handler checks in order to reject any new transactions, and/or to reject any queries that aren't part of an open transaction.

That seems to me to be a lower-overhead way of accomplishing the same goal, without the complexity / risk of tracking every active connection in a map.
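
A rough sketch of what such a barrier could look like is below. The names `shuttingDown`, `beginShutdown`, `admit`, and `errShuttingDown` are hypothetical and do not come from the vtgate code; the sketch only assumes that the handler knows whether the session already has an open transaction.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errShuttingDown is returned for work the server refuses during shutdown.
var errShuttingDown = errors.New("server is shutting down")

// shuttingDown is the barrier flag: 0 while serving, 1 once shutdown begins.
var shuttingDown int32

// beginShutdown raises the barrier.
func beginShutdown() {
	atomic.StoreInt32(&shuttingDown, 1)
}

// admit decides whether a query may run. Before shutdown everything is
// admitted. After shutdown begins, only queries that belong to an already
// open transaction are admitted, so in-flight transactions can finish while
// new transactions and standalone queries are rejected.
func admit(inTransaction bool) error {
	if atomic.LoadInt32(&shuttingDown) == 0 {
		return nil
	}
	if inTransaction {
		return nil
	}
	return errShuttingDown
}

func main() {
	fmt.Println("before shutdown, standalone query:", admit(false))
	beginShutdown()
	fmt.Println("during shutdown, query in open transaction:", admit(true))
	fmt.Println("during shutdown, standalone query:", admit(false))
}
```

The only shared state is a single atomic flag, which is the lower-overhead property the comment argues for compared with a per-connection map.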

sougou · 2018-01-25T16:08:08Z

LGTM

demmer added 2 commits January 17, 2018 06:24

refactor mysql protocol shutdown to enable testing

b486677

add more logging and a max wait time to vtgate shutdown drain

b9d32c1

googlebot added the cla: yes label Jan 17, 2018

tpetr reviewed Jan 18, 2018

remove the explicit shutdown timer since the OnTermSync has its own

dd2be93

sougou merged commit 4a3f148 into vitessio:master Jan 25, 2018

sougou merged 3 commits into vitessio:master from tinyspeck:vtgate-mysql-server-drain-logs
