Add a sema to bound the number of inflight vexec queries when getting workflows by ajm188 · Pull Request #8353 · vitessio/vitess

ajm188 · 2021-06-18T18:20:05Z

Signed-off-by: Andrew Mason <amason@slack-corp.com>

Description

What it says in the title. The current defaults are somewhat arbitrary: 1000 concurrent queries, with a 50ms timeout to get a conn.
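For readers who want the shape of the idea, here is a minimal standalone sketch of a bounded semaphore with an acquire timeout. It is not the code from this PR: `semaphore`, `runQuery`, `demoContention`, and `errAcquireTimeout` are hypothetical names, and a buffered channel stands in for the pool that the diff uses. A timeout of zero is treated as "no timeout", matching the flag description.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errAcquireTimeout is a hypothetical error for this sketch.
var errAcquireTimeout = errors.New("timed out waiting for a vexec slot")

// semaphore is a counting semaphore backed by a buffered channel.
type semaphore struct {
	slots chan struct{}
}

func newSemaphore(size int) *semaphore {
	return &semaphore{slots: make(chan struct{}, size)}
}

// acquire blocks until a slot is free or ctx is done.
func (s *semaphore) acquire(ctx context.Context) bool {
	select {
	case s.slots <- struct{}{}:
		return true
	case <-ctx.Done():
		return false
	}
}

func (s *semaphore) release() { <-s.slots }

// runQuery runs fn while holding a slot. It waits at most timeout for a
// slot; a zero timeout means wait indefinitely.
func runQuery(s *semaphore, timeout time.Duration, fn func() error) error {
	ctx := context.Background()
	if timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}
	if !s.acquire(ctx) {
		return errAcquireTimeout
	}
	defer s.release() // always give the slot back, even if fn fails
	return fn()
}

// demoContention holds the only slot in one goroutine and returns the
// error seen by a second caller that waits 50ms for a slot.
func demoContention() error {
	sem := newSemaphore(1) // size 1 makes contention easy to see
	started := make(chan struct{})
	hold := make(chan struct{})
	defer close(hold)
	go runQuery(sem, 0, func() error {
		close(started)
		<-hold
		return nil
	})
	<-started
	return runQuery(sem, 50*time.Millisecond, func() error { return nil })
}

func main() {
	fmt.Println(demoContention())
}
```

The deferred release is the important part: a slot that is acquired but never released permanently shrinks the pool.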

Related Issue(s)

Checklist

Tests were added or are not required -- n/a
Documentation was added or is not required -- n/a

Deployment Notes


setassociative

Ideally we'd have a little more information about concurrent query utilization during normal workflow operations before setting a limit. Basically: if this is a problem, we should tune the limit to control the problem; if it's not, then why introduce it?

That said, I believe this is safe. If there is a larger thing we're chasing, let's try to get it into an issue.

No major objections to merging this as is, though -- it seems like a good pattern for endpoints that can stand throttling. (🗒️ Implicitly: GetWorkflows is not in the hot path anywhere afaik; does that vibe with your understanding?)

setassociative · 2021-06-18T19:54:58Z

go/vt/vtctl/workflow/server.go

 	vx := vexec.NewVExec(req.Keyspace, "", s.ts, s.tmc)
+
+	if !s.vexecPool.AcquireContext(ctx) {
+		return nil, ErrVExecConnTimeout
+	}


tioli:

func NewVExecThrottled(..., pool *sync2.Semaphore) *VExec

and then moving other vexec endpoints over to it is a tiiiiiiny bit less effort; also you don't have to remember to manage locking around each query
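A rough sketch of what that wrapper idea could look like, so that callers never touch the pool directly. Everything here is hypothetical and stands in for the real types: `queryRunner` is a made-up interface, `throttled` is the wrapper, and a buffered channel plays the role of the semaphore passed to the constructor.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errPoolTimeout is a hypothetical error for this sketch.
var errPoolTimeout = errors.New("timed out acquiring a vexec slot")

// queryRunner is a hypothetical stand-in for whatever executes a query.
type queryRunner interface {
	Query(ctx context.Context, query string) (string, error)
}

// throttled wraps a queryRunner so that every query acquires a slot from
// a shared pool first and releases it when the query returns.
type throttled struct {
	inner queryRunner
	slots chan struct{}
}

func newThrottled(inner queryRunner, slots chan struct{}) *throttled {
	return &throttled{inner: inner, slots: slots}
}

func (t *throttled) Query(ctx context.Context, query string) (string, error) {
	select {
	case t.slots <- struct{}{}:
	case <-ctx.Done():
		return "", errPoolTimeout
	}
	defer func() { <-t.slots }()
	return t.inner.Query(ctx, query)
}

// echoRunner is a trivial queryRunner used only for the demo.
type echoRunner struct{}

func (echoRunner) Query(_ context.Context, q string) (string, error) {
	return "ran: " + q, nil
}

// demoThrottled runs one query through the wrapper, then fills the pool
// and shows that a caller whose context is already done is rejected.
func demoThrottled() (first string, firstErr, secondErr error) {
	slots := make(chan struct{}, 1)
	t := newThrottled(echoRunner{}, slots)
	first, firstErr = t.Query(context.Background(), "select 1")

	slots <- struct{}{} // occupy the only slot
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, secondErr = t.Query(ctx, "select 2")
	return first, firstErr, secondErr
}

func main() {
	out, err, blocked := demoThrottled()
	fmt.Println(out, err, blocked)
}
```

Because the acquire and release live inside `Query`, moving another endpoint over means swapping the constructor rather than adding locking at each call site.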

setassociative · 2021-06-18T20:02:20Z

go/vt/vtctl/workflow/server.go

+	vExecPoolSize    = flag.Uint("workflow_server_vexec_pool_size", 1000, "maximum number of concurrent vexec queries to allow")
+	vExecPoolTimeout = flag.Duration("workflow_server_vexec_pool_default_timeout", time.Millisecond*50, "default timeout to wait acquiring a connection from the vexec pool. zero implies no timeout")


We should update the release notes with this, and if we expect to roll out limits per call then (imo) we should have a better approach than a flag for concurrency-per-call.
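One possible shape for per-call limits, purely as an illustration and not something proposed in this PR: a single limiter that maps an endpoint name to its own slot pool, with a default for endpoints that have no explicit entry. The names `limiter`, `newLimiter`, `tryAcquire`, and `release` are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter keeps one slot pool per endpoint name. Endpoints without an
// explicit size fall back to defaultSize.
type limiter struct {
	mu          sync.Mutex
	defaultSize int
	sizes       map[string]int
	pools       map[string]chan struct{}
}

func newLimiter(defaultSize int, sizes map[string]int) *limiter {
	return &limiter{
		defaultSize: defaultSize,
		sizes:       sizes,
		pools:       make(map[string]chan struct{}),
	}
}

// pool returns the slot pool for endpoint, creating it on first use.
func (l *limiter) pool(endpoint string) chan struct{} {
	l.mu.Lock()
	defer l.mu.Unlock()
	if p, ok := l.pools[endpoint]; ok {
		return p
	}
	size, ok := l.sizes[endpoint]
	if !ok {
		size = l.defaultSize
	}
	p := make(chan struct{}, size)
	l.pools[endpoint] = p
	return p
}

// tryAcquire takes a slot for endpoint without blocking.
func (l *limiter) tryAcquire(endpoint string) bool {
	select {
	case l.pool(endpoint) <- struct{}{}:
		return true
	default:
		return false
	}
}

// release returns a slot previously taken for endpoint.
func (l *limiter) release(endpoint string) { <-l.pool(endpoint) }

func main() {
	l := newLimiter(2, map[string]int{"GetWorkflows": 1})
	fmt.Println(l.tryAcquire("GetWorkflows")) // first caller gets the slot
	fmt.Println(l.tryAcquire("GetWorkflows")) // second caller is over the limit
	l.release("GetWorkflows")
}
```

The limits here come from a plain map, so they could be loaded from a config file or set per server instance rather than through one flag per endpoint.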

setassociative · 2021-06-18T20:56:33Z

Edit: It was transient

From the test runs, the e2e failure is worth looking into:

--- FAIL: TestCellAliasVreplicationWorkflow (360.09s)
--- FAIL: TestCellAliasVreplicationWorkflow/shardCustomer (300.99s)

The 300s reads like a timeout, but that's adjacent to "we exhausted our pool and deadlocked somehow." I don't see how that'd happen given the code, but 🤷. I kicked off a rerun in case it was transient.

ajm188 · 2021-06-23T01:26:30Z

#8368 is looking like a more promising (and holistic) fix. Going to close this.

Add a sema to bound the number of inflight vexec queries when getting…

a873964

… workflows Signed-off-by: Andrew Mason <amason@slack-corp.com>

ajm188 added the Type: Enhancement (logical improvement, somewhere between a bug and feature) and Component: Cluster management labels Jun 18, 2021

ajm188 requested review from doeg, rafael and setassociative June 18, 2021 18:20

ajm188 requested a review from deepthi as a code owner June 18, 2021 18:20

doeg approved these changes Jun 18, 2021


setassociative approved these changes Jun 18, 2021


ajm188 closed this Jun 23, 2021

ajm188 wants to merge 1 commit into vitessio:main from tinyspeck:am_getworkflows_sema
