Add a sema to bound the number of inflight vexec queries when getting workflows#8353
ajm188 wants to merge 1 commit into vitessio:main
Conversation
… workflows Signed-off-by: Andrew Mason <amason@slack-corp.com>
Ideally we'd have a little more information about concurrent query utilization during normal workflow operations before setting a limit. Basically: if this is a problem, we should tune the limit to control it; if it's not, why introduce it?
That said I believe this is safe. If there is a larger thing we're chasing let's try to get it into an issue.
No major objections to merging this as is, though -- it seems like a good pattern for endpoints that can stand throttling. (🗒️ Implicitly: GetWorkflows is not in the hot path anywhere afaik; does that vibe with your understanding?)
```go
vx := vexec.NewVExec(req.Keyspace, "", s.ts, s.tmc)

if !s.vexecPool.AcquireContext(ctx) {
	return nil, ErrVExecConnTimeout
}
```
tioli:

```go
func NewVExecThrottled(..., pool *sync2.Semaphore) *VExec
```

and then moving other vexec endpoints over to it is a tiiiiiiny bit less effort; also you don't have to remember to manage locking around each query.
```go
vExecPoolSize    = flag.Uint("workflow_server_vexec_pool_size", 1000, "maximum number of concurrent vexec queries to allow")
vExecPoolTimeout = flag.Duration("workflow_server_vexec_pool_default_timeout", time.Millisecond*50, "default timeout to wait acquiring a connection from the vexec pool. zero implies no timeout")
```
We should update the release notes with this, and if we expect to roll out limits per call (imo) we should have a better approach than one flag per call for concurrency.
Edit: It was transient
#8368 is looking like a more promising (and holistic) fix. Going to close this.
Signed-off-by: Andrew Mason amason@slack-corp.com
Description
What it says in the title. The current defaults are somewhat arbitrary: 1000 concurrent queries, with a 50ms timeout to acquire a connection.
Related Issue(s)
Checklist
Deployment Notes