Dedicated poolDialer logic for VTOrc, throttler#15562
Dedicated poolDialer logic for VTOrc, throttler#15562shlomi-noach merged 9 commits intovitessio:mainfrom
Conversation
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
|
Added some unit testing. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15562 +/- ##
==========================================
+ Coverage 65.73% 65.77% +0.03%
==========================================
Files 1560 1561 +1
Lines 194595 194755 +160
==========================================
+ Hits 127921 128102 +181
+ Misses 66674 66653 -21 ☔ View full report in Codecov by Sentry. |
dbussink
left a comment
There was a problem hiding this comment.
Let's go for this one yeah, and see separately how we can fix up those other cases. I suspect we're probably good enough for a single multiplexed connection in the end anyway?
I believe so. Certainly for the throttler and for vtorc. For the rest, more experimentation needed. |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
GuptaManan100
left a comment
There was a problem hiding this comment.
I like this implementation!
| invalidator := func(err error) { | ||
| client.mu.Lock() | ||
| defer client.mu.Unlock() | ||
| m[addr].cc.Close() | ||
| delete(m, addr) | ||
| } |
There was a problem hiding this comment.
Just one question. We take in the error here, but don't check for the error to be a connection error, or even if it's nil. Won't this cause us to invalidate the connections after every call?
There was a problem hiding this comment.
we call invalidator only when there is an error, passing of error should be removed
There was a problem hiding this comment.
Good catch on not checking for err == nil! Updated.
Re: connection error, we've already established last week that it's very difficult to distinguish between gRPC errors, TCP errors, vitess errors, MySQL errors etc., because the call stack is such that errors are wrapped up as SQLErr including gRPC ones. We therefore invalidate on any error.
Mind you that in this PR we only create an invalidation function for FullStatus and for CheckThrottler, where SQL errors are highly unexpected, as opposed to ExecuteFetch* where such errors are trivial.
There was a problem hiding this comment.
I further updated to reflect @harshit-gangal 's comment. Calling invalidator() only on error, and now not even passing the error argument.
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
| if dialPoolGroup != DialPoolGroupDefault { | ||
| client.mu.Lock() | ||
| defer client.mu.Unlock() | ||
| if client.rpcDialPoolMap == nil { | ||
| client.rpcDialPoolMap = make(map[DialPoolGroup]addrTmcMap) | ||
| } |
There was a problem hiding this comment.
nit: default and non-default looks like a different code path,
can this be changed to a switch condition and separate method calls for readability?
There was a problem hiding this comment.
nit: default and non-default looks like a different code path,
I've now split into dialPool (near identical the original function) and dialDedicatedPool (new behavior). Can you please take another look?
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
| var c tabletmanagerservicepb.TabletManagerClient | ||
| var invalidator invalidatorFunc | ||
| var err error | ||
| if poolDialer, ok := client.dialer.(poolDialer); ok { |
There was a problem hiding this comment.
I wonder in what scenario is the dialer not a poolDialer?
| if invalidator != nil { | ||
| invalidator() | ||
| } | ||
| return nil, err |
There was a problem hiding this comment.
nit: this will not be nil
There was a problem hiding this comment.
It could be nil, if this condition fails:
if poolDialer, ok := client.dialer.(poolDialer); okNow, I'm not sure how it could possibly fail, but as golang goes, I created var invalidator invalidatorFunc and it could be nil.
There was a problem hiding this comment.
If you like, I can pre-assign var invalidator to an empty function? That way it will never be nil.
There was a problem hiding this comment.
This kinda returns to this comment: #14979 (comment)
|
|
||
| response, err := c.CheckThrottler(ctx, req) | ||
| if err != nil { | ||
| if invalidator != nil { |
|
Merging for now. I think we still need a refactor to the pooDialer code, and that's where we can address casting, nil, naming, etc. issues. |
|
This is a connection leak bugfix, and as such, should be backported to v19. |
… (#15567) Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com> Co-authored-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Description
Did I say #15560 was the final alternative? This is final 2.0. So #15560 ran into a couple problems, seemingly locking. I'm investing far too much time in this problem, so this PR, as per @harshit-gangal 's suggestion, keeps the existing pools completely untouched, and adds a new mechanism for the non-standard pool dialers (ie
vtorcandthrottler). These two non-standard dialers only have a singletmchence a single cached connection. For any error whatsoever received inFullStatusor inCheckThrottler, the connection gets invalidated (it's too aggressive, but better than existing situation).Related Issue(s)
poolDialer#15563vtorcand to throttler functions, no change in "concurrency" #15560, WIP: dedicated pool dialers tovtorcand to throttler functions, removing "concurrency" #15559, WIP:TabletManagerClientreal pool dialer #15541, Fix for FullStatus gRPC connection pooling #15520Checklist
Deployment Notes