VTGR: Vitess + MySQL group replication #8387
Signed-off-by: crowu <y.wu4515@gmail.com>
    go func() { c <- shard.tmc.Ping(pingCtx, instance.tablet) }()
    select {
    case <-pingCtx.Done():
        log.Errorf("Ping abort timeout %v", *pingTabletTimeout)
        return false
    case err := <-c:
        if err != nil {
            log.Errorf("Ping error host=%v: %v", instance.instanceKey.Hostname, err)
        }
        return err == nil
    }
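For readers who want to try the timeout pattern from the snippet above in isolation, here is a minimal standalone sketch. Everything in it is an assumption made for illustration: `pingFunc`, `pingWithTimeout`, `demoTimeout`, and the three fake pings are hypothetical names, not Vitess code, and the result channel is buffered here (the diff does not show how `c` is created) so that the goroutine can still send and exit after a timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// demoTimeout is a hypothetical stand-in for *pingTabletTimeout.
const demoTimeout = 50 * time.Millisecond

// pingFunc is a hypothetical stand-in for a call such as shard.tmc.Ping.
type pingFunc func(ctx context.Context) error

// pingWithTimeout runs ping in a goroutine and reports whether it
// succeeded before the timeout elapsed.
func pingWithTimeout(timeout time.Duration, ping pingFunc) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// Buffered (assumption): the goroutine can always send and exit,
	// even if this function has already returned on timeout.
	c := make(chan error, 1)
	go func() { c <- ping(ctx) }()

	select {
	case <-ctx.Done():
		fmt.Printf("ping aborted after %v\n", timeout)
		return false
	case err := <-c:
		if err != nil {
			fmt.Printf("ping error: %v\n", err)
		}
		return err == nil
	}
}

// okPing succeeds immediately.
func okPing(ctx context.Context) error { return nil }

// failPing fails immediately.
func failPing(ctx context.Context) error { return errors.New("connection refused") }

// hangPing ignores its context, like a call stuck in a step that is
// not context-aware.
func hangPing(ctx context.Context) error {
	time.Sleep(10 * demoTimeout)
	return nil
}

func main() {
	fmt.Println("ok:", pingWithTimeout(demoTimeout, okPing))
	fmt.Println("fail:", pingWithTimeout(demoTimeout, failPing))
	fmt.Println("hang:", pingWithTimeout(demoTimeout, hangPing))
}
```

The `select` is what bounds the wait: even when the ping itself ignores the context, the caller returns once the deadline passes.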
This is bad but it's actually our fault. Once #8368 is merged we should be able to fix the TMC API so that the initial dial is context-aware and we don't need this hack.
I don't think switching to DialContext there will actually fix this issue, though. From the docs:
"By default, it's a non-blocking dial (the function won't wait for connections to be established, and connecting happens in the background). To make it a blocking dial, use WithBlock() dial option."
"In the non-blocking case, the ctx does not act against the connection. It only controls the setup steps."
I don't think we should use WithBlock unilaterally, as that would slow down the rest of the tmc use cases. Instead, I think we should merge both #8368 (pending review / further changes) and this PR as-is, then do a follow-up refactor of the tmc API that lets callers specify the dial options to use, something like:
    tmc := tmclient.TabletManagerClient().WithDialOptions(grpc.WithBlock())
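To make the shape of that proposal concrete, here is a small sketch of how a per-caller `WithDialOptions` could behave. It is purely illustrative: `WithDialOptions` is only proposed in this thread, and `client`, `dialOption`, `withBlock`, and `blocking` are hypothetical stand-ins (for the tablet manager client, `grpc.DialOption`, and `grpc.WithBlock()`), written without importing gRPC so the sketch stays self-contained.

```go
package main

import "fmt"

// dialOption is a hypothetical stand-in for grpc.DialOption.
type dialOption string

// withBlock is a hypothetical stand-in for grpc.WithBlock().
func withBlock() dialOption { return "WithBlock" }

// client is a hypothetical stand-in for the tablet manager client.
type client struct {
	dialOpts []dialOption
}

// WithDialOptions returns a copy of the client with extra dial options.
// The receiver is left unchanged, so other callers keep the default
// non-blocking behaviour.
func (c *client) WithDialOptions(opts ...dialOption) *client {
	merged := make([]dialOption, 0, len(c.dialOpts)+len(opts))
	merged = append(merged, c.dialOpts...)
	merged = append(merged, opts...)
	return &client{dialOpts: merged}
}

// blocking reports whether this client would dial in blocking mode.
func (c *client) blocking() bool {
	for _, o := range c.dialOpts {
		if o == withBlock() {
			return true
		}
	}
	return false
}

func main() {
	shared := &client{}
	pinger := shared.WithDialOptions(withBlock())
	fmt.Println("shared blocking:", shared.blocking())
	fmt.Println("pinger blocking:", pinger.blocking())
}
```

Returning a copy rather than mutating the shared client is one way to keep the blocking dial opt-in, which matches the concern about slowing down other tmc callers.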
I think this sounds about right. The comment that @narcsfz added to this method was a bit confusing, because it complains about dial/retry timeouts when (in theory) we shouldn't be seeing dial timeouts at all: this API is not called with WithBlock, so the actual connection will only happen during the actual call to Ping, and will be context-aware. I think what @narcsfz was seeing are the basic setup steps, which by default are not controlled by context but will be if we switch to DialContext.
Addressed your comments, PTAL @systay
deepthi left a comment
Well-written and well-documented contribution.
These comments are from a first pass. I will be doing another pass over the PR.
A test is still failing after many retries. It looks related to a network issue: https://github.com/vitessio/vitess/pull/8387/checks?check_run_id=3023339855 I will try to rerun it later, but it should not affect the review. cc @deepthi
deepthi left a comment
Mostly LGTM. We can merge once feedback has been addressed and tests are passing.
Description
This adds single-primary MySQL group replication support for Vitess. Specifically, it does the following:
The structure of the PR is:
Related Issue(s)
#8386
Checklist
Deployment Notes