go/vt/wrangler: reduce VReplicationExec calls when getting copy state #14375
rohit-nayak-ps merged 20 commits into vitessio:main
Conversation
Signed-off-by: Max Englander <max@planetscale.com>
mattlord
left a comment
Thank you, @maxenglander ! This is a nice little optimization.
I had some minor suggestions. Beyond that, we'll need to make the equivalent optimization for vtctldclient as well. vtctlclient — which is going away soon — uses wrangler whereas vtctldclient uses the workflow server. So we'd make the same optimization here:
vitess/go/vt/vtctl/workflow/server.go
Lines 1053 to 1083 in 2f56827
@mattlord I can add more tests if you think it's needed, but I think the code changes are in a decent place. I decided not to try the …
My concern is the impact on the typical case. Did you get a sense of how much slower this made the average case?
Hey @mattlord, the use case we had was a production MySQL cluster with 110 shards that we are migrating from an external keyspace with external tablets into a Vitess keyspace with 128 shards. Because the source keyspace has different primary vindexes than the target keyspace, this exploded into 14080 VStreams, and therefore 14080 `VReplicationExec` calls. We were seeing that the overall time to complete this block of code during `SwitchTraffic` was far too long. The fact that it was taking so long resulted in various timeouts:
...as well as this error:
Can you break down for me how that is the case? My understanding is that the current implementation fetches copy state once per stream. I think this PR will change things so that it makes the same number of calls or fewer.
I understand. My point was only that to my knowledge this is an exceedingly rare use case in the history of Vitess. I'm not saying that it's an invalid one. My point was that we should not make things worse/slower for the typical case in order to improve this one. That was my concern. It's a matter of HOW we address that use case/issue, not IF.
We may have been overloading various resources like the topo server, which has a cascading effect. As we process each result we also make a topo call, and those results are processed serially. So if the topo responses are a little slow, that will cause the total time to climb a lot in this particular case. We could process those results concurrently as well, synchronizing on the actual updates to the map (but most importantly making those topo calls in parallel).
You noted the trade-off yourself in the PR description.
The main query goes from being a point select to a range query. And this can have an impact when subsequently getting the log records for the stream(s) etc. as well. I made this more efficient here: #14212. It's still a general concern of mine going forward, though, so I'm a little (overly) paranoid about it, as a lot of vreplication in v18+ is potentially impacted. All that being said, in general I think we are offsetting any additional cost here by batching things like getting the copy states (although in most cases there will only be one stream on a tablet, but still), so this may be a wash in the end or even improve the typical case. So let me just review in full again. 😄
mattlord
left a comment
I'm sorry again for the delay. I think this looks OK, but we don't seem to have any test coverage, do we? It looks like we updated the existing tests to adjust for the query changes, but we don't have any tests that cover the case we're doing the work for, do we? Meaning, cases where there are multiple streams per tablet.
Can we add some? Or let me know if I just missed it. I'm talking about unit tests here, as we do have some coverage in the end-to-end tests, since there are cases where there are N streams per tablet (e.g. shard merges).
I realized that I completely misread this initially. I thought you were asking how much slower this made my particular use case. I did not get a sense of how much slower this was for the average case, although I tested it out locally a bunch.
Definitely. I was hoping to get some initial feedback before investing in tests, which you've now given, and which I appreciate! I'll get to work on tests 👷
I took a somewhat lazy approach and just updated the existing unit tests to use two tables, and tested from there.
mattlord
left a comment
This looks good to me. Thanks, @maxenglander ! I only had some minor nits and suggestions. We can discuss/address those along with any that @rohit-nayak-ps may have. @rohit-nayak-ps can you please also review this whenever you have time?
mattlord
left a comment
Thanks again, @maxenglander !
Description
During `MoveTables` `SwitchTraffic`, there is a phase where wrangler queries the `copy_state` of each stream. Currently it does this by making an individual `VReplicationExec` call for each stream. This can be prohibitively time-consuming for workflows with a very large number of streams. For example, a `MoveTables` workflow where the source and target keyspaces each have 128 shards, and where the target keyspace has a different primary vindex than the source keyspace, will end up with 16384 VStreams. Even if each individual `VReplicationExec` takes only a few milliseconds, in aggregate this could easily take 30+ seconds, risking timing out the `SwitchTraffic` action.
This PR makes things a bit more efficient by making one `VReplicationExec` call per target shard and getting all relevant copy states with each call. In my testing with a large number of VStreams, this makes the overall `SwitchTraffic` action much faster for the use case described above. The trade-off is that these `VReplicationExec` queries are more expensive, with larger result sets.
Related Issue(s)
Fixes #14325