go/vt/wrangler: reduce VReplicationExec calls when getting copy state #14375
rohit-nayak-ps merged 20 commits into vitessio:main
Conversation
Signed-off-by: Max Englander <max@planetscale.com>
mattlord
left a comment
Thank you, @maxenglander ! This is a nice little optimization.
I had some minor suggestions. Beyond that, we'll need to make the equivalent optimization for vtctldclient as well. vtctlclient — which is going away soon — uses wrangler whereas vtctldclient uses the workflow server. So we'd make the same optimization here:
vitess/go/vt/vtctl/workflow/server.go
Lines 1053 to 1083 in 2f56827
@mattlord I can add more tests if you think it's needed, but I think the code changes are in a decent place. I decided not to try the …
My concern is the impact on the typical case. Did you get a sense of how much slower this made the average case?
Hey @mattlord, the use case we had was a production MySQL cluster with 110 shards that we are migrating from an external keyspace with external tablets into a Vitess keyspace with 128 shards. Because the source keyspace has different primary vindexes than the target keyspace, this exploded into 14080 VStreams, and therefore 14080 `VReplicationExec` calls. We were seeing that the overall time to complete this block of code during `SwitchTraffic` was far too long. The fact that it was taking so long resulted in various timeouts:
...as well as this error:
Can you break down for me how that is the case? My understanding is that the current implementation fetches copy state once per stream. I think this PR will change things so that it makes the same number of calls or fewer.
I understand. My point was only that to my knowledge this is an exceedingly rare use case in the history of Vitess. I'm not saying that it's an invalid one. My point was that we should not make things worse/slower for the typical case in order to improve this one. That was my concern. It's a matter of HOW we address that use case/issue, not IF.
We may have been overloading various resources like the topo server, which has a cascading effect. As we process each result we also make a topo call, and those results are processed serially. So if the topo responses are a little slow, that will cause the total time to climb a lot in this particular case. We could process those results concurrently as well, synchronizing on the actual updates to the map (but most importantly making those topo calls in parallel).
You noted the trade-off yourself in the PR description.
The main query goes from being a point select to a range query. And this can have an impact when subsequently getting the log records for the stream(s) etc. as well. I made this more efficient here: #14212. It's still a general concern of mine going forward, though, so I'm a little (overly) paranoid about it, as a lot of vreplication in v18+ is potentially impacted. All that being said, in general I think we are offsetting any additional cost here by batching things like getting the copy states (although in most cases there will only be one stream on a tablet, but still), so this may be a wash in the end or even improve the typical case. So let me just review in full again. 😄
mattlord
left a comment
I'm sorry again for the delay. I think this looks OK, but we don't seem to have any test coverage, do we? It looks like we updated the existing tests to adjust for the query changes, but we don't have any tests that cover the case we're doing the work for, do we? Meaning, cases where there are multiple streams per tablet.
Can we add some? Or let me know if I just missed it. I'm talking about unit tests here, as we do have some coverage in the end-to-end tests, since there are cases where there are N streams per tablet (e.g. shard merges).
I realized that I completely misread this initially. I thought you were asking how much slower this made my particular use case. I did not get a sense of how much slower this was for the average case, although I tested it out locally a bunch.
Definitely. I was hoping to get some initial feedback before investing in tests, which you've now given, and which I appreciate! I'll get to work on tests 👷
I took a somewhat lazy approach and just updated the existing unit tests to use two tables, and tested from there.
mattlord
left a comment
This looks good to me. Thanks, @maxenglander ! I only had some minor nits and suggestions. We can discuss/address those along with any that @rohit-nayak-ps may have. @rohit-nayak-ps can you please also review this whenever you have time?
mattlord
left a comment
Thanks again, @maxenglander !
Description
During `MoveTables` `SwitchTraffic`, there is a phase where wrangler queries the `copy_state` of each stream. Currently it does this by making an individual `VReplicationExec` call for each stream. This can be prohibitively time-consuming for workflows with a very large number of streams. For example, a `MoveTables` workflow where the source and target keyspaces each have 128 shards, and where the target keyspace has a different primary vindex than the source keyspace, will end up with 16384 VStreams. Even if each individual `VReplicationExec` takes only a few milliseconds, in aggregate this could easily take 30+ seconds, risking timing out the `SwitchTraffic` action.
This PR makes things a bit more efficient by making one `VReplicationExec` call per target shard and getting all relevant copy states with each call. In my testing with a large number of VStreams, this makes the overall `SwitchTraffic` action much faster for the use case described above. The trade-off is that these `VReplicationExec` queries are more expensive, with larger result sets.
Related Issue(s)
Fixes #14325