Skip to content

vreplication: copy functionality#4791

Merged
sougou merged 21 commits intovitessio:masterfrom
planetscale:ss-vrepl
Apr 21, 2019
Merged

vreplication: copy functionality#4791
sougou merged 21 commits intovitessio:masterfrom
planetscale:ss-vrepl

Conversation

@sougou
Copy link
Copy Markdown
Contributor

@sougou sougou commented Apr 7, 2019

This PR is harder to break up into smaller commits. So, I'll just explain in detail what this does. The high level approach is described in #4604.

  • QueryService adds a new function called VStreamRows. This is the source side function that streams all rows of a table. It does the following:
    • Lock the table, obtain gtid position.
    • Send initial packet: fields, pk fields and gtid. In this repeatable read mode, the snapshot of the rows will be as of the gtid.
    • Unlock the table.
  • vplayer (on the target side) is split into three parts:
    • vreplicator: is the high level object that orchestrates the copy and transition into replication.
    • vcopier: goes back and forth between copying and replicating, until all copy is done.
    • vplayer: the replication module, also used by the copier to catch up.
  • PlayerPlan renamed to ReplicatorPlan. This plan is used by the copier and player. The plan can be shared because of the uniformity of the vreplication model. We send the same query for both copying and replicating. The only difference is that the copier applies the rows as if they were inserts, but also in bulk.
  • TablePlan has to generate additional or different queries for applying:
    • A three-part insert for generating the bulk insert statements for copying.
    • Add a lastpk member. This contains the values of the last pk columns copied so far, in case of a partial table copy.
    • The other Insert, Update and Delete statements are generated differently if lastpk is set. We have to make sure we don't apply changes from rows that haven't been copied yet. This strategy is described in the RFC: VReplication based SplitClone #4604.

Copy link
Copy Markdown
Collaborator

@dweitzman dweitzman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reading with lots more to go, but sending a few minor comments on some of the files I've looked at so far.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: in this function, could be nice to assign qr.Rows[0] to a local variable like vrRow or somesuch after checking that qr.RowsAffected == 1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly off topic: would be nice at some point to do the callerid stuff and HandlePanic in some kind of grpc interceptor instead of manually copying it for every function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. I haven't been a fan of this pattern anyways.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like vterrors.ToGRPC(nil) would do the right thing anyway without an explicit nil check here. I'll leave you to decide whether that makes the code more clear, equally clear and shorter, or less clear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Fixed in a few more places also.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine an argument for returning ctx.Err() here. If it's intentional that this method suppressing a canceled or deadline exceeded, seems like something to be clear about in docs

Alternatively (and maybe simpler?) it could return ctx.Err() and the caller could be responsible for interpreting any cancelation error or timeout error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. This function must return error on cancel. Otherwise the caller may think that they received all the rows.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my taste, at least, adding comments to a bunch of the functions in this file could make it easier to grok for people without previous context on what's going on.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments in a few plances. Let me know if any other place needs clarification.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of "a_" vs "b_" bind variable names for the before vs after values is slick, but wasn't immediately clear to me. That might be something to mention explicitly in the comment for TablePlan

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: it's not immediately obvious to me why a table plan can't assume Delete is present but can assume that Insert is.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment to TablePlan that explains why.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might help to call these BulkInsertFront and BulkInsertValues for clarity. I think I have another comment that suggests adding an overall comment for this struct with more explanation about the bind variables and such.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use InsertFront and InsertValues so that a separate Insert wouldn't need to exist?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert is different from the three-part insert:
The three-part insert is used by the copy operation for generating the bulk insert statement, and does not have conditionals.
The individual Insert is used by the player for replicating rows. It will have the pk < :pk conditional in the insert.
After the copy is finished, the player's insert reverts to the non-conditional insert, just like the three-part insert would. However, the three-part insert becomes obsolete by then.

So, for readability, I decided not to overload these variables. Copy always uses the three-part insert, and player always uses the single Insert.

Copy link
Copy Markdown
Collaborator

@dweitzman dweitzman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reading. A few more minor comments

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random observation: although these db queries are likely to be fast in practice, someday it would be nice if the context propagated to help with timing out / aborting vreplication-related db access.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this channel read potentially block if the vplayer has already sent something? Should this be:

select {
  case <-errch:
  default:
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part looks right to me:

  • Upon termination, vplayer is guaranteed to send one and only one message on the channel.
  • There are two places (here and a few lines below) where we fetch from this channel. So, one of those is guaranteed to receive.

However, I see that we return without waiting on errch if we detect that the context was canceled. I think it's safer to wait; If vplayer gets stuck somewhere, the next invocation will race with it.

I'll make this change.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit nervous about whether there could be some error case where SecondsBehindMaster gets set to 0 and vplayer keeps erroring out without updating SecondsBehindMaster, which could appear to this code as if vplayer ran successfully and there's no delay.

The legacy binlog player can do that kind of thing (failing to update seconds behind master when it's in an error state)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about this also, which is why I did this PR first, to track this more accurately: #4637.
But it's something we have to keep an eye one. There could be other corner cases.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this explicitly verify that ctx.Err() is DeadlineExceeded before suppressing the error?

Copy link
Copy Markdown
Collaborator

@dweitzman dweitzman Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately clear to me as a reader what factors influence the number of rows that might be included here / the size of the query / the size of the transaction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this comment:

		// The number of rows we receive depends on the packet size set
		// for the row streamer. Since the packet size is roughly equivalent
		// to data size, this should map to a uniform amount of pages affected
		// per statement. A packet size of 30K will roughly translate to 8
		// mysql pages of 4K each.

sougou added 20 commits April 17, 2019 11:04
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
This refactor changes the insert format into the traditional
'values' syntax, which is reusable for bulk inserts during
copying.

We also generate a broken-up version of the insert to allow
for those sub-parts to be composed into a bulk-insert statement.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Also, Error state is obsolete. So, we're not using it any more.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
return player.ApplyBinlogEvents(ctx)
case ct.source.Filter != nil:
// VPlayer requires the timezone to be UTC.
// vreplicator requires the timezone to be UTC.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is so that we have a uniform way of measuring lag? If so, can you make that explicit in the comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because timestamps are always sent as UTC. Added a comment.

return tplan, nil
}

// buildFromFiedls builds a full TablePlan, but uses the field info as the
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo - buildFromFields

}
plan.TargetTables[tableName] = tablePlan
plan.TablePlans[tableName] = tablePlan
continue nextTable
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was not having this line of code a bug/inefficiency in the previous version of code?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a minor mismatch from the spec I had in mind. Without the continue, the last rule would have matched. Instead, I wanted to stop at the first match. We should clarify this when we write the spec.


func (tpb *tablePlanBuilder) generateUpdate(buf *sqlparser.TrackedBuffer, bvf *bindvarFormatter, before, after bool) {
func (tpb *tablePlanBuilder) generateOnDupPart(buf *sqlparser.TrackedBuffer) *sqlparser.ParsedQuery {
if tpb.onInsert != insertOndup {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this enum value be renamed to insertOnDup for readability?

case insertIgnore:
return nil
func (tpb *tablePlanBuilder) generateValuesPart(buf *sqlparser.TrackedBuffer, bvf *bindvarFormatter) *sqlparser.ParsedQuery {
bvf.mode = bvAfter
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here will be helpful, something to the effect that setting the mode will control how Myprintf format's the bind variable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are set in multiple places, I've added a detailed comment on the bindvarFormatter type.

if before {
bvf.mode = bvBefore
buf.Myprintf("-ifnull(%v, 0)", cexpr.expr)
bvf.mode = bvBefore
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment here? On updates we want to subtract the old value and add the new value to the sum.

type bindvarMode int

const (
bvNone = bindvarMode(iota)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for removing this.

if err != nil {
return err
}
if settings.StartPos.IsZero() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot catchup, must copy first?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added clarification:

	// If there's no start position, it means we're copying the
	// first table. So, there's nothing to catch up to.

Copy link
Copy Markdown
Contributor Author

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed.

return player.ApplyBinlogEvents(ctx)
case ct.source.Filter != nil:
// VPlayer requires the timezone to be UTC.
// vreplicator requires the timezone to be UTC.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because timestamps are always sent as UTC. Added a comment.

}
plan.TargetTables[tableName] = tablePlan
plan.TablePlans[tableName] = tablePlan
continue nextTable
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a minor mismatch from the spec I had in mind. Without the continue, the last rule would have matched. Instead, I wanted to stop at the first match. We should clarify this when we write the spec.

case insertIgnore:
return nil
func (tpb *tablePlanBuilder) generateValuesPart(buf *sqlparser.TrackedBuffer, bvf *bindvarFormatter) *sqlparser.ParsedQuery {
bvf.mode = bvAfter
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are set in multiple places, I've added a detailed comment on the bindvarFormatter type.

if err != nil {
return err
}
if settings.StartPos.IsZero() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added clarification:

	// If there's no start position, it means we're copying the
	// first table. So, there's nothing to catch up to.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Copy link
Copy Markdown
Collaborator

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sougou sougou merged commit 6b34d20 into vitessio:master Apr 21, 2019
@sougou sougou deleted the ss-vrepl branch April 23, 2019 02:58
setassociative pushed a commit to tinyspeck/vitess that referenced this pull request Apr 26, 2019
It's been busy and exciting in vitess upstream. Some changes that felt worth calling out as they could impact assumptions about behavior:

- vitessio#4832 — this adds a timeout that is 3x the `healthCheckInterval` which at which point the tablet will stop reporting as healthy
- vitessio#4814 — this changes when command line args are used by vttablet
- vitessio#4811 — a planned reparent now sets `super_read_only`
- vitessio#4805 — behavior coming back from a failed backup is different

**Slack changes**  
        @demmer vitessio#4827 Correct suppress logging for begin...commit in autocommit
        @rafael vitessio#4824 adds timeouts for all statements
        @demmer vitessio#4826 adds logging of stack traces as opt-in
        @demmer vitessio#4819 remove begin/commit logs from autocommit txns
        @demmer vitessio#4796 improves support for vtexplain for begin/dml/dml/commit txns

**Non slack changes**  
        vitessio#4839 Improve behavior with reference table routing & vreplication
        vitessio#4833 Support query routing given there could be multiple targets for a table
        vitessio#4832 tablets get new health check behavior (health checks time out)
        vitessio#4837 refresh bug around row streamer handling creds
        vitessio#4830 apply a default value to the db_name attribute in local_metadata
        vitessio#4785 SHOW SCHEMAS aliased to SHOW DATABASES
        vitessio#4829 add experimental support for split clone & vertical split clone via vreplication
        vitessio#4822 do not normalize in order by
        vitessio#4791 Adds vreplication row streaming support
        vitessio#4814 command line flags now used regardless of management mode
        vitessio#4811 vttablet sets super_read_only during planned reparent
        vitessio#4803 xtrabackup testing
        vitessio#4727 support multiple vttablets running against one mysql instance
        vitessio#4746 TopoCat can now produce JSON output
        vitessio#4805 mysqld is now restarted after a failed backup
        vitessio#4685 macos bootstrap bug
        vitessio#4874 ZK opts and java land
        vitessio#4695 Adds support for xtrabackup
        vitessio#4794 fixes build failures
        vitessio#4725 changes mysql8.0 start args
        vitessio#4736 introduce new states to support vreplication
        vitessio#4788 expands orc error

**Docs etc**  
        vitessio#4831 lint
        vitessio#4827 docs
        vitessio#4816 const declaration cleanup
        vitessio#4820 const declaration cleanup
        vitessio#4825 docs
        vitessio#4818 docs
        vitessio#4809 docs
        vitessio#4812 moves consts around
        vitessio#4813 docs
        vitessio#4808 docs
        vitessio#4800 docs
        vitessio#4795 docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants