vreplication: copy functionality#4791
Conversation
dweitzman
left a comment
There was a problem hiding this comment.
Still reading with lots more to go, but sending a few minor comments on some of the files I've looked at so far.
There was a problem hiding this comment.
Nit: in this function, could be nice to assign qr.Rows[0] to a local variable like vrRow or somesuch after checking that qr.RowsAffected == 1
There was a problem hiding this comment.
Slightly off topic: would be nice at some point to do the callerid stuff and HandlePanic in some kind of grpc interceptor instead of manually copying it for every function.
There was a problem hiding this comment.
This is a good idea. I haven't been a fan of this pattern anyways.
There was a problem hiding this comment.
Looks like vterrors.ToGRPC(nil) would do the right thing anyway without an explicit nil check here. I'll leave you to decide whether that makes the code more clear, equally clear and shorter, or less clear.
There was a problem hiding this comment.
Done. Fixed in a few more places also.
There was a problem hiding this comment.
I can imagine an argument for returning ctx.Err() here. If it's intentional that this method suppressing a canceled or deadline exceeded, seems like something to be clear about in docs
Alternatively (and maybe simpler?) it could return ctx.Err() and the caller could be responsible for interpreting any cancelation error or timeout error.
There was a problem hiding this comment.
Nice catch. This function must return error on cancel. Otherwise the caller may think that they received all the rows.
There was a problem hiding this comment.
To my taste, at least, adding comments to a bunch of the functions in this file could make it easier to grok for people without previous context on what's going on.
There was a problem hiding this comment.
Added comments in a few plances. Let me know if any other place needs clarification.
There was a problem hiding this comment.
The use of "a_" vs "b_" bind variable names for the before vs after values is slick, but wasn't immediately clear to me. That might be something to mention explicitly in the comment for TablePlan
There was a problem hiding this comment.
Observation: it's not immediately obvious to me why a table plan can't assume Delete is present but can assume that Insert is.
There was a problem hiding this comment.
I've added a comment to TablePlan that explains why.
There was a problem hiding this comment.
Might help to call these BulkInsertFront and BulkInsertValues for clarity. I think I have another comment that suggests adding an overall comment for this struct with more explanation about the bind variables and such.
There was a problem hiding this comment.
Could this use InsertFront and InsertValues so that a separate Insert wouldn't need to exist?
There was a problem hiding this comment.
Insert is different from the three-part insert:
The three-part insert is used by the copy operation for generating the bulk insert statement, and does not have conditionals.
The individual Insert is used by the player for replicating rows. It will have the pk < :pk conditional in the insert.
After the copy is finished, the player's insert reverts to the non-conditional insert, just like the three-part insert would. However, the three-part insert becomes obsolete by then.
So, for readability, I decided not to overload these variables. Copy always uses the three-part insert, and player always uses the single Insert.
dweitzman
left a comment
There was a problem hiding this comment.
Still reading. A few more minor comments
There was a problem hiding this comment.
Random observation: although these db queries are likely to be fast in practice, someday it would be nice if the context propagated to help with timing out / aborting vreplication-related db access.
There was a problem hiding this comment.
Would this channel read potentially block if the vplayer has already sent something? Should this be:
select {
case <-errch:
default:
}
There was a problem hiding this comment.
This part looks right to me:
- Upon termination, vplayer is guaranteed to send one and only one message on the channel.
- There are two places (here and a few lines below) where we fetch from this channel. So, one of those is guaranteed to receive.
However, I see that we return without waiting on errch if we detect that the context was canceled. I think it's safer to wait; If vplayer gets stuck somewhere, the next invocation will race with it.
I'll make this change.
There was a problem hiding this comment.
I'm a bit nervous about whether there could be some error case where SecondsBehindMaster gets set to 0 and vplayer keeps erroring out without updating SecondsBehindMaster, which could appear to this code as if vplayer ran successfully and there's no delay.
The legacy binlog player can do that kind of thing (failing to update seconds behind master when it's in an error state)
There was a problem hiding this comment.
I was worried about this also, which is why I did this PR first, to track this more accurately: #4637.
But it's something we have to keep an eye one. There could be other corner cases.
There was a problem hiding this comment.
Should this explicitly verify that ctx.Err() is DeadlineExceeded before suppressing the error?
There was a problem hiding this comment.
It's not immediately clear to me as a reader what factors influence the number of rows that might be included here / the size of the query / the size of the transaction.
There was a problem hiding this comment.
Added this comment:
// The number of rows we receive depends on the packet size set
// for the row streamer. Since the packet size is roughly equivalent
// to data size, this should map to a uniform amount of pages affected
// per statement. A packet size of 30K will roughly translate to 8
// mysql pages of 4K each.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
This refactor changes the insert format into the traditional 'values' syntax, which is reusable for bulk inserts during copying. We also generate a broken-up version of the insert to allow for those sub-parts to be composed into a bulk-insert statement. Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Also, Error state is obsolete. So, we're not using it any more. Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
| return player.ApplyBinlogEvents(ctx) | ||
| case ct.source.Filter != nil: | ||
| // VPlayer requires the timezone to be UTC. | ||
| // vreplicator requires the timezone to be UTC. |
There was a problem hiding this comment.
I assume this is so that we have a uniform way of measuring lag? If so, can you make that explicit in the comment?
There was a problem hiding this comment.
It's because timestamps are always sent as UTC. Added a comment.
| return tplan, nil | ||
| } | ||
|
|
||
| // buildFromFiedls builds a full TablePlan, but uses the field info as the |
| } | ||
| plan.TargetTables[tableName] = tablePlan | ||
| plan.TablePlans[tableName] = tablePlan | ||
| continue nextTable |
There was a problem hiding this comment.
Was not having this line of code a bug/inefficiency in the previous version of code?
There was a problem hiding this comment.
It was a minor mismatch from the spec I had in mind. Without the continue, the last rule would have matched. Instead, I wanted to stop at the first match. We should clarify this when we write the spec.
|
|
||
| func (tpb *tablePlanBuilder) generateUpdate(buf *sqlparser.TrackedBuffer, bvf *bindvarFormatter, before, after bool) { | ||
| func (tpb *tablePlanBuilder) generateOnDupPart(buf *sqlparser.TrackedBuffer) *sqlparser.ParsedQuery { | ||
| if tpb.onInsert != insertOndup { |
There was a problem hiding this comment.
can this enum value be renamed to insertOnDup for readability?
| case insertIgnore: | ||
| return nil | ||
| func (tpb *tablePlanBuilder) generateValuesPart(buf *sqlparser.TrackedBuffer, bvf *bindvarFormatter) *sqlparser.ParsedQuery { | ||
| bvf.mode = bvAfter |
There was a problem hiding this comment.
A comment here will be helpful, something to the effect that setting the mode will control how Myprintf format's the bind variable.
There was a problem hiding this comment.
Since these are set in multiple places, I've added a detailed comment on the bindvarFormatter type.
| if before { | ||
| bvf.mode = bvBefore | ||
| buf.Myprintf("-ifnull(%v, 0)", cexpr.expr) | ||
| bvf.mode = bvBefore |
There was a problem hiding this comment.
Comment here? On updates we want to subtract the old value and add the new value to the sum.
| type bindvarMode int | ||
|
|
||
| const ( | ||
| bvNone = bindvarMode(iota) |
| if err != nil { | ||
| return err | ||
| } | ||
| if settings.StartPos.IsZero() { |
There was a problem hiding this comment.
Cannot catchup, must copy first?
There was a problem hiding this comment.
Added clarification:
// If there's no start position, it means we're copying the
// first table. So, there's nothing to catch up to.
sougou
left a comment
There was a problem hiding this comment.
Comments addressed.
| return player.ApplyBinlogEvents(ctx) | ||
| case ct.source.Filter != nil: | ||
| // VPlayer requires the timezone to be UTC. | ||
| // vreplicator requires the timezone to be UTC. |
There was a problem hiding this comment.
It's because timestamps are always sent as UTC. Added a comment.
| } | ||
| plan.TargetTables[tableName] = tablePlan | ||
| plan.TablePlans[tableName] = tablePlan | ||
| continue nextTable |
There was a problem hiding this comment.
It was a minor mismatch from the spec I had in mind. Without the continue, the last rule would have matched. Instead, I wanted to stop at the first match. We should clarify this when we write the spec.
| case insertIgnore: | ||
| return nil | ||
| func (tpb *tablePlanBuilder) generateValuesPart(buf *sqlparser.TrackedBuffer, bvf *bindvarFormatter) *sqlparser.ParsedQuery { | ||
| bvf.mode = bvAfter |
There was a problem hiding this comment.
Since these are set in multiple places, I've added a detailed comment on the bindvarFormatter type.
| if err != nil { | ||
| return err | ||
| } | ||
| if settings.StartPos.IsZero() { |
There was a problem hiding this comment.
Added clarification:
// If there's no start position, it means we're copying the
// first table. So, there's nothing to catch up to.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
It's been busy and exciting in vitess upstream. Some changes that felt worth calling out as they could impact assumptions about behavior: - vitessio#4832 — this adds a timeout that is 3x the `healthCheckInterval` which at which point the tablet will stop reporting as healthy - vitessio#4814 — this changes when command line args are used by vttablet - vitessio#4811 — a planned reparent now sets `super_read_only` - vitessio#4805 — behavior coming back from a failed backup is different **Slack changes** @demmer vitessio#4827 Correct suppress logging for begin...commit in autocommit @rafael vitessio#4824 adds timeouts for all statements @demmer vitessio#4826 adds logging of stack traces as opt-in @demmer vitessio#4819 remove begin/commit logs from autocommit txns @demmer vitessio#4796 improves support for vtexplain for begin/dml/dml/commit txns **Non slack changes** vitessio#4839 Improve behavior with reference table routing & vreplication vitessio#4833 Support query routing given there could be multiple targets for a table vitessio#4832 tablets get new health check behavior (health checks time out) vitessio#4837 refresh bug around row streamer handling creds vitessio#4830 apply a default value to the db_name attribute in local_metadata vitessio#4785 SHOW SCHEMAS aliased to SHOW DATABASES vitessio#4829 add experimental support for split clone & vertical split clone via vreplication vitessio#4822 do not normalize in order by vitessio#4791 Adds vreplication row streaming support vitessio#4814 command line flags now used regardless of management mode vitessio#4811 vttablet sets super_read_only during planned reparent vitessio#4803 xtrabackup testing vitessio#4727 support multiple vttablets running against one mysql instance vitessio#4746 TopoCat can now produce JSON output vitessio#4805 mysqld is now restarted after a failed backup vitessio#4685 macos bootstrap bug vitessio#4874 ZK opts and java land vitessio#4695 Adds support for xtrabackup vitessio#4794 fixes build failures vitessio#4725 changes mysql8.0 start args vitessio#4736 introduce new states to support vreplication vitessio#4788 expands orc error **Docs etc** vitessio#4831 lint vitessio#4827 docs vitessio#4816 const declaration cleanup vitessio#4820 const declaration cleanup vitessio#4825 docs vitessio#4818 docs vitessio#4809 docs vitessio#4812 moves consts around vitessio#4813 docs vitessio#4808 docs vitessio#4800 docs vitessio#4795 docs
This PR is harder to break up into smaller commits. So, I'll just explain in detail what this does. The high level approach is described in #4604.
VStreamRows. This is the source side function that streams all rows of a table. It does the following:PlayerPlanrenamed to ReplicatorPlan. This plan is used by the copier and player. The plan can be shared because of the uniformity of the vreplication model. We send the same query for both copying and replicating. The only difference is that the copier applies the rows as if they were inserts, but also in bulk.TablePlanhas to generate additional or different queries for applying:lastpkmember. This contains the values of the last pk columns copied so far, in case of a partial table copy.lastpkis set. We have to make sure we don't apply changes from rows that haven't been copied yet. This strategy is described in the RFC: VReplication based SplitClone #4604.