VTShovel - VReplication support for external databases #5289

Merged
sougou merged 33 commits into vitessio:master from tinyspeck:vtshovel-poc
Dec 5, 2019

Conversation

Member

@rafael rafael commented Oct 10, 2019

Description

Have you ever wanted to leverage the powers of vreplication outside the environment of Vitess? Do you dream about copying bytes? The following PR will have a solution for you.

Introducing: vtshovel. A flexible tool that allows you to create vreplication streams directly from MySQL instances outside of the Vitess ecosystem.

To give a bit of context about the motivation for this tool, we (Slack) are in the process of migrating entire databases from our legacy MySQL clusters to Vitess. We plan to use this tool to keep the MySQL instances in our legacy clusters in sync with their Vitess counterparts.

We think other folks might find it useful to have a tool like this when doing migrations.

Core Design

  • The core design for this feature is to leverage the entire vreplication framework. In addition to vttablets, vreplication streams can now point to external databases. This is done by introducing an abstraction that implements the vstreamer methods:
    // VStreamerClient exposes the core interface of a vstreamer
    type VStreamerClient interface {
        // Open sets up all the environment for a vstream
        Open(ctx context.Context) error
        // Close closes a vstream
        Close(ctx context.Context) error
        // VStream streams VReplication events based on the specified filter.
        VStream(ctx context.Context, startPos string, filter *binlogdatapb.Filter, send func([]*binlogdatapb.VEvent) error) error
        // VStreamRows streams rows of a table from the specified starting point.
        VStreamRows(ctx context.Context, query string, lastpk *querypb.QueryResult, send func(*binlogdatapb.VStreamRowsResponse) error) error
    }
    
  • Depending on the configuration of the vreplication stream, a vplayer will choose between a TabletVStreamerClient and a MySQLVStreamerClient (created with NewMySQLVStreamerClient).
  • There is some technical debt in the way we choose credentials for the external MySQL. At the moment we don't have a good way to do that. To avoid increasing the scope of this PR, we added erepl to dbconfigs.
  • I added good test coverage to vstreamer_client and also an integration test for the vplayer.

Additional changes

  • This PR also adds support for statement-based replication. Binlogplayer can understand both statement-based and row-based replication. Certain types of filters are not supported with statement-based replication, and the stream will fail in such cases. At the moment, match-all rules are supported for statement-based replication streams.

  • It also adds support for streaming without GTIDs. This was done by cutting some corners, but it will be cleaned up soon. @sougou and I are working on that.
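
The restriction on filters for statement-based streams could be checked along the following lines. This is only a sketch under assumptions: `Rule` is a simplified stand-in for the real filter rule type, the match-all rule is assumed to be written as `/.*` with no filter expression, and `checkStream` is a hypothetical helper. Only the name `canAcceptStmtEvents` is borrowed from a later commit in this PR; the body is a guess at the idea, not the merged implementation.

```go
package main

import "fmt"

// Rule is a simplified stand-in for a vreplication filter rule (assumption).
type Rule struct {
	Match  string // table name, or a regex when prefixed with "/"
	Filter string // optional filter expression
}

// canAcceptStmtEvents reports whether every rule is a plain match-all rule.
// A statement cannot be filtered per row, so anything narrower is rejected.
func canAcceptStmtEvents(rules []Rule) bool {
	if len(rules) == 0 {
		return false
	}
	for _, r := range rules {
		if r.Match != "/.*" || r.Filter != "" {
			return false
		}
	}
	return true
}

// checkStream fails early when a statement-based stream uses unsupported rules.
func checkStream(rules []Rule, stmtBased bool) error {
	if stmtBased && !canAcceptStmtEvents(rules) {
		return fmt.Errorf("filter rules are not supported for statement-based replication")
	}
	return nil
}

func main() {
	matchAll := []Rule{{Match: "/.*"}}
	filtered := []Rule{{Match: "t1", Filter: "select * from t1 where id > 10"}}
	fmt.Println(checkStream(matchAll, true))
	fmt.Println(checkStream(filtered, true))
}
```

Computing the result once, when the player is created, keeps the per-event path to a single boolean check.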

Rafael Chacon added 7 commits October 3, 2019 15:21
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
* Adds support for VStream to start from filename:pos and not gtid sets.
* Adds support for statement based replication streams (this should only be used
  in the context of mysql streamer, it is not safe for tablet vreplication).
* Adds support to run vstream from mysql directly

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
@rafael rafael requested a review from sougou as a code owner October 10, 2019 20:51
* Adds binary to run vtshovel.
* At the moment only working in ephemeral mode (i.e., no data is persisted back to
  vrsettings).
* vtshovel only works for statement based replication right now. This is due to
  not having a good way to have a schema loader. We will iterate on this.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Rafael Chacon added 5 commits October 16, 2019 16:33
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
* This will be removed in future PR. Adding while in POC

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Contributor

@sougou sougou left a comment

Approach very nice overall. A few minor nits.


// NewMySQLVStreamerClient is a vstream client that allows you to stream directly from MySQL.
// In order to achieve this, the following creates a vstreamer Engine with a dummy in memorytopo.
func NewMySQLVStreamerClient(sourceConnParams *mysql.ConnParams) *MySQLVStreamerClient {
Contributor

I'm thinking this function can pull the dbconfigs based on the external repl user name. Then you don't have to pass it through to vreplication.Engine.

Member Author

My thinking here is that the end goal is to be able to point to any external DB, so having this parameter here will make it more flexible.

I think we shouldn't rely heavily on the repl username, as we would like to refactor that soon.

What do you think?

Rafael Chacon added 4 commits October 29, 2019 09:14
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
@rafael rafael changed the title from "WIP - VTShovel POC" to "VTShovel - VReplication support for external databases" Nov 6, 2019
Rafael Chacon added 3 commits November 6, 2019 12:56
* At the moment we only support erepl user. Passing source conn params around
was adding unnecessary complexity.
* This cleans that up and makes it more explicit that only erepl user is
supported. In the future we will add more flexibility in terms of what kind of
users can be configured for external vreplication streams.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
* Fix typo in some comments.
* Make VReplicator private again. This change is no longer needed. Originally we
wanted "vtshovel" to be an external process. Given that this now hooks into the
existing engine, there is no need to make this public.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Rafael Chacon added 7 commits November 26, 2019 16:08
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
* StripChecksum was changing the type of the event. This was a bug.
* Adds test to vstreamer to reflect new support for statement based replication

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Contributor

@sougou sougou left a comment

Looking good. Couple of comments.

math "math"

proto "github.com/golang/protobuf/proto"
math "math"
Contributor

This is a conflict between grpc code gen and goimports. If you re-run goimports, all these files will revert to unchanged. Or, you can just manually revert these yourself.

Rafael Chacon added 2 commits December 4, 2019 14:45
* Compute canAcceptStmtEvents when creating vplayer.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Member Author

rafael commented Nov 11, 2020

Hi @jawabuu, the way this code ended up landing, it is not a separate binary. It is intended to be run as part of vttablet. It makes it possible to have VReplication streams where the source is external.

Member Author

rafael commented Nov 11, 2020

There is some discussion about how this is used in our Slack community: https://vitess.slack.com/archives/C0PQY0PTK/p1604989571062700?thread_ts=1579649445.062400&cid=C0PQY0PTK
