Skip to content

VSCopy: Resume the copy phase consistently from given GTID and lastpk#11103

Merged
mattlord merged 20 commits intovitessio:mainfrom
yoheimuta:vscopy-restart
Nov 10, 2022
Merged

VSCopy: Resume the copy phase consistently from given GTID and lastpk#11103
mattlord merged 20 commits intovitessio:mainfrom
yoheimuta:vscopy-restart

Conversation

@yoheimuta
Copy link
Copy Markdown
Contributor

@yoheimuta yoheimuta commented Aug 26, 2022

Description

The current vstream API doesn't resume a copy phase given the below VGTID.
Instead, it starts replicating the table from the given GTID while ignoring the lastpk information.

vgtid {
  shard_gtids {
    keyspace: "test_sharded_keyspace"
    shard: "-80"
    gtid: "MySQL56/ff540f7a-1d4d-11ed-93b5-0242ac110002:1-28"
    table_p_ks {
      table_name: "t1"
      lastpk {
        rows {
          lengths: 4
          values: "1000"
 ...

This PR allows vstream consumers to deal with the VGTID event as an opaque offset.
Given the non-empty GTID and the lastpk, the API is supposed to do the following:

  1. Catch up the t1 table from "MySQL56/ff540f7a-1d4d-11ed-93b5-0242ac110002:1-28" to the nearly latest
  2. Copy the t1 table with the condition "pk > 1000"

Related Issue(s)

Discussion: https://vitess.slack.com/archives/C0PQY0PTK/p1661155731380559

Implementation

Modify uvstreamer Init

Allow a non-empty position when tablePKs are specified. In this case an initlal catchup+fastforward phase will run before restarting copy phase on incompletely copied tables. (reference: https://vitess.io/docs/design-docs/vreplication/life-of-a-stream/)

Change how we filter events

Since we now run a catchup phase when we restart table copies, it is possible that duplicate events can be sent for rows that are not already copied, since we don't yet have logic to compare primary keys. To minimise this we filter out events for tables whose copy phase has not yet started.

Test changes

  1. New test TestVStreamCopyResume: passes both tablePKs and a specific position to validate new logic
    https://github.com/vitessio/vitess/pull/11103/files?diff=unified&w=1#diff-c98b8f318aac471272bf650c7824713c9664b46895380acf322ec3498a2fe578R236

  2. Updated test TestVStreamCopyCompleteFlow to account for change in logic for selecting events to send

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation PR: Document VStream website#1216

Pending issues

  1. Filter out events for rows not copied, for tables whose primary keys can be compared in code (essentially those that our eval engine can compare). This is not critical, since clients have to handle duplicate events in any case.
  2. Batch the copy phase (like we do for vreplication) so that we don't end up with a large vreplication lag (and possibly rotated binlogs). We didn't expect this to be critical since we are just streaming events, rather than inserting rows on the target, like we do for vreplication. But it is possible that some clients do more cpu/IO-intensive operations than just sending them to event sinks.

…upposed to resume the copy phase consistently

Signed-off-by: yoheimuta <yoheimuta@gmail.com>
Signed-off-by: yoheimuta <yoheimuta@gmail.com>
@vitess-bot
Copy link
Copy Markdown
Contributor

vitess-bot bot commented Aug 26, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@mattlord mattlord added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication labels Aug 26, 2022
@mattlord mattlord self-assigned this Aug 26, 2022
@mattlord mattlord requested review from rohit-nayak-ps and removed request for harshit-gangal, shlomi-noach and systay August 29, 2022 13:55
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Copy link
Copy Markdown
Member

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we need to find a way to implement this high level feature w/o changing the basic copy phase event filtering for all VStreams in uvstreamer (the unified vstreamer abstraction/primitive that everything else uses) as this changes key behavior for VReplication workflows and other things.

I also don't think that change should be necessary as we should be able to mirror the VReplication MoveTables/Reshard behavior here. @rohit-nayak-ps and I can discuss further.

Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord self-requested a review August 30, 2022 05:13
mattlord and others added 3 commits August 30, 2022 11:25
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: yoheimuta <yoheimuta@gmail.com>
Signed-off-by: yoheimuta <yoheimuta@gmail.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps
Copy link
Copy Markdown
Member

@yoheimuta, I made the changes I referred to above. Please review and make any eventual changes, when you get a chance. We can then request @mattlord for a PR review.

@yoheimuta
Copy link
Copy Markdown
Contributor Author

@rohit-nayak-ps Thank you for your updates!
I reviewed it, and looks good to me 👍

@mattlord mattlord added release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) and removed Do Not Merge labels Nov 7, 2022
Copy link
Copy Markdown
Member

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and makes sense to me! I only had some very minor question/comments/suggestions.

I'll push a commit now to mention this new feature in the 16.0 release notes.

We should also add a new VStream page in the 16.0 VReplication docs that can reference the existing copy table design doc. There we can touch on what it is, what use cases it serves, how it works and how to use it. We don't have any docs on the VTGate VStream API feature today (only the somewhat dated design doc) — so many people don't even know that it exists. I can get a PR started for that and add a reference in the description.

Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord requested a review from deepthi as a code owner November 7, 2022 23:47
mattlord added a commit to vitessio/website that referenced this pull request Nov 8, 2022
And the new vstream copy resume work added in:
  vitessio/vitess#11103

Signed-off-by: Matt Lord <mattalord@gmail.com>
Copy link
Copy Markdown
Member

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My review comments were extremely minor -- really only comment related -- so I'm going to approve now that I pushed an update to the release notes and started on the docs work (which is now linked): vitessio/website#1216

Thank you so much for the time you spent on this @yoheimuta !

@mattlord mattlord removed the release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) label Nov 8, 2022
mattlord added a commit to vitessio/website that referenced this pull request Nov 8, 2022
And the new vstream copy resume work added in:
  vitessio/vitess#11103

Signed-off-by: Matt Lord <mattalord@gmail.com>
Copy link
Copy Markdown
Member

@rohit-nayak-ps rohit-nayak-ps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown
Collaborator

@systay systay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nit-picks, but they are not stopping this PR

Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord requested a review from rsajwani as a code owner November 10, 2022 15:05
@mattlord mattlord merged commit e2e0699 into vitessio:main Nov 10, 2022
@yoheimuta yoheimuta deleted the vscopy-restart branch November 11, 2022 01:48
mattlord added a commit to vitessio/website that referenced this pull request Jan 7, 2023
* Document VStream

And the new vstream copy resume work added in:
  vitessio/vitess#11103

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Add more info

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Add VStreamFlags link

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Address review comments

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Document all RPC parameters and flags

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Various improvements after self review

Signed-off-by: Matt Lord <mattalord@gmail.com>

Signed-off-by: Matt Lord <mattalord@gmail.com>
timvaillancourt pushed a commit to slackhq/vitess that referenced this pull request May 27, 2024
…vitessio#11103)

* VSCopy: Demonstrate to fail a test case on which the vstream API is supposed to resume the copy phase consistently

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Resume the copy phase consistently from given GTID and lastpk

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* Build out the unit test some more

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Update tests for new behavior

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Improve comments

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Limit uvstreamer changes and update test

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Revert uvstreamer test changes

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Revert all uvstream changes

Signed-off-by: Matt Lord <mattalord@gmail.com>

* VCopy: Revert the last three commits

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VCopy: Add a new vstream type that allows picking up where we left off

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VCopy: Revert the unit test change

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VCopy: Fix the end-to-end CI test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* Update logic for setting up uvstreamer based on input vgtid/tablepks. Add more catchup events to test

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Refactor logic to decide if event is to be sent. Enhance unit and e2e tests.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Don't send events for tables which we can identify as ones we haven't started copy for

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Minor changes after self-review

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Add vstream copy resume to release notes

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Address review comments

Signed-off-by: Matt Lord <mattalord@gmail.com>

Signed-off-by: yoheimuta <yoheimuta@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Rohit Nayak <rohit@planetscale.com>
timvaillancourt added a commit to slackhq/vitess that referenced this pull request May 28, 2024
* add vtgate flag that explicitly allows vstream copy (#125)

* fix fs.BoolVar

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* VSCopy: Resume the copy phase consistently from given GTID and lastpk (vitessio#11103)

* VSCopy: Demonstrate to fail a test case on which the vstream API is supposed to resume the copy phase consistently

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Resume the copy phase consistently from given GTID and lastpk

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* Build out the unit test some more

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Update tests for new behavior

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Improve comments

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Limit uvstreamer changes and update test

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Revert uvstreamer test changes

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Revert all uvstream changes

Signed-off-by: Matt Lord <mattalord@gmail.com>

* VCopy: Revert the last three commits

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VCopy: Add a new vstream type that allows picking up where we left off

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VCopy: Revert the unit test change

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VCopy: Fix the end-to-end CI test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* Update logic for setting up uvstreamer based on input vgtid/tablepks. Add more catchup events to test

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Refactor logic to decide if event is to be sent. Enhance unit and e2e tests.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Don't send events for tables which we can identify as ones we haven't started copy for

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Minor changes after self-review

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* Add vstream copy resume to release notes

Signed-off-by: Matt Lord <mattalord@gmail.com>

* Address review comments

Signed-off-by: Matt Lord <mattalord@gmail.com>

Signed-off-by: yoheimuta <yoheimuta@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Rohit Nayak <rohit@planetscale.com>

* VSCopy: Send COPY_COMPLETED events when the copy operation is done (vitessio#11740)

* VSCopy: Demonstrate to fail a test case on which the vstream API sends new events showing copy completed

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Send new events when the copy operation is done

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Fix typo

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* Initialize new map for the 'vstream * from' vtgate sql interface. Make vtadmin web protos

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* VSCopy: Make TestVStreamCopyBasic fail fast to avoid the end2end timeout out

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: stop sharing the 't1' table among multiple test cases running concurrently

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: refactor the function signature to be clearer

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: refactor the VEvents sorter to be simpler

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: refactor to stop the sorter from including a fully copied event

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

Signed-off-by: yoheimuta <yoheimuta@gmail.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Co-authored-by: Rohit Nayak <rohit@planetscale.com>

* VSCopy: Enable to copy from all shards in either a specified keyspace or all keyspaces (vitessio#11909)

* VSCopy: Demonstrate to fail a test case on which the vstream API request doesn't include keyspace and shard

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Copy from all shards in all keyspaces by specifying only an empty gtid

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* tests: Make TestRowCount stable regardless of the number of keyspaces

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* tests: Cleanup TestCreateAndDropDatabase correctly to stop TestVStreamCopyWithoutKeyspaceShard from failing when running tests together

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* tests: Tweak to fix a comment

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: fix the unit tests when the input vgtid with an empty gtid lacks either keyspace or shard

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Keyspace wildcard selection lines up with the table wildcard selection

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Tests the VCopy with multiple keyspaces and resharding

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Make TestVStreamCopyMultiKeyspaceReshard clearer to check if the streaming two keyspaces works even after reshard

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Return an invalid argument error if shards are unspecified and gtid is neither 'current' nor empty

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Add a test description about its purpose and target

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Remove duplicate literals in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Retain defaultReplicas variable in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Explain why we are setting Match to 'customer.*' in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Remove an unused VStreamFlag for the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Use sentence capitalization in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Verify that we didn't lose any events or get duplicates of the keyspace being reshareded in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Return a value instead of a pointer because there is no need to modify the value

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Add a comment describing what TestVStreamCopyFromAllKeyspacesAndAllShards is doing and why

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Add a comment describing why we expect these specific numbers of events from VStream API

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Tweak the test case name

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Make a utility function to sort COPY_COMPLETED events in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Replace the matcher with a simpler one in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Move the print debug call to the FailNow section in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Use require.NoError in new tests

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Use require instead of t.Fatalf in the test

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Apply the reviewer's suggestion to make the error message easier to read

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Add a comment noting what we're actually testing

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Correct the test comment and elaborate the special-case

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Tweak an error message and a comment

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* VSCopy: Adjust to a change in the signature of a test function that was introduced in the main repository

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

---------

Signed-off-by: yoheimuta <yoheimuta@gmail.com>

* attempt unit test fix

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* update test error expected

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: yoheimuta <yoheimuta@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Co-authored-by: pbibra <pbibra@slack-corp.com>
Co-authored-by: yohei yoshimuta <yoheimuta@gmail.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Rohit Nayak <rohit@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants