VStream: Allow for automatic resume after reshard #15393

Closed
mattlord wants to merge 2 commits into vitessio:main from planetscale:vrepl_reshard_resume

@mattlord mattlord commented Mar 1, 2024

Description

When we detect that a Reshard has occurred since the client's last stream, and we have the needed info, we start a new copy resume phase so that the stream resumes automatically.
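
To make the detection step more concrete, here is a minimal conceptual sketch. It is not the code in this PR and does not use any Vitess API: `shardPosition`, `staleShards`, and the position string are hypothetical names made up for illustration. It assumes the client saved a position per shard, and that the shard names before and after the Reshard are `0` and `-80`/`80-`, which is my assumption about the local example scripts used in the manual test.

```go
package main

import (
	"fmt"
	"sort"
)

// shardPosition is a hypothetical stand-in for one entry in the position a
// client saved from its last stream: a shard name plus an opaque position.
type shardPosition struct {
	Shard    string
	Position string
}

// staleShards returns the saved shards that are not in the serving set.
// In this sketch, a non-empty result is treated as the signal that a
// Reshard happened since the client last streamed.
func staleShards(saved []shardPosition, serving []string) []string {
	servingSet := make(map[string]struct{}, len(serving))
	for _, s := range serving {
		servingSet[s] = struct{}{}
	}

	var stale []string
	for _, p := range saved {
		if _, ok := servingSet[p.Shard]; !ok {
			stale = append(stale, p.Shard)
		}
	}
	sort.Strings(stale) // stable order for logging and comparison
	return stale
}

func main() {
	// Assumed scenario: the client last streamed from the unsharded
	// shard "0", and the keyspace now serves "-80" and "80-".
	saved := []shardPosition{{Shard: "0", Position: "opaque-saved-position"}}
	serving := []string{"-80", "80-"}

	if stale := staleShards(saved, serving); len(stale) > 0 {
		fmt.Printf("reshard detected, stale shards: %v\n", stale)
	} else {
		fmt.Println("no reshard detected, resume from saved positions")
	}
}
```

The sketch only covers detection. Mapping the stale shards onto positions in the new shards is the "needed info" part of the description and is deliberately left out here.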

Manual test on this branch:

./101_initial_cluster.sh; mysql < ../common/insert_commerce_data.sql; ./201_customer_tablets.sh ; ./202_move_tables.sh; ./203_switch_reads.sh; ./204_switch_writes.sh; ./205_clean_commerce.sh; ./301_customer_sharded.sh; ./302_new_shards.sh; ./303_reshard.sh; ./304_switch_reads.sh; ./305_switch_writes.sh

sleep 10

mysql -e "insert into customer (email) values ('mlord@planetscale.com')"
mysql -e "insert into customer (email) values ('mlord@planetscale.com')"

go run vstream_client.go

# In another shell
mysql -e "insert into customer (email) values ('mlord@planetscale.com')"

You'll see that we did not miss any post-Reshard writes.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Signed-off-by: Matt Lord <mattalord@gmail.com>

vitess-bot bot commented Mar 1, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the following labels Mar 1, 2024:
  • NeedsBackportReason: If backport labels have been applied to a PR, a justification is required
  • NeedsDescriptionUpdate: The description is not clear or comprehensive enough, and needs work
  • NeedsIssue: A linked issue is missing for this Pull Request
  • NeedsWebsiteDocsUpdate: What it says
@github-actions github-actions bot added this to the v20.0.0 milestone Mar 1, 2024
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord force-pushed the vrepl_reshard_resume branch from 93069ee to c4ba882 Compare March 2, 2024 00:55
@mattlord mattlord closed this Mar 2, 2024
@mattlord mattlord deleted the vrepl_reshard_resume branch March 2, 2024 17:40
tanjinx pushed a commit to slackhq/vitess that referenced this pull request Mar 17, 2025
tanjinx added a commit to slackhq/vitess that referenced this pull request Mar 18, 2025
…tessio#15393) (#627)

Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
makinje16 pushed a commit to slackhq/vitess that referenced this pull request Mar 20, 2025
…tessio#15393) (#627)

Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
tanjinx added a commit to slackhq/vitess that referenced this pull request Mar 24, 2025
…d Journal Events (#585)

* VTGate VStream: Ensure reasonable delivery time for reshard journal event  (vitessio#16639)

Signed-off-by: Malcolm Akinje <malcolm.akinje@gmail.com>
Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>

* Backport sqlparser patch for v15->v19 upgrade: 14763 Fix accepting bind variables in time related function calls (#590)

* Fix accepting bind variables in time related function calls. (vitessio#14763)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* fix test

---------

Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>

* Upgrade vitess addons to 0.19.8 (#591)

This upgrade allows us to control whether vtorc raises problems or not
via an environment variable.

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>

* Use prefix in all vtorc check and recover logs (vitessio#17526) (#592)

This is a backport of vitessio#17526 . Original PR description below:

Description
This is meant to make recovery actions more easily identified from the logs. See vitessio#17465

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>

* `slack-19.0`: various backports for `vtorc`, part 2 (#596)

* Ensure all topo read calls consider `--topo_read_concurrency` (vitessio#17276)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Revert "add keyrange support for vtorc clusters_to_watch (#457)"

This reverts commit 45c2199.

* [release-19.0] `vtorc`: require topo for `Healthy: true` in `/debug/health` (vitessio#17129) (vitessio#17351)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Manan Gupta <manan@planetscale.com>

* `vtorc`: fetch all tablets from cells once + filter during refresh (vitessio#17388)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Support KeyRange in `--clusters_to_watch` flag (vitessio#17604)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* missing func

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Add api end point to print the current database state in VTOrc (vitessio#15485)

Signed-off-by: Manan Gupta <manan@planetscale.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>

* `slack-19.0`: `vtorc`: improve handling of partial cell topo results (#599)

* `vtorc`: improve handling of partial cell topo results

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add unit test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* improve test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add comments

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* move sort to test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* goimports

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `slack-19.0`: skip tests that will fail on v15 downgrade testing (#605)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `slack-19.0`: Add stats for shards watched by VTOrc (#606)

* Add stats for shards watched by VTOrc

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Use len() in make

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Add `GetServerStatus` RPC to use in PRS (vitessio#16022) (#607)

Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>

* backport/patch connection pool bug/perf fixes (#604)

* [release-19.0] smartconnpool: do not allow connections to starve (vitessio#17675) (vitessio#17683)

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>

* smartconnpool: Better handling for idle expiration (vitessio#17756)

Signed-off-by: Vicent Marti <vmg@strn.cat>

---------

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Vicent Martí <42793+vmg@users.noreply.github.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>

* pool: reopen connection closed by idle timeout (vitessio#17818) (#609)

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Signed-off-by: Vicent Martí <42793+vmg@users.noreply.github.com>
Co-authored-by: Harshit Gangal <harshit@planetscale.com>
Co-authored-by: Vicent Martí <42793+vmg@users.noreply.github.com>

* VReplication: Support excluding lagging tablets and use this in vstream manager (vitessio#17835) (#612)

* `slack-19.0`: backport v22 VTOrc optimizations, part 2 (#613)

* `vtorc`: remove duplicate instance read from backend (vitessio#17834)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add index for `inst.ReadInstanceClusterAttributes` table scan

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Add stats for shards watched by VTOrc, purge stale shards (vitessio#17815) (#616)

* --consolidator-query-waiter-cap to set the max number of waiter for consolidated query (vitessio#17244) (#614)

Signed-off-by: Jun Wang <jun.wang@demonware.net>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: jwang <121262788+jwangace@users.noreply.github.com>
Co-authored-by: Jun Wang <jun.wang@demonware.net>

* `slack-19.0` backport v22 `vtorc` optimizations + stats, part 3 (#618)

* Remove unused code in discovery queue creation (vitessio#17515)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* vtorc: Cleanup unused code (vitessio#15508)

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* `vtorc`: cleanup discover queue, add concurrency flag (vitessio#17825)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add tablets watched stats

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix missing merge conflict update

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: skip unnecessary `inst.ReadTablet` in `logic.LockShard(...)`

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: use `errgroup` in keyspace/shard discovery

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix import

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix ineffassign

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* missing import

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add stats for discovery workers

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* get count from backend

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* rm unused map

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>

* Bp pr 17558 pr 17858.slack19.0 (#615)

* VReplication: Improve error handling in VTGate VStreams (vitessio#17558)

Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* Backport vitessio#17858

---------

Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* `slack-19.0`: re-backport tweaks from vitessio#17911 (#621)

* fix bug in reverse `if`

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* simplify

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add `ReadTabletCountsByShard` test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* use map of map

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* capitalize Cell

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* gofmt lint

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix plural in names

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix releasing the global read lock when mysqlshell backup fails (vitessio#17000) (#623)

Signed-off-by: Renan Rangel <rrangel@slack-corp.com>

* VStream API: allow keyspace-level heartbeats to be streamed (vitessio#16593) (#620)

* VStream API: allow keyspace-level heartbeats to be streamed (vitessio#16593)

Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>

* `slack-19.0` backport v22 `vtorc` optimizations + stats, part 3 (#618)

* Remove unused code in discovery queue creation (vitessio#17515)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* vtorc: Cleanup unused code (vitessio#15508)

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* `vtorc`: cleanup discover queue, add concurrency flag (vitessio#17825)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add tablets watched stats

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix missing merge conflict update

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: skip unnecessary `inst.ReadTablet` in `logic.LockShard(...)`

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: use `errgroup` in keyspace/shard discovery

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix import

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix ineffassign

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* missing import

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add stats for discovery workers

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* get count from backend

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* rm unused map

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>

* Bp pr 17558 pr 17858.slack19.0 (#615)

* VReplication: Improve error handling in VTGate VStreams (vitessio#17558)

Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* Backport vitessio#17858

---------

Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* `slack-19.0`: re-backport tweaks from vitessio#17911 (#621)

* fix bug in reverse `if`

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* simplify

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add `ReadTabletCountsByShard` test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* use map of map

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* capitalize Cell

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* gofmt lint

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix plural in names

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>
Signed-off-by: Malcolm Akinje <malcolm.akinje@gmail.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* Increase health check channel buffer (vitessio#17821) (#625)

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>

* VStream: Allow for automatic resume after Reshard across VStreams (vitessio#15393) (#627)

Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>

---------

Signed-off-by: Malcolm Akinje <malcolm.akinje@gmail.com>
Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Signed-off-by: Vicent Martí <42793+vmg@users.noreply.github.com>
Signed-off-by: Jun Wang <jun.wang@demonware.net>
Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>
Signed-off-by: Renan Rangel <rrangel@slack-corp.com>
Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com>
Co-authored-by: Tanjin Xu <109303790+tanjinx@users.noreply.github.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Vicent Martí <42793+vmg@users.noreply.github.com>
Co-authored-by: Harshit Gangal <harshit@planetscale.com>
Co-authored-by: Tom Thornton <thomaswilliamthornton@gmail.com>
Co-authored-by: jwang <121262788+jwangace@users.noreply.github.com>
Co-authored-by: Jun Wang <jun.wang@demonware.net>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Renan Rangel <rvrangel@users.noreply.github.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>