VReplication: use new topo named locks and TTL override for workflow coordination #16260
Merged
mattlord merged 15 commits into vitessio:main on Jul 10, 2024
And use that for VReplication workflows when coordination is necessary, such as between the VReplication engine and the VDiff engine. Signed-off-by: Matt Lord <mattalord@gmail.com>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #16260      +/-   ##
==========================================
- Coverage   68.71%   68.69%   -0.02%
==========================================
  Files        1547     1548       +1
  Lines      198287   198444     +157
==========================================
+ Hits       136243   136330      +87
- Misses      62044    62114      +70
deepthi reviewed Jun 28, 2024
Description
The VReplication and VDiff engines need to coordinate on workflows because the TableDiffer manipulates the associated workflow record (in the _vt.vreplication table) on its shard in order to initialize the diff: it stops the workflow, syncs up the stream and snapshot positions, and then restarts the workflow. To do so properly (meaning without impacting unrelated operations, and with a lock that will not expire or be lost while work is in progress) they need a distributed lock in the topo server. Workflow information is, however, not stored in the topo server, so there is no related record to lock. To address this gap, this PR adds support for named locks in the topo server. These are locks on an opaque name rather than on a topo record/key. We then leverage this to lock the workflow using the unique name of targetkeyspace/workflowname.

Here's an example of their usage when running a vdiff with the local examples where the target keyspace is customer (which has 2 shards) and the workflow is commerce2customer:

There remains a general issue that the Keyspace locks taken during traffic switches (where we update various records in the topo related to routing rules, shard records, etc.) can be lost after 30 seconds (by default, for etcd2topo). This PR also addresses this by adding a mechanism in the topo service, LockWithTTL, to override the default lock TTL for the topo implementation:
- --topo_etcd_lease_ttl flag
- --topo_consul_lock_session_ttl flag

We also now check to confirm that we are holding the locks between major operations.
Please see the RFC for additional details: #16269
Related Issue(s)
Checklist