VReplication: Gather source positions once we know all writes are done during traffic switch#16572
Conversation
Signed-off-by: Matt Lord <mattalord@gmail.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16572 +/- ##
==========================================
- Coverage 68.84% 68.83% -0.02%
==========================================
Files 1558 1558
Lines 200025 200048 +23
==========================================
- Hits 137714 137694 -20
- Misses 62311 62354 +43 ☔ View full report in Codecov by Sentry. |
0e100c7 to
2f8f73d
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
2f8f73d to
a76bf9a
Compare
a0a027f to
dbb1ee5
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
f8675c1 to
8b030db
Compare
8b030db to
fcdc8bd
Compare
deepthi
left a comment
There was a problem hiding this comment.
Can you amend the last paragraph of the PR description, given that you have included a unit test?
Rest LGTM
|
I think this issue may have been introduced by #13015, which is why many people have not run into this yet, as it only affects v18 and later. I didn't dig into this in much detail at all (and I could be totally wrong), but if I'm right, is this PR worth backporting? |
@GrahamCampbell no, this aspect is the same in the
Exactly nobody has reported it. It's not that "many people have not". The issue reported was from going through the code for the distributed transactions work, not because somebody encountered a problem. |
…e during traffic switch (vitessio#16572) Signed-off-by: Matt Lord <mattalord@gmail.com> Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com>
… SwitchTraffic) (#702) * VReplication: Gather source positions once we know all writes are done during traffic switch (vitessio#16572) Signed-off-by: Matt Lord <mattalord@gmail.com> Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com> * update missed changes Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com> --------- Signed-off-by: Matt Lord <mattalord@gmail.com> Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com> Co-authored-by: Matt Lord <mattalord@gmail.com>
Description
This PR addresses a logic bug when switching traffic:
LOCK TABLESrounds to ensure that there are no in flight writes on the tables we're migrating, then we ensure that the migration targets' replication position is at least where the migration source was.I have not been able to trigger any actual bug/issue (lost writes during a traffic switch) through extensive manual testing and nobody has seen or reported an issue around this, but it's clearly a logical bug.
We address this here in a way that causes no additional work or overhead by simply moving the position gathering work from the
stopSourceWritesstep/function to its own step/function that we call immediately before thewaitForCatchupstep/function.Related Issue(s)
Checklist