Signed-off-by: deepthi <deepthi@planetscale.com>
Reparent: Move TER vtctl command from vttablet to wrangler
Reparent: add ability to watch shard data
* PlannedReparentShard: Allow retrying PRS to the existing master.
  This is an incremental first step toward making PRS more useful for repairing situations when replication across a shard is not fully consistent. The main thing this enables is retrying the step of reconfiguring all replicas (including the old master) to point to the new master.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix PRS test: Old master should have no slave status.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix comment.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
In particular, if we know we're master but the shard record is wrong, update it. And if another tablet takes over the shard record by having a more recent master term start time, we know we need to stop claiming to be master. Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
tabletmanager: Keep tablet and shard in sync.
The new TER in wrangler skipped setting the master term start time. Now we start a master term if ChangeType() is called with type MASTER. Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix PlannedReparentShard unit tests.
  We should not explicitly call SetMaster on the old master, because PromoteSlaveWhenCaughtUp sets newMaster's tablet type to MASTER, which leads ShardSync to update the Shard record, which notifies the oldMaster's ShardSync, which calls SetMaster.
  Signed-off-by: deepthi <deepthi@planetscale.com>
* PromoteSlave should use a separate context and not reuse remoteCtx.
  Signed-off-by: deepthi <deepthi@planetscale.com>
Fix vtgate_buffer test
Duplicated relevant RPC tests for wrangler. Moved unrelated tests to a different file, fixed RPC tests to not error out during SetMaster Signed-off-by: deepthi <deepthi@planetscale.com>
…otected by mutex Signed-off-by: deepthi <deepthi@planetscale.com>
* Remove obsolete comments.
  These are talking about the serving graph, which no longer exists. Instead of storing serving state of each tablet in topo, we now have vtgate directly query serving state of every tablet.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Make DemoteMaster idempotent.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
…Cancel Signed-off-by: deepthi <deepthi@planetscale.com>
Unit tests for wrangler version of TabletExternallyReparented
unit tests for shard watch
…update shard master Signed-off-by: deepthi <deepthi@planetscale.com>
…arting with InitTablet Signed-off-by: deepthi <deepthi@planetscale.com>
applicable conditions
vttablet InitTablet should check MasterTermStartTime and take over if necessary
fix unit test setup to work with changes to InitTablet functions
Signed-off-by: deepthi <deepthi@planetscale.com>
…n-zero Signed-off-by: deepthi <deepthi@planetscale.com>
…that new tablet is returned only if there is no error Signed-off-by: deepthi <deepthi@planetscale.com>
InitTablet should not update master alias on shard record
…er will do it (#5363) Signed-off-by: deepthi <deepthi@planetscale.com>
* PlannedReparentShard: Fix more known-recoverable problems.
  PlannedReparentShard should be able to fix replication as long as all tablets are reachable and all replication positions are in a mutually-consistent state. PRS also no longer trusts that the shard record contains up-to-date information on the master, because we update that record asynchronously now. Instead, it looks at MasterTermStartTime values stored in each master tablet's record, so it makes the same choice of master as vtgates.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PlannedReparentShard: Add -lag_threshold flag.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix expected error in reparent test.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PRS: Add test case for graceful recovery.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PRS: Measure replication progress instead of lag.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
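The commit messages above describe two rules built on MasterTermStartTime: the tablet with the most recent master term start time is treated as the master, and a tablet that sees another tablet take over the shard record with a more recent term must stop claiming to be master. The following is a minimal Go sketch of those two comparisons, not the code from this PR. The `Tablet` struct, `CurrentMaster`, `ShouldStepDown`, the `at` helper, and the tablet aliases are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// TabletType is a simplified stand-in for the tablet types mentioned above.
type TabletType int

const (
	Replica TabletType = iota
	Master
)

// Tablet is a hypothetical, pared-down tablet record (not the real topo type).
type Tablet struct {
	Alias               string
	Type                TabletType
	MasterTermStartTime time.Time
}

// at builds a timestamp from Unix seconds, to keep the examples short.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// CurrentMaster returns the tablet that claims MASTER with the most recent
// term start time. The boolean is false if no tablet claims MASTER.
func CurrentMaster(tablets []Tablet) (Tablet, bool) {
	var best Tablet
	found := false
	for _, t := range tablets {
		if t.Type != Master {
			continue
		}
		if !found || t.MasterTermStartTime.After(best.MasterTermStartTime) {
			best = t
			found = true
		}
	}
	return best, found
}

// ShouldStepDown reports whether a tablet that believes it is master should
// stop claiming so, because the shard record names a different tablet whose
// master term started more recently.
func ShouldStepDown(self Tablet, shardMasterAlias string, shardTermStart time.Time) bool {
	return self.Type == Master &&
		shardMasterAlias != self.Alias &&
		shardTermStart.After(self.MasterTermStartTime)
}

func main() {
	tablets := []Tablet{
		{Alias: "zone1-100", Type: Master, MasterTermStartTime: at(100)},
		{Alias: "zone1-101", Type: Replica},
		{Alias: "zone1-102", Type: Master, MasterTermStartTime: at(200)},
	}
	if m, ok := CurrentMaster(tablets); ok {
		fmt.Println("current master:", m.Alias)
	}
	fmt.Println("old master steps down:", ShouldStepDown(tablets[0], "zone1-102", at(200)))
}
```

Because every observer applies the same "latest term wins" comparison to the tablet records, they can agree on the master even while the shard record is being updated asynchronously.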
@sougou Before merging this, please make sure you change from "Squash and merge" to "Create a merge commit" so we don't lose individual authorship. We already reviewed and squashed along the way as we merged PRs into the dev branch.
sougou approved these changes on Nov 3, 2019
// WatchShard will set a watch on the Shard object.
// It has the same contract as conn.Watch, but it also unpacks the
// contents into a Shard object
func (ts *Server) WatchShard(ctx context.Context, keyspace, shard string) (*WatchShardData, <-chan *WatchShardData, CancelFunc) {
We'll eventually need to harden this to make sure it stays connected to the topo.
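As an illustration of what such hardening could look like, here is a minimal Go sketch of a consumer that follows a watch with the same shape as the signature quoted above (initial value, channel of changes, cancel function) and re-establishes it when the stream ends. This is an assumption-laden sketch, not code from this PR: `ShardUpdate`, `WatchFunc`, `FollowShard`, `FollowWithRetry`, `fakeWatch`, and `demo` are hypothetical names, and the fake watch replaces any real topo connection.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ShardUpdate is a hypothetical stand-in for WatchShardData: a value or an error.
type ShardUpdate struct {
	MasterAlias string
	Err         error
}

// WatchFunc mirrors the shape of the quoted signature: it returns the current
// value, a channel of later changes, and a function that cancels the watch.
type WatchFunc func(ctx context.Context) (*ShardUpdate, <-chan *ShardUpdate, context.CancelFunc)

// FollowShard consumes a single watch and calls onChange for every value seen.
// It returns nil if ctx ends, or an error describing why the watch stopped.
func FollowShard(ctx context.Context, watch WatchFunc, onChange func(masterAlias string)) error {
	current, changes, cancel := watch(ctx)
	defer cancel()
	if current.Err != nil {
		return current.Err
	}
	onChange(current.MasterAlias)
	for {
		select {
		case <-ctx.Done():
			return nil
		case u, ok := <-changes:
			if !ok {
				return errors.New("watch channel closed")
			}
			if u.Err != nil {
				return u.Err
			}
			onChange(u.MasterAlias)
		}
	}
}

// FollowWithRetry re-establishes the watch after it ends, up to maxAttempts
// times. A production version would likely add backoff between attempts.
func FollowWithRetry(ctx context.Context, watch WatchFunc, onChange func(string), maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = FollowShard(ctx, watch, onChange); err == nil {
			return nil
		}
	}
	return err
}

// fakeWatch simulates a watch that delivers some updates and then disconnects.
func fakeWatch(initial string, updates ...string) WatchFunc {
	return func(ctx context.Context) (*ShardUpdate, <-chan *ShardUpdate, context.CancelFunc) {
		ch := make(chan *ShardUpdate, len(updates))
		for _, u := range updates {
			ch <- &ShardUpdate{MasterAlias: u}
		}
		close(ch)
		return &ShardUpdate{MasterAlias: initial}, ch, func() {}
	}
}

// demo follows the fake watch through two connection attempts and records
// every master alias that was observed.
func demo() ([]string, error) {
	var seen []string
	err := FollowWithRetry(context.Background(), fakeWatch("zone1-100", "zone1-102"),
		func(alias string) { seen = append(seen, alias) }, 2)
	return seen, err
}

func main() {
	seen, err := demo()
	fmt.Println(seen, err)
}
```

Re-reading the initial value on every reconnect means the consumer picks up the current state again even if changes were missed while the watch was down.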
This is the implementation of the plan discussed in #5172. The main features of the new implementation include:
* …-new_master tablet is already the master. This means, for example, if PRS reports partial failure (e.g. some replicas couldn't be reached to reparent them), you can run it again to retry any failed operations.
* …-new_master is able to make progress replicating from the current master before setting the current master read-only. This avoids causing any disruption to the current master in the case when the candidate master is too far behind on replication to catch up within the timeout of the reparent operation.

RELEASE NOTE: ACTION REQUIRED
When updating from a version before this PR to a version after it, it is critical that you follow the recommended upgrade order. In particular, you must upgrade all the vttablets in the cluster before upgrading any of the vtctlds.
Similarly, if you need to downgrade from a version after this PR to a version before it, you must downgrade in the reverse order: downgrade all vtctlds before downgrading any vttablets.