PlannedReparentShard: Fix more known-recoverable problems. #5376
enisoc merged 5 commits into vitessio:reparent-refactor
Conversation
go/vt/wrangler/reparent.go
One concern with piggy-backing the replication lag timeout on waitReplicasTimeout is that it doesn't allow people to opt out of this check. We are arguing that this is OK because a replica with lag > waitReplicasTimeout is unlikely to catch up during that time.
SecondsBehindMaster is in fact not reliable and there are cases when it doesn't get updated often enough by mysql and is actually reported as much higher than the real lag. This seems to happen in precisely the situation we are creating here - where there are no more writes to master.
See last comment in #5000 (comment)
We should consider adding another flag (-skip_replication_lag_check) to the PRS command.
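For illustration, the two suggestions discussed here (an opt-out flag plus a configurable threshold) might combine as sketched below. Flag names, defaults, and the helper are hypothetical, not code from this PR, and the lag check was ultimately removed entirely later in this review:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// Hypothetical flags mirroring the discussion: an explicit opt-out plus a
// tunable threshold, instead of piggy-backing on waitReplicasTimeout.
var (
	skipLagCheck = flag.Bool("skip_replication_lag_check", false,
		"do not check replication lag on the master-elect")
	lagThreshold = flag.Duration("replication_lag_threshold", 30*time.Second,
		"maximum allowed replication lag on the master-elect")
)

// checkLag returns an error if the reported lag exceeds the threshold.
// Note: SecondsBehindMaster can over-report when the master has stopped
// taking writes, which is exactly why an opt-out is wanted.
func checkLag(secondsBehindMaster uint32) error {
	if *skipLagCheck {
		return nil
	}
	if float64(secondsBehindMaster) > lagThreshold.Seconds() {
		return fmt.Errorf("replication lag (%v seconds) exceeds threshold (%v)",
			secondsBehindMaster, *lagThreshold)
	}
	return nil
}

func main() {
	flag.Parse()
	fmt.Println(checkLag(5))
	fmt.Println(checkLag(120))
}
```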
Good catch. What about making a flag to customize the SecondsBehindMaster threshold, instead of an on/off thing?
That is what I had originally planned to implement for #4700, so I'll vote yes on that.
PlannedReparentShard should be able to fix replication as long as all tablets are reachable and all replication positions are in a mutually-consistent state.

PRS also no longer trusts that the shard record contains up-to-date information on the master, because we update that record asynchronously now. Instead, it looks at MasterTermStartTime values stored in each master tablet's record, so it makes the same choice of master as vtgates.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Force-pushed from 39fcd75 to 5e39eef.
deepthi left a comment
Nice! I just have a nit in one of the error messages.
go/vt/wrangler/reparent.go
}
// Check if it's behind by a small enough amount.
if float64(status.SecondsBehindMaster) > masterElectLagThreshold.Seconds() {
	return vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION, "replication lag on master-elect %v (%v seconds) is greater than the specified lag threshold (%v); let replication catch up first or try again with a higher threshold", masterElectTabletAliasStr, status.SecondsBehindMaster, masterElectLagThreshold)
lag threshold -> lag_threshold
to be consistent with user-visible flag.
I removed this flag entirely since we don't check lag anymore, as discussed offline.
if topoproto.TabletAliasEqual(shardInfo.MasterAlias, masterElectTabletAlias) {
	// If the master is already the one we want, we just try to fix replicas (below).
	rp, err := wr.tmc.MasterPosition(remoteCtx, masterElectTabletInfo.Tablet)
if currentMaster == nil {
Since PRS is now handling the cases of no master / multi-master, that means situations where ERS is required should become rare to nonexistent. Let us make sure to document that in the eventual PR for merging upstream.
Good point. To summarize, my goal for PRS is that eventually it should be able to fix almost any problem as long as:
- All tablets are reachable, so we can check global state.
AND - The global replication state (relative positions) is consistent and compatible with making the chosen tablet the master.
You should then only need ERS in the following cases:
- The current master is unreachable.
OR - The relative replication positions have become inconsistent (e.g. alternative futures).
OR - It's unclear who the current master is, and some tablets are unreachable, which means we can't be sure if the global state is consistent.
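The conditions summarized above could be sketched as a predicate. The struct and field names here are hypothetical, for illustration only, not Vitess code:

```go
package main

import "fmt"

// shardState captures the conditions enumerated above (names are made up).
type shardState struct {
	allTabletsReachable    bool
	positionsConsistent    bool
	currentMasterKnown     bool
	currentMasterReachable bool
}

// needsERS sketches the summarized rule: PRS should suffice unless the
// current master is unreachable, the relative replication positions have
// diverged, or we cannot verify that the global state is consistent.
func needsERS(s shardState) bool {
	if s.currentMasterKnown && !s.currentMasterReachable {
		return true // current master is unreachable
	}
	if !s.positionsConsistent {
		return true // alternative futures
	}
	if !s.currentMasterKnown && !s.allTabletsReachable {
		return true // can't be sure the global state is consistent
	}
	return false
}

func main() {
	healthy := shardState{
		allTabletsReachable:    true,
		positionsConsistent:    true,
		currentMasterKnown:     true,
		currentMasterReachable: true,
	}
	fmt.Println(needsERS(healthy))
}
```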
Force-pushed from e9195e1 to 65c43c3.
Force-pushed from 65c43c3 to df16897.
// passed in master tablet alias, and wait for the row in the
// reparent_journal table (if timeCreatedNS is non-zero).
SetMaster(ctx context.Context, tablet *topodatapb.Tablet, parent *topodatapb.TabletAlias, timeCreatedNS int64, forceStartSlave bool) error
SetMaster(ctx context.Context, tablet *topodatapb.Tablet, parent *topodatapb.TabletAlias, timeCreatedNS int64, waitPosition string, forceStartSlave bool) error
Doesn't changing the interface here cause problems during upgrade?
Old vtctld's wrangler will call the old version of SetMaster, which won't work on an already upgraded vttablet.
When the call crosses process boundaries, it gets encoded as protobuf on the wire. The protobuf level is thus where we need to ensure compatibility when changing existing RPCs.
Adding a new, optional field in the Request struct like this should be safe. The old vtctld will not try to use the new field because it doesn't know about it. The new vttablet will simply receive a Request protobuf with the new field unset, so it will be left on the zero value.
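The zero-value behavior can be demonstrated at the wire level with a toy varint-only encoder/decoder using only the standard library. The field numbers and meanings below are made up for the demonstration, not the real SetMasterRequest layout:

```go
package main

import "fmt"

// appendVarint encodes v in protobuf's base-128 varint format.
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

// appendField encodes a varint-typed field: tag = fieldNum<<3 | wireType 0.
func appendField(buf []byte, fieldNum int, val uint64) []byte {
	buf = appendVarint(buf, uint64(fieldNum)<<3)
	return appendVarint(buf, val)
}

// readVarint decodes one varint and returns it with the remaining bytes.
func readVarint(buf []byte) (uint64, []byte) {
	var v uint64
	var shift uint
	for i, b := range buf {
		v |= uint64(b&0x7f) << shift
		if b < 0x80 {
			return v, buf[i+1:]
		}
		shift += 7
	}
	return v, nil
}

func main() {
	// "Old vtctld" encodes a request with only fields 1 and 2 set.
	msg := appendField(nil, 1, 123) // hypothetical time_created_ns
	msg = appendField(msg, 2, 1)    // hypothetical force_start_slave

	// "New vttablet" decodes and also looks for field 3 (standing in for
	// the new wait_position field). It was never written, so the decoded
	// map simply has no entry for it and lookups yield the zero value.
	fields := map[int]uint64{}
	for len(msg) > 0 {
		var tag, val uint64
		tag, msg = readVarint(msg)
		val, msg = readVarint(msg)
		fields[int(tag>>3)] = val
	}
	fmt.Println(fields[1], fields[2], fields[3]) // field 3 stays 0
}
```

This is the same reason adding an optional field to the Request struct is upgrade-safe in both directions: old writers never emit the field, and new readers see its zero value.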