
PlannedReparentShard: Fix more known-recoverable problems. #5376

Merged: enisoc merged 5 commits into vitessio:reparent-refactor from planetscale:prs-tablet-timestamp on Oct 31, 2019
Conversation

@enisoc (Member) commented Oct 29, 2019

PlannedReparentShard should be able to fix replication as long as all
tablets are reachable and all replication positions are in a
mutually-consistent state.

PRS also no longer trusts that the shard record contains up-to-date
information on the master, because we update that record asynchronously
now. Instead, it looks at MasterTermStartTime values stored in each
master tablet's record, so it makes the same choice of master as
vtgates.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

@enisoc enisoc requested a review from deepthi October 29, 2019 23:39
@enisoc enisoc requested a review from sougou as a code owner October 29, 2019 23:39
@deepthi (Collaborator) commented Oct 30, 2019

One concern with piggy-backing the replication lag timeout on waitReplicasTimeout is that it doesn't allow people to opt out of this check. We are arguing that this is OK because a replica with lag > waitReplicasTimeout is unlikely to catch up during that time.

SecondsBehindMaster is in fact not reliable: there are cases where MySQL doesn't update it often enough, and it gets reported as much higher than the real lag. This seems to happen in precisely the situation we are creating here, where there are no more writes to the master.
See the last comment in #5000 (comment).

We should consider adding another flag (-skip_replication_lag_check) to the PRS command.

@enisoc (Member, Author):

Good catch. What about making a flag to customize the SecondsBehindMaster threshold, instead of an on/off thing?

@deepthi (Collaborator):

That is what I had originally planned to implement for #4700, so I'll vote yes on that.

@enisoc force-pushed the prs-tablet-timestamp branch from 39fcd75 to 5e39eef on October 30, 2019 20:00
@deepthi (Collaborator) left a comment

Nice! I just have a nit in one of the error messages.

}
// Check if it's behind by a small enough amount.
if float64(status.SecondsBehindMaster) > masterElectLagThreshold.Seconds() {
return vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION, "replication lag on master-elect %v (%v seconds) is greater than the specified lag threshold (%v); let replication catch up first or try again with a higher threshold", masterElectTabletAliasStr, status.SecondsBehindMaster, masterElectLagThreshold)
@deepthi (Collaborator):

lag threshold -> lag_threshold, to be consistent with the user-visible flag.

@enisoc (Member, Author):

I removed this flag entirely since we don't check lag anymore, as discussed offline.

if topoproto.TabletAliasEqual(shardInfo.MasterAlias, masterElectTabletAlias) {
// If the master is already the one we want, we just try to fix replicas (below).
rp, err := wr.tmc.MasterPosition(remoteCtx, masterElectTabletInfo.Tablet)
if currentMaster == nil {
@deepthi (Collaborator):

Since PRS is now handling the cases of no master / multi-master, that means situations where ERS is required should become rare to nonexistent. Let us make sure to document that in the eventual PR for merging upstream.

@enisoc (Member, Author):

Good point. To summarize, my goal for PRS is that eventually it should be able to fix almost any problem as long as:

  1. All tablets are reachable, so we can check global state.
    AND
  2. The global replication state (relative positions) is consistent and compatible with making the chosen tablet the master.

You should then only need ERS in the following cases:

  1. The current master is unreachable.
    OR
  2. The relative replication positions have become inconsistent (e.g. alternative futures).
    OR
  3. It's unclear who the current master is, and some tablets are unreachable, which means we can't be sure if the global state is consistent.
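The decision rule above can be sketched as a tiny helper. This is an illustrative encoding of the discussion, not a real Vitess API: needERS and its boolean inputs are hypothetical names.

```go
package main

import "fmt"

// needERS encodes the rule of thumb from the thread: PlannedReparentShard
// suffices when every tablet is reachable AND the relative replication
// positions are mutually consistent; otherwise EmergencyReparentShard is
// needed. Unreachable tablets cover both "current master is unreachable"
// and "master unclear with some tablets unreachable", since either way we
// cannot verify global state. Inconsistent positions cover cases like
// alternate futures.
func needERS(allTabletsReachable, positionsConsistent bool) bool {
	return !allTabletsReachable || !positionsConsistent
}

func main() {
	fmt.Println(needERS(true, true))  // healthy shard: PRS can fix it
	fmt.Println(needERS(false, true)) // unreachable tablet: ERS territory
}
```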

@enisoc force-pushed the prs-tablet-timestamp branch from e9195e1 to 65c43c3 on October 31, 2019 06:38
@enisoc force-pushed the prs-tablet-timestamp branch from 65c43c3 to df16897 on October 31, 2019 08:37
// passed in master tablet alias, and wait for the row in the
// reparent_journal table (if timeCreatedNS is non-zero).
-SetMaster(ctx context.Context, tablet *topodatapb.Tablet, parent *topodatapb.TabletAlias, timeCreatedNS int64, forceStartSlave bool) error
+SetMaster(ctx context.Context, tablet *topodatapb.Tablet, parent *topodatapb.TabletAlias, timeCreatedNS int64, waitPosition string, forceStartSlave bool) error
@deepthi (Collaborator):

Doesn't changing the interface here cause problems during upgrade?
Old vtctld's wrangler will call the old version of SetMaster, which won't work on an already upgraded vttablet.

@enisoc (Member, Author):

When the call crosses process boundaries, it gets encoded as protobuf on the wire. The protobuf level is thus where we need to ensure compatibility when changing existing RPCs.

Adding a new, optional field in the Request struct like this should be safe. The old vtctld will not try to use the new field because it doesn't know about it. The new vttablet will simply receive a Request protobuf with the new field unset, so it will be left on the zero value.
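A sketch of what such a backward-compatible proto3 change looks like. The message shape, field names, and tag numbers here are illustrative, not a copy of the real tabletmanagerdata definition:

```proto
syntax = "proto3";

// Hypothetical request message before/after the change. Adding a field
// with a fresh tag number is wire-compatible in proto3: an old client
// simply never sets wait_position, and a new server receives it as the
// empty string (the zero value), which it treats as "unset" and falls
// back to the old behavior. Existing tag numbers must never be reused
// or renumbered.
message SetMasterRequest {
  TabletAlias parent = 1;
  int64 time_created_ns = 2;
  bool force_start_slave = 3;
  string wait_position = 4;  // new, optional; zero value means unset
}
```

The same reasoning applies in reverse: a new client talking to an old server sends wait_position, and the old server silently ignores the unknown field.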

@deepthi (Collaborator) left a comment

LGTM

@enisoc enisoc merged commit 9a09f82 into vitessio:reparent-refactor Oct 31, 2019
@enisoc enisoc deleted the prs-tablet-timestamp branch October 31, 2019 19:39