Inconsistent revisions across etcd instances #6626

Closed
nekto0n opened this issue Oct 11, 2016 · 11 comments

@nekto0n

nekto0n commented Oct 11, 2016

Hi!
I am using the very latest etcd built from source. I have a cluster consisting of 5 members and a lot of clients constantly updating keys.
If I create 5 test clients, start a watch on each member, and print ModRevision, I can see that some members are lagging behind, i.e. I see watch responses with old revisions. E.g. at a given moment in time the latest revision seen by each member looks like this:

509862063 member5213
509724284 member7656
510546172 member7704
509724281 member7774
510481193 member-8190

We have straggling members (7774 and 7656) lagging behind by about 600k revisions. If I look at etcdctl endpoint status, I see that RAFT INDEX is consistent across all members, but the data is not.
The funny thing is: member 7656 is the leader, and when we tried to reanimate 7774 (removing it from the cluster, wiping out its data directory, and bringing it back), it downloaded a snapshot from the lagging leader and continued lagging behind.

Another funny thing: it seems that the members that report lower revision numbers actually have newer key versions.

Any thoughts/suggestions where to dig further?
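
For reference, the test clients look roughly like this (a minimal sketch using the Go clientv3 API; the endpoint addresses and the watched prefix are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// One client per member; endpoint addresses are placeholders.
	endpoints := []string{
		"http://member5213:2379",
		"http://member7656:2379",
		"http://member7704:2379",
		"http://member7774:2379",
		"http://member8190:2379",
	}
	for _, ep := range endpoints {
		go func(ep string) {
			cli, err := clientv3.New(clientv3.Config{
				Endpoints:   []string{ep},
				DialTimeout: 5 * time.Second,
			})
			if err != nil {
				panic(err)
			}
			defer cli.Close()
			// Watch a prefix and print the ModRevision of every update,
			// tagged with the member it was observed on.
			for resp := range cli.Watch(context.Background(), "/", clientv3.WithPrefix()) {
				for _, ev := range resp.Events {
					fmt.Println(ev.Kv.ModRevision, ep)
				}
			}
		}(ep)
	}
	select {} // watch forever
}
```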

@xiang90
Contributor

xiang90 commented Oct 11, 2016

What version of etcd are you running? Was it upgraded from a previous version? Can you share the data dir with me?

@xiang90
Contributor

xiang90 commented Oct 11, 2016

Also, it would be great if you could reproduce this reliably. I suspect it is caused by some bugs in the lease pkg.

@nekto0n
Author

nekto0n commented Oct 11, 2016

It is built from the latest master, not upgraded. The data dir is about 10GB, and I'm afraid I cannot share it because of its content. Sorry.

@nekto0n
Author

nekto0n commented Oct 11, 2016

I suspect it is caused by some bugs in lease pkg.

Oh, how can that be? We have only one lease for a single key (for election).

@xiang90
Contributor

xiang90 commented Oct 11, 2016

@nekto0n Can you try to reproduce this with a non-sensitive data set? Also, sharing the way to reproduce it with us would be super helpful.

@xiang90
Contributor

xiang90 commented Oct 12, 2016

Funny thing: it seems that members that report lower revision numbers - actually have newer key versions.

@nekto0n Can you clarify this more?

@nekto0n
Author

nekto0n commented Oct 12, 2016

I'm afraid it was a false alarm: I somehow managed to build etcd from an older master (before the fixes to lease management), so I guess this behavior is expected.

@xiang90
Contributor

xiang90 commented Oct 12, 2016

so I guess this behavior is expected.

Which behavior?

@nekto0n
Author

nekto0n commented Oct 12, 2016

Inconsistent revision numbers across replicas, I guess; I'm still not quite sure what happened.

@xiang90
Contributor

xiang90 commented Oct 12, 2016

One potential cause is that some of your machines are much slower than others. Raft command replication is actually faster than command application on an HDD: raft replication == sequential writes + network I/O, while applying a command == random I/O. Thus you might see the same raft index but different revisions. But this is a guess.
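
A rough way to check this guess (a sketch using the Go clientv3 API; endpoint addresses are placeholders) is to query each member's maintenance status and compare its raft index with the store revision in the response header. A member whose apply loop is lagging would report the same RaftIndex as the others but a lower Revision:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// Endpoint addresses are placeholders for the five members.
	endpoints := []string{
		"http://member5213:2379",
		"http://member7656:2379",
		"http://member7704:2379",
		"http://member7774:2379",
		"http://member8190:2379",
	}
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	for _, ep := range endpoints {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		st, err := cli.Status(ctx, ep) // per-member status, like `etcdctl endpoint status`
		cancel()
		if err != nil {
			fmt.Println(ep, "error:", err)
			continue
		}
		// Equal RaftIndex but a smaller Header.Revision would mean the member has
		// replicated the entries but has not yet applied them to its KV store.
		fmt.Printf("%s raftIndex=%d revision=%d\n", ep, st.RaftIndex, st.Header.Revision)
	}
}
```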

@nekto0n
Author

nekto0n commented Oct 12, 2016

I think we can close this. We'll observe our installations more carefully and check your guess.
