Inconsistent revisions across etcd instances #6626
Comments
What version of etcd are you running? Was it upgraded from a previous version? Can you share the data dir with me?
Also, it would be great if you could reproduce this reliably. I suspect it is caused by a bug in the lease pkg.
It is built from the latest master. Not upgraded. The data dir is about 10 GB and I'm afraid I cannot share it because of its content. Sorry.
Oh, how can that be? We have only one lease for a single key (for election).
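For context, an election backed by a single lease on a single key typically looks something like the sketch below, built with the clientv3 concurrency helpers; the endpoint, TTL, election prefix, and the older github.com/coreos/etcd import path are assumptions, not details from this issue.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
	"github.com/coreos/etcd/clientv3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// NewSession grants one lease and keeps it alive in the background;
	// the election key created by Campaign below is attached to that single lease.
	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	e := concurrency.NewElection(sess, "/my-election/") // placeholder prefix
	if err := e.Campaign(context.Background(), "candidate-1"); err != nil {
		log.Fatal(err)
	}
	log.Println("became leader")
}
```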
@nekto0n Can you try to reproduce this with a non-sensitive data set? Also, sharing the way to reproduce it would be super helpful.
@nekto0n Can you clarify this more?
I'm afraid it was a false alarm: I somehow managed to build etcd from an older master (before the fixes to lease management), so I guess this behavior is expected.
Which behavior?
Inconsistent revision numbers across replicas (I guess); I'm still not quite sure what happened.
One potential cause is that some of your machines are much slower than others. Raft command replication is actually faster than command application on an HDD: Raft replication is sequential writes plus network I/O, while applying a command is random I/O. Thus you might see the same Raft index but different revisions. But this is a guess.
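A quick way to check that guess is to poll each member's status and compare its Raft index with its applied revision. This is only a rough sketch; the endpoints and the older github.com/coreos/etcd import path are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	endpoints := []string{ // placeholder endpoints for the 5 members
		"http://10.0.0.1:2379", "http://10.0.0.2:2379", "http://10.0.0.3:2379",
		"http://10.0.0.4:2379", "http://10.0.0.5:2379",
	}
	cli, err := clientv3.New(clientv3.Config{Endpoints: endpoints, DialTimeout: 5 * time.Second})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Status is answered by the endpoint you ask, so Header.Revision reflects how far
	// that member has applied, while RaftIndex reflects how far its log has advanced.
	// A growing gap between the two on one member would point at slow apply.
	for _, ep := range endpoints {
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		st, err := cli.Status(ctx, ep)
		cancel()
		if err != nil {
			log.Printf("%s: %v", ep, err)
			continue
		}
		fmt.Printf("%s\traftIndex=%d\trevision=%d\tleader=%x\n",
			ep, st.RaftIndex, st.Header.Revision, st.Leader)
	}
}
```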
I think we can close this. We'll observe our installations more carefully and check your guess.
Hi!
I am using the very latest etcd built from source. I have a cluster consisting of 5 members and a lot of clients constantly updating keys.
If I create 5 test clients, start watching from each member, and print ModRevision, I will see that some members are lagging behind, i.e. I see watch responses with old revisions. E.g., looking at the latest revision seen by each member at a given moment in time, we have straggling members (7774 and 7656) lagging behind by about 600k revisions.
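A minimal sketch of that kind of check, with one watcher per member printing the ModRevision of every event it receives, might look like this; the endpoints, the watched prefix, and the older github.com/coreos/etcd import path are assumptions.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	endpoints := []string{ // placeholder endpoints, one per member
		"http://10.0.0.1:2379", "http://10.0.0.2:2379", "http://10.0.0.3:2379",
		"http://10.0.0.4:2379", "http://10.0.0.5:2379",
	}
	for _, ep := range endpoints {
		ep := ep
		go func() {
			// Connect to a single member so the watch is served by that member only.
			cli, err := clientv3.New(clientv3.Config{Endpoints: []string{ep}, DialTimeout: 5 * time.Second})
			if err != nil {
				log.Fatal(err)
			}
			defer cli.Close()
			for resp := range cli.Watch(context.Background(), "/app/", clientv3.WithPrefix()) {
				for _, ev := range resp.Events {
					// ModRevision shows how far this member's applied state has progressed.
					log.Printf("%s: key=%s ModRevision=%d", ep, ev.Kv.Key, ev.Kv.ModRevision)
				}
			}
		}()
	}
	select {} // block forever while the watchers print
}
```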
If I look at etcdctl endpoint status, I see that RAFT INDEX is consistent across all members, but the data is not. The funny thing is that member 7656 is the leader, and when we tried to reanimate 7774 (removing it from the cluster, wiping its data directory, and bringing it back), it downloaded a snapshot from the lagging leader and continued lagging behind.
Another funny thing: it seems that the members that report lower revision numbers actually have newer key versions.
Any thoughts/suggestions on where to dig further?
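As a starting point for that last observation (lower revisions but newer key versions), a rough sketch like the following reads the same key from each member with serializable, member-local reads and prints what each returns; the key name, endpoints, and the older github.com/coreos/etcd import path are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	endpoints := []string{ // placeholder endpoints, one per member
		"http://10.0.0.1:2379", "http://10.0.0.2:2379", "http://10.0.0.3:2379",
		"http://10.0.0.4:2379", "http://10.0.0.5:2379",
	}
	for _, ep := range endpoints {
		cli, err := clientv3.New(clientv3.Config{Endpoints: []string{ep}, DialTimeout: 5 * time.Second})
		if err != nil {
			log.Fatal(err)
		}
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		// WithSerializable makes the read member-local instead of going through the leader,
		// so each member answers from its own applied state.
		resp, err := cli.Get(ctx, "/app/some-key", clientv3.WithSerializable())
		cancel()
		cli.Close()
		if err != nil {
			log.Printf("%s: %v", ep, err)
			continue
		}
		if len(resp.Kvs) == 0 {
			fmt.Printf("%s\theaderRev=%d\t(key not found)\n", ep, resp.Header.Revision)
			continue
		}
		kv := resp.Kvs[0]
		fmt.Printf("%s\theaderRev=%d\tversion=%d\tmodRev=%d\n",
			ep, resp.Header.Revision, kv.Version, kv.ModRevision)
	}
}
```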