Start server fails with "error validating peerURLs" (race?) #10626
How do you start the 3 members? (As members of a new cluster, or as new members joining an existing cluster?) Can you share the related parameters you use when starting each member?
@jingyih I start H1 as a new cluster. H2 and H3 join the existing cluster. Happy to share any parameters you want.
Is it possible for you to configure all 3 members as a new cluster? I think this is the recommended way of starting a cluster with multiple members. Here is an example: lines 2 to 4 in c7c6894.
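As a rough illustration of that recommendation using the embed package the reporter is working with, a minimal sketch follows. Every name, port, and data directory here is illustrative and not taken from the linked example:

```go
package main

import (
	"log"
	"net/url"

	"github.com/coreos/etcd/embed" // import path for the v3.3.x series
)

// startNewClusterMember boots one member whose InitialCluster already lists
// all three peers, so no runtime member-add (and no join race) is involved.
func startNewClusterMember(name, peerURL, clientURL, initialCluster string) (*embed.Etcd, error) {
	cfg := embed.NewConfig()
	cfg.Name = name
	cfg.Dir = name + ".etcd"
	pu, err := url.Parse(peerURL)
	if err != nil {
		return nil, err
	}
	cu, err := url.Parse(clientURL)
	if err != nil {
		return nil, err
	}
	cfg.LPUrls, cfg.APUrls = []url.URL{*pu}, []url.URL{*pu}
	cfg.LCUrls, cfg.ACUrls = []url.URL{*cu}, []url.URL{*cu}
	cfg.InitialCluster = initialCluster
	cfg.ClusterState = embed.ClusterStateFlagNew // "new": bootstrap, don't join
	return embed.StartEtcd(cfg)
}

func main() {
	ic := "h1=http://127.0.0.1:2380,h2=http://127.0.0.1:2480,h3=http://127.0.0.1:2580"
	for _, m := range []struct{ name, peer, client string }{
		{"h1", "http://127.0.0.1:2380", "http://127.0.0.1:2379"},
		{"h2", "http://127.0.0.1:2480", "http://127.0.0.1:2479"},
		{"h3", "http://127.0.0.1:2580", "http://127.0.0.1:2579"},
	} {
		if _, err := startNewClusterMember(m.name, m.peer, m.client, ic); err != nil {
			log.Fatal(err)
		}
	}
	select {} // keep the members running
}
```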
You are right about the racing part. If you create the cluster your way, the second member will take some time to catch up on applying the raft log entries replicated from member 1. Therefore there is a short period of time during which member 2 does not have the most recent view of the cluster.
@jingyih (Please let me know if this understanding is correct...) I can't create H2 and H3 as a new cluster, because if I did, I couldn't use the cluster until they all joined: H1 would be expecting two more members to be part of its quorum. Rephrasing the problem: imagine that I start up H1, use it for ten minutes, and THEN decide to add H2 and H3 rapidly. My understanding is that H2 and H3 would need to use the existing-cluster flag. (If there is a signal I can wait for to know when the server is caught up to the cluster state, that would be fine.) I'm using the golang embed package. Currently, after I run (code link elided), I wait on (code link elided) and (code link elided). Is there another signal I should wait for to know the cluster is caught up? Thanks!
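For context, the usual wait-for-ready shape with the embed package looks like the sketch below. Since the linked snippets were not preserved, this is an assumption about the code's shape, built from the channels embed actually exposes (ReadyNotify, Err):

```go
import (
	"errors"
	"time"

	"github.com/coreos/etcd/embed"
)

// startAndWait starts an embedded etcd and blocks until it is ready to
// serve client requests, it fails, or a timeout elapses.
func startAndWait(cfg *embed.Config) (*embed.Etcd, error) {
	e, err := embed.StartEtcd(cfg)
	if err != nil {
		return nil, err
	}
	select {
	case <-e.Server.ReadyNotify():
		return e, nil // closed when the server can serve client requests
	case err := <-e.Err():
		return nil, err // fatal errors while serving are reported here
	case <-time.After(60 * time.Second):
		e.Server.Stop() // trigger a shutdown rather than waiting forever
		return nil, errors.New("server took too long to become ready")
	}
}
```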
Yes, in your use case, H2 and H3 need to use the existing-cluster flag. Could you share the member add commands and the etcd startup parameters, so that other contributors and I can try to reproduce and investigate?
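For readers following along, a hedged sketch of the add-then-join flow being discussed, using clientv3 and embed. Every URL, name, and directory is illustrative, not taken from the reporter's setup:

```go
import (
	"context"
	"time"

	"github.com/coreos/etcd/clientv3"
	"github.com/coreos/etcd/embed"
)

// addAndJoin registers h2 with the running cluster through h1, then starts
// h2 with the existing-cluster state. This is the racy window under
// discussion: h2 boots shortly after the MemberAdd conf change.
func addAndJoin() (*embed.Etcd, error) {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"}, // h1's client URL
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return nil, err
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	_, err = cli.MemberAdd(ctx, []string{"http://127.0.0.1:2480"}) // h2's peer URL
	cancel()
	if err != nil {
		return nil, err
	}

	cfg := embed.NewConfig()
	cfg.Name = "h2"
	cfg.Dir = "h2.etcd"
	// InitialCluster must list the existing member(s) plus the new one.
	cfg.InitialCluster = "h1=http://127.0.0.1:2380,h2=http://127.0.0.1:2480"
	cfg.ClusterState = embed.ClusterStateFlagExisting // join, don't bootstrap
	// LPUrls/APUrls/LCUrls/ACUrls setup omitted for brevity.
	return embed.StartEtcd(cfg)
}
```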
@jingyih Yes, you can reproduce (modulo races) very easily. I'm using: (command elided). Run: (command elided). Run: (command elided). To test: (command elided). That's how you start up H1, H2, and H3. Do H3 very quickly after you run H2.
Could you provide a script that does the same procedure but uses etcd directly? From the script you provided, it is not clear how etcd is used.
@jingyih A big golang program is such a script :) It would not be easy to write a standalone version. But the basic algorithm is described at the top here: https://github.com/purpleidea/mgmt/blob/a87853508a12b478a8825c4f16ae33ddec0765e6/etcd/etcd.go#L31 and the actual server startup happens here: https://github.com/purpleidea/mgmt/blob/a87853508a12b478a8825c4f16ae33ddec0765e6/etcd/server.go#L161 Let me know if you need anything else.
PS: In #10626 (comment): (quoted snippet elided). Thanks!
I think you are looking at the correct signal. However, if I understand correctly, it is not directly related to when the config change entry gets applied. But first I should probably verify whether what I described earlier is what actually happened in your case.
@jingyih I appreciate that you're looking into this; please let me know if I can help in any way. Ultimately I want clustering of etcd to be automatic, so hopefully this helps get us one step closer :)
Possibly related: I was able to trigger a panic (just once so far) by adding a member fairly quickly. I'm using v3.3.12. (panic trace elided)
I am still trying to understand more about the server booting process. So far, this is my understanding of the issue. (Not sure if this is by design due to bootstrapping needs.) In the current implementation, serving reads directly from this local member structure could include stale information: etcd/etcdserver/api/membership/cluster.go, line 56 in c7c6894.
For example, in a two-node cluster where node 1 is the raft leader, a (rest of the example elided). I did not actually reproduce the issue using your software, so this is my speculation based on code reading.
We can get this fixed if my speculation is confirmed by reproducing the issue. Currently my bandwidth is very limited; I may revisit this issue later. For now, adding some delay will help mitigate the issue.
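One hedged way to implement that delay-based mitigation is to retry the start when this specific validation error comes back. Matching on the error string is a fragile assumption of mine, not a supported API:

```go
import (
	"strings"
	"time"

	"github.com/coreos/etcd/embed"
)

// startWithRetry retries StartEtcd a few times with a delay, on the
// assumption that the remote members only need a moment to finish applying
// the membership conf-change entry.
func startWithRetry(cfg *embed.Config) (*embed.Etcd, error) {
	var lastErr error
	for i := 0; i < 5; i++ {
		e, err := embed.StartEtcd(cfg)
		if err == nil {
			return e, nil
		}
		// Fragile: keyed off the error text seen in this issue's title.
		if !strings.Contains(err.Error(), "error validating peerURLs") {
			return nil, err // unrelated failure; don't retry
		}
		lastErr = err
		time.Sleep(2 * time.Second) // crude stand-in for "wait until caught up"
	}
	return nil, lastErr
}
```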
@jingyih If I understand this correctly, you are describing a possible bug in etcd itself, correct? If so, perhaps this could be easily fixed by having a signal that (rest of the sentence elided). This would be helpful in many situations too, I think. With such a signal, I could wait for it and test whether it removes the race.
I did some more extensive testing today. I think that if I run member add and then wait long enough before starting the new server, this is not an issue. It's possible that it's a race in my code as well. In any case, I think this could also be solved by #10537 (comment), specifically: (quoted text elided). That way, the PUT operation that a new client is watching (to tell it to start up) can be done at the same time as the MemberAdd operation. I will close this for now.
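To make the proposal concrete, the idea seems to be something like the following hypothetical call. MemberAddWithPut does not exist in etcd; this is purely a sketch of the suggestion:

```go
// Hypothetical, NOT a real etcd API: add a member and write a key as one
// atomic step, so a watcher on the key can never race the membership change.
resp, err := cli.MemberAddWithPut(ctx,
	[]string{"http://127.0.0.1:2480"}, // new member's peer URL
	"/ready/h2", "added",              // key/value signaling the new member
)
```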
Member operations are configuration changes, whereas PUT is a mutating operation on the key space. For example, you cannot add or delete an etcd member by simply updating a key value. They are very, very different kinds of operations. Sorry, but what you suggested does not make sense to me.
I understand; I was just proposing an idea for a race-free API that would help solve the need for (rest of the sentence elided).
I'm building some etcd clusters with the embed package. I start up three hosts: h1, then h2, then h3. Occasionally, when I add the third host, the server fails to start with an "error validating peerURLs" error (full message elided). The error comes from here: etcd/etcdserver/server.go, line 318 in ad5e169.
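For readers without the permalink handy, the failing check in the 3.3-era tree looks roughly like this (reproduced from memory, so treat as approximate):

```go
// etcdserver/server.go, inside NewServer, on the join-existing-cluster path
// (no WAL yet): fetch the cluster view from remote peers and validate the
// local configuration against it.
existingCluster, gerr := GetClusterFromRemotePeers(getRemotePeerURLs(cl, cfg.Name), prt)
if gerr != nil {
	return nil, fmt.Errorf("cannot fetch cluster info from peer urls: %v", gerr)
}
if err := membership.ValidateClusterAndAssignIDs(cl, existingCluster); err != nil {
	return nil, fmt.Errorf("error validating peerURLs %+v: %v", existingCluster, err)
}
```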
I think this is a race/timing bug. I believe it happens more often if I start up H3 very quickly after H2. If I wait, say, 5 seconds before doing so, then I don't think it ever happens.
My suspicion is that after a new member (e.g. H2) is added, some async goroutine runs to update some cluster state. If this doesn't happen before H3 joins, then we hit this issue.
If I were to guess, I'd think that perhaps the client AutoSyncInterval is related, but I'm really out of my depth here, in that I don't know all the etcd internals. Thanks for reading!
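For reference, AutoSyncInterval is a clientv3.Config option that periodically refreshes the client's endpoint list from the cluster; as far as I understand it affects only the client side, not server-side peer validation. A minimal sketch, with illustrative endpoint and intervals:

```go
import (
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func newClient() *clientv3.Client {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:        []string{"http://127.0.0.1:2379"},
		DialTimeout:      5 * time.Second,
		AutoSyncInterval: 30 * time.Second, // refresh the endpoint list periodically; 0 disables
	})
	if err != nil {
		log.Fatal(err)
	}
	return cli
}
```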