This repository has been archived by the owner on Jan 20, 2022. It is now read-only.
We've been using kops for a few years, and prior to the introduction of etcd-manager we relied on our own EBS backup strategy. This led to a number of etcd volumes being present in our AWS account that matched the tags used by etcd-manager to select and mount storage.
The first host that came up after the rolling-update that installed etcd-manager had the following in its logs:
I1209 16:26:18.958984 18726 volumes.go:85] AWS API Request: ec2metadata/GetMetadata
I1209 16:26:18.959912 18726 volumes.go:85] AWS API Request: ec2metadata/GetMetadata
I1209 16:26:18.960468 18726 volumes.go:85] AWS API Request: ec2metadata/GetMetadata
I1209 16:26:18.961016 18726 volumes.go:85] AWS API Request: ec2metadata/GetMetadata
I1209 16:26:18.961520 18726 main.go:254] Mounting available etcd volumes matching tags [k8s.io/etcd/main k8s.io/role/master=1 kubernetes.io/cluster/kube.us-east-1.dev.deploys.brightcove.com=owned]; nameTag=k8s.io/etcd/main
I1209 16:26:18.962655 18726 volumes.go:85] AWS API Request: ec2/DescribeVolumes
I1209 16:26:19.152540 18726 mounter.go:302] Currently attached volumes: [0xc00025af00]
I1209 16:26:19.152574 18726 mounter.go:64] Master volume "vol-0a5a75bec90179bd8" is attached at "/dev/xvdu"
I1209 16:26:19.152590 18726 mounter.go:78] Doing safe-format-and-mount of /dev/xvdu to /mnt/master-vol-0a5a75bec90179bd8
I1209 16:26:19.152604 18726 volumes.go:233] volume vol-0a5a75bec90179bd8 not mounted at /rootfs/dev/xvdu
I1209 16:26:19.152639 18726 volumes.go:247] found nvme volume "nvme-Amazon_Elastic_Block_Store_vol0a5a75bec90179bd8" at "/dev/nvme1n1"
I1209 16:26:19.152652 18726 mounter.go:116] Found volume "vol-0a5a75bec90179bd8" mounted at device "/dev/nvme1n1"
I1209 16:26:19.153151 18726 mounter.go:173] Device already mounted on "/mnt/master-vol-0a5a75bec90179bd8", verifying it is our device
I1209 16:26:19.153167 18726 mounter.go:185] Found existing mount of "/dev/nvme1n1" at "/mnt/master-vol-0a5a75bec90179bd8"
I1209 16:26:19.153241 18726 mount_linux.go:164] Detected OS without systemd
I1209 16:26:19.153789 18726 mounter.go:226] matched device "/dev/nvme1n1" and "/dev/nvme1n1" via '\x00'
I1209 16:26:19.153803 18726 mounter.go:86] mounted master volume "vol-0a5a75bec90179bd8" on /mnt/master-vol-0a5a75bec90179bd8
I1209 16:26:19.153816 18726 main.go:269] discovered IP address: 10.250.16.215
I1209 16:26:19.153823 18726 main.go:274] Setting data dir to /rootfs/mnt/master-vol-0a5a75bec90179bd8
I1209 16:26:19.154260 18726 server.go:71] starting GRPC server using TLS, ServerName="etcd-manager-server-etcd-b"
I1209 16:26:19.154403 18726 s3context.go:331] product_uuid is "ec2004e4-d619-9524-bf5b-e56ce28c2bd6", assuming running on EC2
I1209 16:26:19.155152 18726 s3context.go:164] got region from metadata: "us-east-1"
I1209 16:26:19.212772 18726 s3context.go:210] found bucket in region "us-east-1"
I1209 16:26:19.212798 18726 s3fs.go:128] Writing file "s3://com.brightcove.deploys.dev.kube.dev-us-east-1/kube.us-east-1.dev.deploys.brightcove.com/backups/etcd/main/control/etcd-cluster-created"
I1209 16:26:19.212816 18726 s3context.go:238] Checking default bucket encryption for "com.brightcove.deploys.dev.kube.dev-us-east-1"
W1209 16:26:19.272282 18726 controller.go:135] not enabling TLS for etcd, this is insecure
I1209 16:26:19.272306 18726 server.go:89] GRPC server listening on "10.250.16.215:3996"
I1209 16:26:19.272403 18726 etcdserver.go:534] starting etcd with state cluster:<cluster_token:"ckDjqRPhIBJGj0dtx6qVlw" nodes:<name:"etcd-a" peer_urls:"http://etcd-a.internal.kube.us-east-1.dev.deploys.brightcove.com:2380" client_urls:"http://etcd-a.internal.kube.us-east-1.dev.deploys.brightcove.com:4001" quarantined_client_urls:"http://etcd-a.internal.kube.us-east-1.dev.deploys.brightcove.com:3994" > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com:2380" client_urls:"http://etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com:4001" quarantined_client_urls:"http://etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.kube.us-east-1.dev.deploys.brightcove.com:2380" client_urls:"http://etcd-c.internal.kube.us-east-1.dev.deploys.brightcove.com:4001" quarantined_client_urls:"http://etcd-c.internal.kube.us-east-1.dev.deploys.brightcove.com:3994" > > etcd_version:"2.2.1"
I1209 16:26:19.272549 18726 etcdserver.go:543] starting etcd with datadir /rootfs/mnt/master-vol-0a5a75bec90179bd8/data/ckDjqRPhIBJGj0dtx6qVlw
I1209 16:26:19.272548 18726 volumes.go:85] AWS API Request: ec2/DescribeVolumes
W1209 16:26:19.272599 18726 pki.go:46] not generating peer keypair as peers-ca not set
W1209 16:26:19.272626 18726 pki.go:84] not generating client keypair as clients-ca not set
I1209 16:26:19.272703 18726 etcdprocess.go:180] executing command /opt/etcd-v2.2.1-linux-amd64/etcd [/opt/etcd-v2.2.1-linux-amd64/etcd]
W1209 16:26:19.272749 18726 etcdprocess.go:234] using insecure configuration for etcd peers
W1209 16:26:19.272774 18726 etcdprocess.go:243] using insecure configuration for etcd clients
2019-12-09 16:26:19.277754 I | flags: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com:4001
2019-12-09 16:26:19.277784 I | flags: recognized and used environment variable ETCD_DATA_DIR=/rootfs/mnt/master-vol-0a5a75bec90179bd8/data/ckDjqRPhIBJGj0dtx6qVlw
2019-12-09 16:26:19.277799 I | flags: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com:2380
2019-12-09 16:26:19.277814 I | flags: recognized and used environment variable ETCD_INITIAL_CLUSTER=etcd-a=http://etcd-a.internal.kube.us-east-1.dev.deploys.brightcove.com:2380,etcd-b=http://etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com:2380,etcd-c=http://etcd-c.internal.kube.us-east-1.dev.deploys.brightcove.com:2380
2019-12-09 16:26:19.277820 I | flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=existing
2019-12-09 16:26:19.277830 I | flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_TOKEN=ckDjqRPhIBJGj0dtx6qVlw
2019-12-09 16:26:19.277838 I | flags: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:4001
2019-12-09 16:26:19.277848 I | flags: recognized and used environment variable ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
2019-12-09 16:26:19.277859 I | flags: recognized and used environment variable ETCD_NAME=etcd-b
2019-12-09 16:26:19.277889 W | flags: unrecognized environment variable ETCD_LISTEN_METRICS_URLS=
2019-12-09 16:26:19.277934 I | etcdmain: etcd Version: 2.2.1
2019-12-09 16:26:19.277938 I | etcdmain: Git SHA: Not provided (use ./build instead of go build)
2019-12-09 16:26:19.277941 I | etcdmain: Go Version: go1.12.5
2019-12-09 16:26:19.277945 I | etcdmain: Go OS/Arch: linux/amd64
2019-12-09 16:26:19.277949 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2019-12-09 16:26:19.277992 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2019-12-09 16:26:19.278095 I | etcdmain: listening for peers on http://0.0.0.0:2380
2019-12-09 16:26:19.278118 I | etcdmain: listening for client requests on http://0.0.0.0:4001
2019-12-09 16:26:19.371091 I | etcdserver: recovered store from snapshot at index 380038
2019-12-09 16:26:19.371126 I | etcdserver: name = etcd-b
2019-12-09 16:26:19.371130 I | etcdserver: data dir = /rootfs/mnt/master-vol-0a5a75bec90179bd8/data/ckDjqRPhIBJGj0dtx6qVlw
2019-12-09 16:26:19.371134 I | etcdserver: member dir = /rootfs/mnt/master-vol-0a5a75bec90179bd8/data/ckDjqRPhIBJGj0dtx6qVlw/member
2019-12-09 16:26:19.371138 I | etcdserver: heartbeat = 100ms
2019-12-09 16:26:19.371140 I | etcdserver: election = 1000ms
2019-12-09 16:26:19.371144 I | etcdserver: snapshot count = 10000
2019-12-09 16:26:19.371155 I | etcdserver: advertise client URLs = http://etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com:4001
2019-12-09 16:26:19.371185 I | etcdserver: loaded cluster information from store: <nil>
I1209 16:26:19.373963 18726 volumes.go:85] AWS API Request: ec2/DescribeInstances
2019-12-09 16:26:19.412180 I | etcdserver: restarting member a8bc606d954cb360 in cluster 362b3eb57d5b3247 at commit index 386849
2019-12-09 16:26:19.412557 I | raft: a8bc606d954cb360 became follower at term 826
2019-12-09 16:26:19.412578 I | raft: newRaft a8bc606d954cb360 [peers: [21c1cba54be22c9a,85558b08fd6377a2,a8bc606d954cb360], term: 826, commit: 386849, applied: 380038, lastindex: 386849, lastterm: 20]
2019-12-09 16:26:19.419037 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[21c1cba54be22c9a]=138e61746ab70219, local=362b3eb57d5b3247)
2019-12-09 16:26:19.419054 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[21c1cba54be22c9a]=138e61746ab70219, local=362b3eb57d5b3247)
2019-12-09 16:26:19.419064 E | rafthttp: failed to dial 21c1cba54be22c9a on stream Message (cluster ID mismatch)
2019-12-09 16:26:19.419073 E | rafthttp: failed to dial 21c1cba54be22c9a on stream MsgApp v2 (cluster ID mismatch)
2019-12-09 16:26:19.419903 I | etcdserver: starting server... [version: 2.2.1, cluster version: 2.2]
2019-12-09 16:26:19.422255 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[85558b08fd6377a2]=138e61746ab70219, local=362b3eb57d5b3247)
2019-12-09 16:26:19.422279 E | rafthttp: failed to dial 85558b08fd6377a2 on stream Message (cluster ID mismatch)
2019-12-09 16:26:19.422552 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[85558b08fd6377a2]=138e61746ab70219, local=362b3eb57d5b3247)
2019-12-09 16:26:19.422565 E | rafthttp: failed to dial 85558b08fd6377a2 on stream MsgApp v2 (cluster ID mismatch)
I1209 16:26:19.439578 18726 peers.go:101] found new candidate peer from discovery: etcd-a [{10.250.17.141 0} {10.250.17.141 0}]
I1209 16:26:19.439616 18726 peers.go:101] found new candidate peer from discovery: etcd-b [{10.250.16.215 0} {10.250.16.215 0}]
I1209 16:26:19.439629 18726 peers.go:101] found new candidate peer from discovery: etcd-c [{10.250.18.173 0} {10.250.18.173 0}]
I1209 16:26:19.439703 18726 peers.go:281] connecting to peer "etcd-a" with TLS policy, servername="etcd-manager-server-etcd-a"
I1209 16:26:19.439733 18726 hosts.go:84] hosts update: primary=map[], fallbacks=map[etcd-a.internal.kube.us-east-1.dev.deploys.brightcove.com:[10.250.17.141 10.250.17.141] etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com:[10.250.16.215 10.250.16.215] etcd-c.internal.kube.us-east-1.dev.deploys.brightcove.com:[10.250.18.173 10.250.18.173]], final=map[10.250.16.215:[etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com etcd-b.internal.kube.us-east-1.dev.deploys.brightcove.com] 10.250.17.141:[etcd-a.internal.kube.us-east-1.dev.deploys.brightcove.com etcd-a.internal.kube.us-east-1.dev.deploys.brightcove.com] 10.250.18.173:[etcd-c.internal.kube.us-east-1.dev.deploys.brightcove.com etcd-c.internal.kube.us-east-1.dev.deploys.brightcove.com]]
I1209 16:26:19.439903 18726 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
I1209 16:26:19.439982 18726 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W1209 16:26:19.440686 18726 peers.go:325] unable to grpc-ping discovered peer 10.250.18.173:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.250.18.173:3996: connect: connection refused"
I1209 16:26:19.440719 18726 peers.go:347] was not able to connect to peer etcd-c: map[10.250.18.173:3996:true]
W1209 16:26:19.440745 18726 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
W1209 16:26:19.441043 18726 peers.go:325] unable to grpc-ping discovered peer 10.250.17.141:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.250.17.141:3996: connect: connection refused"
I1209 16:26:19.441077 18726 peers.go:347] was not able to connect to peer etcd-a: map[10.250.17.141:3996:true]
W1209 16:26:19.441096 18726 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-a
2019-12-09 16:26:19.520566 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[21c1cba54be22c9a]=138e61746ab70219, local=362b3eb57d5b3247)
As this was the first of the three hosts that would eventually run etcd-manager, the gossip-specific warnings are to be expected. The cluster ID mismatch errors are more significant: they are the consequence of etcd-manager mounting a volume that was several months old.
Some of the approaches that occur to me to address this are:

1. Order matching etcd volumes by creation date and select the most recent by default.
2. Log a warning when multiple volumes match etcd-manager's search for what to mount, but continue to start etcd as we do now.
3. Log a warning when multiple volumes match etcd-manager's search for what to mount, and prevent etcd from starting.
I think the first option would suffice in most cases, but the other two would produce less surprising behavior in the future.