ETCD TLS Bad Certificate #335

bencodner · 2020-07-30T20:43:39Z

I have hit this issue a few times now and am trying to understand what keeps causing it. I was previously running 1.13, then did a fresh install in our dev environment, upgrading to v1.17.4, and everything had been running great until today.

KOPS:
Version 1.17.0-beta.1 (git-32af4ed9b)
----------------------------------------------------------------------------------------------------------------------
KUBECTL:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T21:03:42Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

----------------------------------------------------------------------------------------------------------------------
Image: kopeio/etcd-manager:3.0.20200116

I0730 16:38:32.187889    2613 controller.go:173] starting controller iteration
I0730 16:38:32.187922    2613 controller.go:269] I am leader with token "BxoWU5-sK8fgJ6ojmkfx-A"
2020-07-30 16:38:32.193214 I | embed: rejected connection from "10.10.10.95:38972" (error "remote error: tls: bad certificate", ServerName "etcd-events-a.internal.k8s-west2.redacted.net")
2020-07-30 16:38:33.193650 I | embed: rejected connection from "10.10.10.95:38974" (error "remote error: tls: bad certificate", ServerName "etcd-events-a.internal.k8s-west2.redacted.net")
2020-07-30 16:38:34.930768 I | embed: rejected connection from "10.10.10.95:38978" (error "remote error: tls: bad certificate", ServerName "etcd-events-a.internal.k8s-west2.redacted.net")
W0730 16:38:37.188592    2613 controller.go:675] unable to reach member etcdClusterPeerInfo{peer=peer{id:"etcd-events-a" endpoints:"10.10.10.95:3997" }, info=cluster_name:"etcd-events" node_configuration:<name:"etcd-events-a" peer_urls:"https://etcd-events-a.internal.k8s-west2.redacted.net:2381" client_urls:"https://etcd-events-a.internal.k8s-west2.redacted.net:4002" quarantined_client_urls:"https://etcd-events-a.internal.k8s-west2.redacted.net:3995" > etcd_state:<cluster:<desired_cluster_size:1 cluster_token:"etcd-cluster-token-etcd-events" nodes:<name:"etcd-events-a" peer_urls:"https://etcd-events-a.internal.k8s-west2.redacted.net:2381" client_urls:"https://0.0.0.0:4002" quarantined_client_urls:"http://0.0.0.0:3995" tls_enabled:true > > etcd_version:"3.3.13" > }: error building etcd client for https://etcd-events-a.internal.k8s-west2.redacted.net:4002: context deadline exceeded
I0730 16:38:37.188691    2613 controller.go:276] etcd cluster state: etcdClusterState
  members:
  peers:
    etcdClusterPeerInfo{peer=peer{id:"etcd-events-a" endpoints:"10.10.10.95:3997" }, info=cluster_name:"etcd-events" node_configuration:<name:"etcd-events-a" peer_urls:"https://etcd-events-a.internal.k8s-west2.redacted.net:2381" client_urls:"https://etcd-events-a.internal.k8s-west2.redacted.net:4002" quarantined_client_urls:"https://etcd-events-a.internal.k8s-west2.redacted.net:3995" > etcd_state:<cluster:<desired_cluster_size:1 cluster_token:"etcd-cluster-token-etcd-events" nodes:<name:"etcd-events-a" peer_urls:"https://etcd-events-a.internal.k8s-west2.redacted.net:2381" client_urls:"https://0.0.0.0:4002" quarantined_client_urls:"http://0.0.0.0:3995" tls_enabled:true > > etcd_version:"3.3.13" > }
I0730 16:38:37.188727    2613 controller.go:277] etcd cluster members: map[]
I0730 16:38:37.188742    2613 controller.go:615] sending member map to all peers: members:<name:"etcd-events-a" dns:"etcd-events-a.internal.k8s-west2.redacted.net" addresses:"10.10.10.95" > 
I0730 16:38:37.189042    2613 etcdserver.go:226] updating hosts: map[10.10.10.95:[etcd-events-a.internal.k8s-west2.redacted.net]]
I0730 16:38:37.189068    2613 hosts.go:84] hosts update: primary=map[10.10.10.95:[etcd-events-a.internal.k8s-west2.redacted.net]], fallbacks=map[etcd-events-a.internal.k8s-west2.redacted.net:[10.10.10.95 10.10.10.95]], final=map[10.10.10.95:[etcd-events-a.internal.k8s-west2.redacted.net]]
I0730 16:38:37.204747    2613 commands.go:22] not refreshing commands - TTL not hit
I0730 16:38:37.204774    2613 s3fs.go:220] Reading file "s3://k8s-west2.redacted.net-kops-store/k8s-west2.redacted.net/backups/etcd/events/control/etcd-cluster-created"
I0730 16:38:37.240942    2613 controller.go:369] spec member_count:1 etcd_version:"3.3.13" 
I0730 16:38:37.241108    2613 commands.go:25] refreshing commands
I0730 16:38:37.340952    2613 vfs.go:104] listed commands in s3://k8s-west2.redacted.net-kops-store/k8s-west2.redacted.net/backups/etcd/events/control: 0 commands
I0730 16:38:37.340985    2613 s3fs.go:220] Reading file "s3://k8s-west2.redacted.net-kops-store/k8s-west2.redacted.net/backups/etcd/events/control/etcd-cluster-spec"
W0730 16:38:37.353014    2613 controller.go:149] unexpected error running etcd cluster reconciliation loop: etcd has 0 members registered; must issue restore-backup command to proceed
----------------------------------------------------------------------------------------------------------------------

EVENTS POD: 
Name:                 etcd-manager-events-ip-10-10-10-95.us-west-2.compute.internal
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ip-10-10-10-95.us-west-2.compute.internal/10.10.10.95
Start Time:           Sun, 19 Apr 2020 11:49:47 -0600
Labels:               k8s-app=etcd-manager-events
Annotations:          kubernetes.io/config.hash: 8462a5a3729a329407ca0e6b37444ad5
                      kubernetes.io/config.mirror: 8462a5a3729a329407ca0e6b37444ad5
                      kubernetes.io/config.seen: 2020-04-19T17:49:46.926014141Z
                      kubernetes.io/config.source: file
                      scheduler.alpha.kubernetes.io/critical-pod: 
Status:               Running
IP:                   10.10.10.95
IPs:
  IP:           10.10.10.95
Controlled By:  Node/ip-10-10-10-95.us-west-2.compute.internal
Containers:
  etcd-manager:
    Container ID:  docker://b99646659d82a3f2845e3aaebb6b3b6730c406badfb5668fae732f37a8fca5a2
    Image:         kopeio/etcd-manager:3.0.20200116
    Image ID:      docker-pullable://kopeio/etcd-manager@sha256:eb72d0a120059598446e4ed45781e40ff79a3bbcaa5861ea1b3d0e72a6654af5
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      mkfifo /tmp/pipe; (tee -a /var/log/etcd.log < /tmp/pipe & ) ; exec /etcd-manager --backup-store=s3://k8s-west2.redacted.net-kops-store/k8s-west2.redacted.net/backups/etcd/events --client-urls=https://__name__:4002 --cluster-name=etcd-events --containerized=true --dns-suffix=.internal.k8s-west2.redacted.net --etcd-insecure=false --grpc-port=3997 --insecure=false --peer-urls=https://__name__:2381 --quarantine-client-urls=https://__name__:3995 --v=6 --volume-name-tag=k8s.io/etcd/events --volume-provider=aws --volume-tag=k8s.io/etcd/events --volume-tag=k8s.io/role/master=1 --volume-tag=kubernetes.io/cluster/k8s-west2.redacted.net=owned > /tmp/pipe 2>&1
    State:          Running
      Started:      Sun, 19 Apr 2020 11:47:56 -0600
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        200m
      memory:     100Mi
    Environment:  <none>
    Mounts:
      /etc/hosts from hosts (rw)
      /etc/kubernetes/pki/etcd-manager from pki (rw)
      /rootfs from rootfs (rw)
      /var/log/etcd.log from varlogetcd (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  rootfs:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
  hosts:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/hosts
    HostPathType:  File
  pki:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd-manager-events
    HostPathType:  DirectoryOrCreate
  varlogetcd:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/etcd-events.log
    HostPathType:  FileOrCreate
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       :NoExecute
                   CriticalAddonsOnly
Events:            <none>

----------------------------------------------------------------------------------------------------------------------

Cluster Validation: 
Validating cluster k8s-west2.redacted.net


VALIDATION ERRORS
KIND    NAME  MESSAGE
ComponentStatus etcd-0  component "etcd-0" is unhealthy
ComponentStatus etcd-1  component "etcd-1" is unhealthy

Validation Failed

----------------------------------------------------------------------------------------------------------------------
Host certs:
for i in /etc/kubernetes/pki/etcd-manager-events/*.crt; do openssl x509 -enddate -noout -in "$i"; done
notAfter=Jul 26 20:31:10 2029 GMT
notAfter=Mar 26 17:03:49 2029 GMT
notAfter=Apr 19 17:49:00 2021 GMT
notAfter=Apr 19 17:49:00 2021 GMT
notAfter=Jul 26 20:31:10 2029 GMT
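
None of the expiry dates above has passed, so it may help to look at who issued each certificate as well as when it expires. The following is a minimal sketch, not part of the original report: `check_certs` is a hypothetical helper name, and the 30-day default threshold is an arbitrary assumption. It uses only standard `openssl x509` options (`-subject`, `-issuer`, `-enddate`, `-checkend`).

```shell
#!/bin/sh
# check_certs DIR [DAYS]  (hypothetical helper, for illustration only)
# Prints subject, issuer, and expiry for every *.crt file in DIR and flags
# any certificate that expires within DAYS days (default: 30, an assumption).
# Returns 1 if at least one certificate is inside that window, 0 otherwise.
check_certs() {
    dir=$1
    days=${2:-30}
    status=0
    for crt in "$dir"/*.crt; do
        # Skip the unexpanded glob pattern when DIR has no .crt files.
        [ -f "$crt" ] || continue
        echo "== $crt"
        openssl x509 -noout -subject -issuer -enddate -in "$crt"
        # -checkend N exits non-zero if the certificate expires within N seconds.
        if openssl x509 -noout -checkend "$((days * 86400))" -in "$crt" >/dev/null; then
            echo "OK: valid for at least $days more days"
        else
            echo "EXPIRING: expires within $days days"
            status=1
        fi
    done
    return "$status"
}

# Example invocation on the master node (path taken from the pod spec above):
#   check_certs /etc/kubernetes/pki/etcd-manager-events 30
```

If the issuer lines differ between certificates that are expected to share a CA, that would be one possible lead to follow up on; this sketch only surfaces the information and does not verify the chain.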
