Network setup: investigate (from recent OpenWrt release notes)
- RFC 7788 - Home Networking Control Protocol
- RFC 7084 - Basic Requirements for IPv6 Customer Edge Routers
See also https://typhoon.psdn.io/bare-metal/ and https://github.com/kubermesh/kubermesh
Consider https://github.com/coreos/torcx for storing local “patches” to reapply to an otherwise-immutable base image.
Configure host sshd to accept certs signed by k8s CA. Do short-term signatures, just for individual debugging exercises (12h?)
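A minimal sketch of how that could look (assumes a dedicated SSH user CA keypair `user_ca`/`user_ca.pub` - the k8s CA key itself is X.509, so it can't be used as an SSH CA directly - and a `core` login user; paths illustrative):

  # on each host: trust user certs signed by the CA
  echo 'TrustedUserCAKeys /etc/ssh/user_ca.pub' >> /etc/ssh/sshd_config
  systemctl reload sshd

  # per debugging exercise: sign my pubkey for 12h as principal "core"
  ssh-keygen -s user_ca -I debug-$(date +%s) -n core -V +12h ~/.ssh/id_ed25519.pub
  ssh -i ~/.ssh/id_ed25519 core@$node   # ssh picks up ~/.ssh/id_ed25519-cert.pub automatically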
TODO:
- move self-hosted control plane to (separate) kubecfg repo, based on https://github.com/kubernetes-incubator/bootkube/blob/master/pkg/asset/internal/templates.go
- make an easy self-host installer by generating fake initial checkpoints
- test/document/profit/etc
- enable CONFIG_KSM on containos
- enable hugetlb (and hugetlbfs, etc) on containos
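For the two containos TODO items above, a rough runtime check/enable once the kernel options are built in (standard sysfs/procfs paths; values illustrative):

  # KSM: only present with CONFIG_KSM=y
  zcat /proc/config.gz | grep -E 'CONFIG_KSM|CONFIG_HUGETLB'   # needs CONFIG_IKCONFIG_PROC
  echo 1 > /sys/kernel/mm/ksm/run
  cat /sys/kernel/mm/ksm/pages_sharing

  # hugetlb: reserve some pages and check hugetlbfs (often already mounted by systemd)
  echo 64 > /proc/sys/vm/nr_hugepages
  mount -t hugetlbfs none /dev/hugepages
  grep Huge /proc/meminfo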
NB: local DNS preconfigured with `kube.lan` -> 192.168.0.9
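On a dnsmasq-based resolver (e.g. OpenWrt) that's a one-line host record, something like (name/IP as above; OpenWrt normally carries this via uci/dhcp config rather than dnsmasq.conf directly):

  echo 'host-record=kube.lan,192.168.0.9' >> /etc/dnsmasq.conf
  /etc/init.d/dnsmasq restart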
ip link add anycast0 type dummy || :
ip addr replace 192.168.0.9/32 dev anycast0
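To make the dummy interface survive reboots on Flatcar/Container Linux, a systemd-networkd pair along these lines should work (file names arbitrary):

  cat >/etc/systemd/network/10-anycast0.netdev <<'EOF'
  [NetDev]
  Name=anycast0
  Kind=dummy
  EOF
  cat >/etc/systemd/network/10-anycast0.network <<'EOF'
  [Match]
  Name=anycast0

  [Network]
  Address=192.168.0.9/32
  EOF
  systemctl restart systemd-networkd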
kubeadm init \
  --node-name=$(cat /etc/machine-id) \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-cert-extra-sans=kube.lan,kube.oldmacdonald.farm \
  --apiserver-advertise-address=192.168.0.9 \
  --token-ttl=12h \
  --feature-gates=SelfHosting=true
Need to hack in another ssh session (fixed upstream maybe?):
sed -i 's/initialDelaySeconds: [0-9]\+/initialDelaySeconds: 180/' /etc/kubernetes/manifests/kube-apiserver.yaml
Go to k8s config section.
Go back to kubeadm 1.12.x (yes, really - this was the last version before node join required reading the 'kube-proxy' configmap, via the node bootstrapper being allowed to read all kube-system configmaps)
#RELEASE="$(curl -sSL https://dl.k8s.io/release/stable.txt)"
RELEASE=v1.12.5
curl -L --remote-name-all https://dl.k8s.io/release/${RELEASE}/bin/linux/amd64/kubeadm
Get new token: (with cluster-admin, this works with current kubeadm) `KUBECONFIG=$HOME/.kube/homeconf kubeadm token create --ttl 12h --print-join-command` (Note! kubeadm interprets key/cert relative paths incorrectly, so needs to be run from ~/.kube/)
kubeadm join \
  --node-name=$(cat /etc/machine-id) \
  --discovery-token-ca-cert-hash=sha256:$certhash \
  --token $token kube.lan:6443 \
  --ignore-preflight-errors=all
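For reference, $certhash above can be computed from the cluster CA with the usual openssl recipe (run wherever /etc/kubernetes/pki/ca.crt lives; assumes the default RSA CA key):

  openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex | sed 's/^.* //'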
Can flash LEDs to identify a physical machine: `cat /sys/class/leds/bananapi:green:usr/trigger` shows the available values.
echo heartbeat > /sys/class/leds/bananapi:green:usr/trigger
echo none > /sys/class/leds/bananapi:green:usr/trigger
PXE boot into Flatcar (on ramdisk).
wget http://kube.lan:31069/pxe-config.ign
sudo flatcar-install -d /dev/sda -C beta -i pxe-config.ign
reboot
docker run --rm -it \
  -v /etc:/rootfs/etc \
  -v /opt:/rootfs/opt \
  -v /usr/bin:/rootfs/usr/bin \
  -e K8S_VERSION=v1.7.7 \
  xakra/kubeadm-installer coreos
PATH=$PATH:/opt/bin sudo kubeadm join \
  --node-name=$(cat /etc/machine-id) \
  --discovery-token-ca-cert-hash=sha256:$certhash \
  --token $token kube.lan:6443
Edit daemonset/self-hosted-kube-apiserver to set `--etcd-quorum-read=true` (quorum=false should not exist, grumble, grumble)
scp [email protected]:/etc/kubernetes/admin.conf /tmp/kubeconfig ./push.sh
(on master)
Will boot up, run etcd (from /etc/kubernetes/manifests), and then sit.
kubeadm alpha phase controlplane all \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.0.9
Wait for the control plane to come up. It will read etcd and start the self-hosted control jobs; those will crash-loop because addresses/locks are in use. While this is happening:
rm /etc/kubernetes/manifests/kube-{apiserver,controller-manager,scheduler}.yaml
Set up single (self-hosted) master using kubeadm as usual.
Join new (potential master) node as normal:
kubeadm join \
  --node-name=$(cat /etc/machine-id) \
  --discovery-token-ca-cert-hash=sha256:$hash \
  --token $token kube.lan:6443
kubectl taint node $node node-role.kubernetes.io/master=:NoSchedule
kubectl label node $node node-role.kubernetes.io/master=
Set up CA cert, and signed server+peer certs for (at least) existing and new etcd node, and client certs for apiserver. NB: existing (kubeadm) server will have etcd name “default”.
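A rough openssl sketch for one new member (names/IPs illustrative; cfssl or whatever produced the existing kubeadm certs works just as well):

  # CA (once)
  openssl req -x509 -newkey rsa:2048 -nodes -keyout etcd-ca-key.pem \
    -subj "/CN=etcd-ca" -days 3650 -out etcd-ca.pem

  # server cert for kmaster2; SANs must cover the IPs etcd listens on
  openssl req -newkey rsa:2048 -nodes -keyout etcd-kmaster2-server-key.pem \
    -subj "/CN=kmaster2" -out kmaster2.csr
  openssl x509 -req -in kmaster2.csr -CA etcd-ca.pem -CAkey etcd-ca-key.pem \
    -CAcreateserial -days 365 -out etcd-kmaster2-server.pem \
    -extfile <(printf 'subjectAltName=IP:192.168.0.140,IP:127.0.0.1')

  # repeat for the peer cert, and for a client cert for the apiserver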
On existing (kubeadm) master:
docker run --net=host --rm -e ETCDCTL_API=3 -ti \
  gcr.io/google_containers/etcd-arm:3.1.10 /bin/sh
etcdctl member list
etcdctl member update $memberID https://$ip:2380
Install certs and modify /etc/kubernetes/manifests/etcd.yaml to add:

  env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  command:
  - --advertise-client-urls=https://$(POD_IP):2379
  - --listen-client-urls=http://127.0.0.1:2379,https://$(POD_IP):2379
  - --cert-file=/keys/etcd-kmaster1-server.pem
  - --key-file=/keys/etcd-kmaster1-server-key.pem
  - --peer-cert-file=/keys/etcd-kmaster1-peer.pem
  - --peer-key-file=/keys/etcd-kmaster1-peer-key.pem
  - --peer-client-cert-auth
  - --peer-trusted-ca-file=/keys/etcd-ca-peer.pem
  - --listen-peer-urls=https://$(POD_IP):2380
  volumeMounts:
  - mountPath: /keys
    name: keys
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki
      type: Directory
    name: keys
Copy etcd TLS keys into /etc/kubernetes/pki
Copy manifests/etcd.yaml to new node, modify ETCD_NAME and key paths. (will crashloop until next step)
On existing master:
docker run --net=host -e ETCDCTL_API=3 --rm -ti \
  gcr.io/google_containers/etcd-arm:3.1.10 \
  etcdctl member add kmaster2 --peer-urls=https://192.168.0.140:2380
On new (empty) additional master:
Copy /etc/kubernetes/pki/ca.key over to new machine(s)
ETCD_NAME=kmaster3; POD_IP=192.168.0.128
docker run --rm --net=host \
  -v /var/lib/etcd:/var/lib/etcd \
  -v /etc/kubernetes/pki:/keys \
  gcr.io/google_containers/etcd-arm:3.0.17 etcd \
  --advertise-client-urls=https://${POD_IP}:2379 \
  --data-dir=/var/lib/etcd \
  --listen-client-urls=http://127.0.0.1:2379,https://${POD_IP}:2379 \
  --initial-cluster=default=https://192.168.0.9:2380,${ETCD_NAME}=https://${POD_IP}:2380 \
  --initial-advertise-peer-urls=https://${POD_IP}:2380 \
  --initial-cluster-state=existing \
  --cert-file=/keys/etcd-${ETCD_NAME}-server.pem \
  --key-file=/keys/etcd-${ETCD_NAME}-server-key.pem \
  --peer-cert-file=/keys/etcd-${ETCD_NAME}-peer.pem \
  --peer-key-file=/keys/etcd-${ETCD_NAME}-peer-key.pem \
  --peer-client-cert-auth \
  --peer-trusted-ca-file=/keys/etcd-ca.pem \
  --listen-peer-urls=https://${POD_IP}:2380 \
  --client-cert-auth \
  --trusted-ca-file=/keys/etcd-ca.pem \
  --election-timeout=10000 \
  --heartbeat-interval=1000
NB: remove old dead nodes before adding new nodes. See etcd FAQ for discussion.
Add to known members:
kubectl -n kube-system edit configmap etcd
kubectl -n kube-system exec $existing_etcd_pod -- \
  etcdctl member add $name --peer-urls=https://<pod ip>:2380
- Careful when changing! StatefulSets don’t (yet) support updateStrategy.minAvailable, so 1x failed + 1x updating can lead to 2 down.
- Delete the dead node, it isn’t coming back. kubectl delete node $node
- Remove the dead member, it isn't coming back.
  kubectl -n kube-system exec $existing_etcd_pod -- etcdctl member list
  kubectl -n kube-system exec $existing_etcd_pod -- etcdctl member remove $memberid
- Promote learner/spare.
  kubectl -n kube-system exec $existing_etcd_pod -- etcdctl member promote $learnerid
- Update etcd k8s config to add a new etcd learner.
- Add to etcdMembers. Push.
This updates certificate+secret, and nodeSelector. (good)
.. And also changes etcd command line, which leads to 2x etcd down 😱 Would a StatefulSet partition (of zero nodes) have helped here?
- Copy checkpointed manifest+secrets off the one remaining etcd, hack podspec to match mistakenly killed node, and use to bring back 2x working replicas. Will need to repeat occasionally, until checkpoints converge.
- etcdctl remove/add new member. etcd scheduled on new node, and (after etcdctl membership change) came up. This worked well.
- statefulset continued the semantic-noop update of remaining pods with the new initial-cluster flag value.
- Delete the dead node first. It’s not coming back, let everything else failover/restart as expected early in the process.
- Do the etcdctl membership remove/add first, while 2 nodes are up. Again, the dead node isn’t coming back.
- Try the partition thing. Basically want to update cert + expand nodeSelector but under no circumstances restart existing/healthy etcd peer.
Change hostnames to use symbolic etcd-[0-2] names? The apiserver still needs to know IPs - or a hostname? Initial cluster peers need IPs too, and they need to match the current state to avoid triggering an update. Another hostname?
Aha! Use a learner as a 'warm spare'. Run replicas=4, but one of the members is a 'learner' according to etcd. After a failure, we can promote the learner to a full member with an etcdctl command (no k8s changes required) and get back to 3 nodes. Then regular k8s operations can replace the node and schedule a new learner (ok even if that replacement temporarily takes an etcd node down, since we're now back to 3 full members).
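In etcdctl terms (learners need etcd >= 3.4, so this implies upgrading past the 3.1.x images above; member name and peer URL are placeholders):

  # add the spare as a learner up front
  etcdctl member add etcd-spare --learner --peer-urls=https://192.168.0.150:2380
  # after a real member dies: promote the learner to a full voting member
  etcdctl member list
  etcdctl member promote $learnerid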
- Copy static manifests around from (hopefully) a remaining good node.
- Get etcd up. No point fixing anything else until this happens.
Regenerate static manifest: kubecfg show bootstrap.jsonnet -o json
d=/etc/kubernetes/checkpoint-secrets/kube-system/etcd-2/etcd-peer
ETCDCTL_API=3 etcdctl \
  --endpoints https://192.168.0.161:2379 \
  --cacert=$d/ca.crt --key=$d/tls.key --cert=$d/tls.crt \
  member list
kubeadm binaries available from https://dl.k8s.io/release/$release/bin/linux/$arch/kubeadm
NB: control jobs first, then kubelets. Also: make sure to regenerate/rotate keys as part of the upgrade - they have a 6-month expiry.
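Quick way to see which certs are approaching that expiry (stock kubeadm paths; adjust as needed):

  for c in /etc/kubernetes/pki/*.crt; do
    echo -n "$c: "; openssl x509 -noout -enddate -in "$c"
  done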
stash kubeadm-arm-v1.9.10 locally in ipfs:
ipfs add https://dl.k8s.io/release/v1.9.10/bin/linux/arm/kubeadm
QmSdVUeRF5QkSDZAd4sNMoH7AYANpXa4J9ME3TMQu8tVgh
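To pull it back out later (CID is the one above; a public gateway URL of the form https://ipfs.io/ipfs/<cid> should also work):

  ipfs get QmSdVUeRF5QkSDZAd4sNMoH7AYANpXa4J9ME3TMQu8tVgh -o kubeadm-v1.9.10
  chmod +x kubeadm-v1.9.10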
On a master: fetch kubeadm binary to /var/lib
./kubeadm-v1.9.10 upgrade apply --feature-gates SelfHosting=true v1.9.10
- Upgrade etcd image to 3.1.11
kubeadm-arm-v1.10.12: QmSboULs6WEs9Q2R1HV21HRAWmbUNRkS9cvJMnRuvU5xfz
Remove (backup) {apiserver,apiserver-kubelet-client,front-proxy-client}.{crt,key}
Regen (run make)
Copy out to nodes:
scp {apiserver,apiserver-kubelet-client,front-proxy-client}.{crt,key} $node:/etc/kubernetes/pki/
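Optional sanity check before copying them out (assumes the cluster CA is ca.crt in the same pki directory):

  openssl verify -CAfile ca.crt apiserver.crt
  openssl x509 -noout -text -in apiserver.crt | grep -A1 'Subject Alternative Name'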
note: CoreDNS replaces kube-dns as the default DNS provider
note: Clusters still using Heapster for autoscaling should be migrated over to metrics-server and the custom metrics API
note: kube-proxy IPVS (note graceful termination is in v1.13)
note: sysctl support is now considered beta
kubeadm-arm-v1.11.10: QmaZ5cnybq5jPdjGk4Anght9xSoch1ExFfm2JQu49NBDPw
- update jsonnet kube-system manifests
- ./push.sh
- yolo manual delete of kube-dns svc, bring up coredns svc, check, delete kube-dns deploy
- ./coreos-update-all.sh
Now that cert-manager is rotating cluster-admin certs, they can expire before I remember to grab an updated copy. Oops.
On an etcd node:
sudo etcdctl get /registry/secrets/kube-system/kubernetes-admin \
  --cacert=/etc/kubernetes/checkpoint-secrets/kube-system/etcd-0/etcd-monitor/ca.crt \
  --cert=/etc/kubernetes/checkpoint-secrets/kube-system/etcd-0/etcd-monitor/tls.crt \
  --key=/etc/kubernetes/checkpoint-secrets/kube-system/etcd-0/etcd-monitor/tls.key
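The value comes back as the binary protobuf encoding of the Secret, but the embedded PEM blocks are plain ASCII, so (assuming an etcdctl new enough to have --print-value-only) piping through strings is usually enough to fish the admin cert/key back out:

  sudo etcdctl get /registry/secrets/kube-system/kubernetes-admin \
    --cacert=/etc/kubernetes/checkpoint-secrets/kube-system/etcd-0/etcd-monitor/ca.crt \
    --cert=/etc/kubernetes/checkpoint-secrets/kube-system/etcd-0/etcd-monitor/tls.crt \
    --key=/etc/kubernetes/checkpoint-secrets/kube-system/etcd-0/etcd-monitor/tls.key \
    --print-value-only | strings | less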