Upgrade to kubernetes 1.11 #473

Merged: 36 commits merged into master from feature/kubernetes-1.11 on Aug 7, 2018

Conversation

@SpComb (Contributor) commented Jul 4, 2018

Fixes #419
Fixes #472

Changes

  • Install kubeadm binary separately for master upgrades, to avoid breaking the kubelet systemd unit dropins
  • Switch from kube-dns to coredns
  • Switch to the new kubeadm.k8s.io/v1alpha2 kubeadm config format (see the sketch after this list)
  • Remove kubelet --cluster-dns configuration now that kubeadm handles it via the kubelet config
  • Upgrade to etcd 3.2
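
For reference, a minimal sketch of what the new v1alpha2 config could look like on a master node. The apiVersion/kind/kubernetesVersion fields match kubeadm 1.11; the file path and the advertiseAddress/podSubnet values are illustrative placeholders, not what pharos-cluster actually generates.

    # hedged sketch only: write a minimal kubeadm.k8s.io/v1alpha2 config and point kubeadm at it
    cat <<'EOF' > /etc/kubernetes/kubeadm.yaml
    apiVersion: kubeadm.k8s.io/v1alpha2
    kind: MasterConfiguration
    kubernetesVersion: v1.11.0
    api:
      advertiseAddress: 10.0.0.10
    networking:
      podSubnet: 10.32.0.0/16
    EOF
    kubeadm init --config /etc/kubernetes/kubeadm.yaml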

TODO

  • master node upgrades

  • worker node upgrades

  • Fix

    • kubeadm reset prompt
    • Don't use /usr/local/bin/kubeadm
    • Our custom cri-o package should no longer include crictl
  • Build new packages/images: https://github.com/kontena/pharos-kube-zipper/pull/21

    • https://dl.bintray.com/kontena/pharos-debian kubelet/kubectl/kubeadm 1.11.0
    • https://dl.bintray.com/kontena/pharos-debian cri-tools 1.11.0 (this is now a kubeadm dependency)
    • https://dl.bintray.com/kontena/pharos-rpm 1.11.0
    • quay.io/kontena kube-apiserver/kube-controllermanager/kube-proxy/kube-scheduler 1.11.0
    • quay.io/kontena etcd 3.2.18
    • quay.io/kontena coredns 1.1.3
  • Test

    • Initial install across different OS platforms
    • Upgrade from pharos 1.2
    • Optional configurations: external etcd, cri-o, ...

@SpComb added the enhancement (New feature or request) label on Jul 4, 2018
@SpComb (Contributor, Author) commented Jul 4, 2018

kubeadm upgrade now requires cri-tools:

    $ sudo VERSION=1.11.0 ARCH=amd64 sh -x < upgrade-kubeadm.sh
    + set -ex
    + kubeadm version -o short
    + [ v1.10.4 = v1.11.0 ]
    + cd /tmp
    + export DEBIAN_FRONTEND=noninteractive
    + apt-get download kubeadm=1.11.0-00
    Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.11.0-00 [9,422 kB]
    Fetched 9,422 kB in 1s (6,268 kB/s)
    + dpkg -i --ignore-depends=kubelet kubeadm_1.11.0-00_amd64.deb
    (Reading database ... 83419 files and directories currently installed.)
    Preparing to unpack kubeadm_1.11.0-00_amd64.deb ...
    Unpacking kubeadm (1.11.0-00) over (1.10.4-00) ...
    dpkg: dependency problems prevent configuration of kubeadm:
     kubeadm depends on cri-tools (>= 1.11.0); however:
      Package cri-tools is not installed.
    
    dpkg: error processing package kubeadm (--install):
     dependency problems - leaving unconfigured
    Errors were encountered while processing:
     kubeadm
    ! 1
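
One way around this in the packaged-kubeadm path is to install cri-tools before unpacking the new kubeadm. A sketch (the downloaded package filename glob is an assumption):

    # fetch and install cri-tools first so dpkg no longer complains about the missing dependency
    apt-get download cri-tools
    dpkg -i cri-tools_*_amd64.deb
    dpkg -i --ignore-depends=kubelet kubeadm_1.11.0-00_amd64.deb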

@SpComb (Contributor, Author) commented Jul 4, 2018

Upgrading the system kubeadm package in order to run the kubeadm upgrade breaks the kubelet: the new kubeadm package updates the /etc/systemd/system/kubelet.service.d/10-kubeadm.conf dropin to run the kubelet with --config=/var/lib/kubelet/config.yaml, but the package upgrade does not write out that config file, so the kubelet gets stuck in a restart loop on the missing config.
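
A quick way to see the broken state on such a node (a sketch; the exact dropin contents can differ between package versions):

    # the upgraded package's dropin now points the kubelet at a kubeadm-managed config file...
    grep -e '--config' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    # ...which nothing has written yet, so the kubelet keeps crash-looping until it appears
    ls -l /var/lib/kubelet/config.yaml
    systemctl status kubelet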

This is a documented limitation for the kubeadm upgrade path: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-11/#upgrade-the-control-plane

Note that upgrading the kubeadm package on your system prior to upgrading the control plane causes a failed upgrade. Even though kubeadm ships in the Kubernetes repositories, it’s important to install it manually. The kubeadm team is working on fixing this limitation.


The kubeadm upgrade updates the kubelet config on the master node at the end, so there is no need for a separate kubeadm upgrade node config run for the master node, AFAICT:

    [apiclient] Found 1 Pods for label selector component=kube-apiserver
    [upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
    [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2018-07-04-13-58-03/kube-controller-manager.yaml"
    [upgrade/staticpods] Waiting for the kubelet to restart the component
    Static pod: kube-controller-manager-terom-pharos-master hash: 6c1e40c591159ead0cb40bfed474e0f3
    Static pod: kube-controller-manager-terom-pharos-master hash: 01e7100e5550f41e40cc34eb55f9a7fa
    [apiclient] Found 1 Pods for label selector component=kube-controller-manager
    [upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
    [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2018-07-04-13-58-03/kube-scheduler.yaml"
    [upgrade/staticpods] Waiting for the kubelet to restart the component
    Static pod: kube-scheduler-terom-pharos-master hash: 2ec65d6c3ad7f10608bdfd93016abe03
    Static pod: kube-scheduler-terom-pharos-master hash: 31eabaff7d89a40d8f7e05dfc971cdbd
    [apiclient] Found 1 Pods for label selector component=kube-scheduler
    [upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
    [uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
    [kubelet] Creating a ConfigMap "kubelet-config-1.11" in namespace kube-system with the configuration for the kubelets in the cluster
    [kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
    [kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "terom-pharos-master" as an annotation
    [bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
    [bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
    [bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
    [addons] Applied essential addon: CoreDNS
    [addons] Applied essential addon: kube-proxy
    
    [upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.11.0". Enjoy!
    
    [upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
    ! 0
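
For context, the log above is the tail end of the control-plane upgrade; the flow driving it is roughly the following (a sketch, run with the separately installed kubeadm 1.11 binary):

    # preview and then apply the control-plane upgrade on the master
    kubeadm upgrade plan
    kubeadm upgrade apply v1.11.0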

@SpComb added this to the 1.3.0 milestone on Jul 4, 2018
@SpComb (Contributor, Author) commented Jul 4, 2018

Initial install and upgrade from pharos 1.2 should work now, tested with custom hacks to use upstream repos instead of the pharos ones.

@SpComb (Contributor, Author) commented Jul 5, 2018

Further testing/work pending on updated pharos images/packages.

@SpComb (Contributor, Author) commented Jul 13, 2018

kubeadm reset now prompts:

+ kubeadm reset
[reset] WARNING: changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] are you sure you want to proceed? [y/N]: Aborted reset operation
/app/lib/pharos/ssh/remote_command.rb:66:in `run!'
/app/lib/pharos/ssh/client.rb:60:in `exec!'
/app/lib/pharos/ssh/client.rb:72:in `exec_script!'
/app/lib/pharos/host/configurer.rb:74:in `exec_script'
/app/lib/pharos/host/el7/el7.rb:82:in `reset'
/app/lib/pharos/phases/reset_host.rb:10:in `call'
/app/lib/pharos/phase_manager.rb:70:in `block in apply'
/app/lib/pharos/phase_manager.rb:26:in `block (2 levels) in run_parallel'
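
For the non-interactive pharos-cluster flow, the straightforward fix is to skip the prompt explicitly, assuming the --force flag that ships alongside the new prompt in kubeadm 1.11:

    # skip the interactive confirmation added in kubeadm 1.11
    kubeadm reset --force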

Review thread on the kubeadm binary install script:

    gpg --verify /tmp/kubeadm.gz.asc /tmp/kubeadm.gz
    gunzip /tmp/kubeadm.gz
    install -o root -g root -m 0755 -t /usr/local/bin /tmp/kubeadm # XXX: overrides package version?

@SpComb (Contributor, Author) commented on this diff, Jul 13, 2018:

This should be installed somewhere temporarily only for the duration of the upgrade, and removed once the kubeadm package has been upgraded... leaving behind a version of kubeadm in /usr/local/bin is bad.

@SpComb (Contributor, Author) added:

This also gets left behind on pharos-cluster reset and causes the next pharos-cluster up to run using the wrong version of kubeadm.
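
A minimal sketch of the cleanup that would avoid both problems: drop the temporary binary as soon as the packaged kubeadm has been upgraded (and again on reset):

    # remove the temporary binary so later runs don't pick up a stale kubeadm from /usr/local/bin
    rm -f /usr/local/bin/kubeadm
    command -v kubeadm   # should resolve to the packaged /usr/bin/kubeadm again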

@SpComb (Contributor, Author) commented Jul 13, 2018

pharos-cluster reset also does not remove the cri-tools package, leaving crictl installed and breaking kubeadm join on kube 1.10 for the default docker CRI per kubernetes/kubeadm#657
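
A sketch of the extra reset step this implies (which package manager applies depends on the host OS):

    # remove cri-tools on reset so a later kube 1.10 kubeadm join doesn't trip over crictl
    apt-get purge -y cri-tools 2>/dev/null || yum remove -y cri-tools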

@SpComb (Contributor, Author) commented Jul 13, 2018

With kubeadm now depending on the separate cri-tools package, our own cri-o package should no longer include /usr/local/bin/crictl?

@SpComb (Contributor, Author) commented Jul 25, 2018

Upgrading a CentOS 7 worker node from 1.10 -> 1.11 with kubeadm upgrade node config leaves the kubelet configured with the default cgroupDriver: cgroupfs:

Jul 25 11:36:10 terom-centos-test kubelet[5939]: F0725 11:36:10.541309    5939 server.go:262] failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

Per https://kubernetes.io/docs/setup/independent/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node kubeadm should be detecting it and configuring it automatically, but this isn't happening:

When using Docker, kubeadm will automatically detect the cgroup driver for the kubelet and set it in the /var/lib/kubelet/kubeadm-flags.env file during runtime.

[root@terom-centos-test ~]# cat /var/lib/kubelet/kubeadm-flags.env
cat: /var/lib/kubelet/kubeadm-flags.env: No such file or directory
[root@terom-centos-test ~]# grep cgroupDriver /var/lib/kubelet/config.yaml
cgroupDriver: cgroupfs

EDIT: seems like this might be a kubeadm upgrade node config bug - after a pharos-cluster reset and fresh kubeadm join, the cgroup-driver configuration is different:

[root@terom-centos-test ~]# cat /var/lib/kubelet/kubeadm-flags.env 
KUBELET_KUBEADM_ARGS=--cgroup-driver=systemd --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni

The workaround is to have pharos-cluster itself configure --cgroup-driver=systemd via the systemd unit's KUBELET_EXTRA_ARGS, as sketched below.
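
A sketch of that workaround as a systemd dropin (the dropin filename is an assumption; the flag value comes from the error above):

    # pass the detected cgroup driver to the kubelet via KUBELET_EXTRA_ARGS
    cat <<'EOF' > /etc/systemd/system/kubelet.service.d/05-pharos.conf
    [Service]
    Environment="KUBELET_EXTRA_ARGS=--cgroup-driver=systemd"
    EOF
    systemctl daemon-reload && systemctl restart kubelet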

@SpComb (Contributor, Author) commented Jul 25, 2018

Oops, the default EnvironmentFile=-/etc/sysconfig/kubelet (or /etc/default/kubelet) shipping an empty KUBELET_EXTRA_ARGS= now overrides the Environment="KUBELET_EXTRA_ARGS=..." setting in the systemd dropin.
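
Given that override, a sketch of a placement that survives the packaged defaults is to write the flag into the env file the unit already reads (paths as in the comment above; whether pharos-cluster ends up doing exactly this is not settled here):

    # EL7 path shown; a deb-based host would use /etc/default/kubelet instead
    echo 'KUBELET_EXTRA_ARGS=--cgroup-driver=systemd' > /etc/sysconfig/kubelet
    systemctl restart kubelet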

@jakolehm (Contributor) commented Aug 3, 2018

CoreDNS being a multiarch image is causing headaches. First I tried to build coredns into separate repos for each architecture (like every other k8s image), but kubeadm wants to use the multiarch image, which breaks the upgrade because the image pull does not work.

Then I tried to actually create a multiarch repo (with https://github.com/estesp/manifest-tool), but it seems that quay.io does not support v2.2 manifests (they are currently implementing it, see moby/buildkit#409 (comment)).
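
For reference, the attempted multiarch push would look roughly like this (a sketch; the exact manifest-tool invocation and the arm64 repository name are assumptions, and quay.io currently rejects the resulting v2.2 manifest list):

    # push a manifest list that points at the per-arch images under a single name
    manifest-tool push from-args \
      --platforms linux/amd64,linux/arm64 \
      --template quay.io/kontena/coredns-ARCH:1.1.3 \
      --target quay.io/kontena/coredns:1.1.3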

@jakolehm (Contributor) commented Aug 6, 2018

kubeadm wants to use multiarch -> breaks upgrade because image pull does not work.

Not completely sure if this is actually true. Based on the logs it seems that the CoreDNS addon is applied, but then the upgrade hangs when it tries to remove the kube-dns deployment.

    [bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
    [bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
    [bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
    [addons] Applied essential addon: CoreDNS
... HANGS ...

@jakolehm (Contributor) commented Aug 6, 2018

The reason it halts there is that kubeadm waits for the coredns replicas to become ready... which cannot happen because of the invalid image name: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/upgrade/postupgrade.go#L141-L149

Review thread on the ConfigureDNS phase diff:

    @@ -6,7 +6,8 @@ class ConfigureDNS < Pharos::Phase
       title "Configure DNS"

       def call
    -    patch_kubedns(
    +    patch_deployment(
    +      'coredns',

A contributor commented on the diff:

Does this kube-dns -> coredns change cause downtime when upgrading from Pharos 1.2?

Reply: Nope, it should be a smooth ride (the kube-dns deployment is removed after coredns is running).

Review thread on the coredns deployment patch call:

    }
    ]
    }
    Pharos::Kube.session(@master.api_address).resource_client('apps/v1').patch_deployment(

A contributor asked on the diff:

Patch should be ok here, right?

@SpComb (Contributor, Author) replied:

Yes, the PATCH matches what kubectl set image does:

$ kubectl -v8 -n kube-system set image deployments/coredns coredns=quay.io/kontena/coredns-amd64:1.1.3
...
I0807 11:11:05.700941    8436 request.go:874] Request Body: {"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"coredns"}],"containers":[{"image":"quay.io/kontena/coredns-amd64:1.1.3","name":"coredns"}]}}}}
I0807 11:11:05.700985    8436 round_trippers.go:383] PATCH https://167.99.39.233:6443/apis/extensions/v1beta1/namespaces/kube-system/deployments/coredns
...

Not entirely sure what the $setElementOrder is doing, but it seems unnecessary. Apart from that the container image part is identical.

A contributor replied:

Just a note that this also changes how the dns replicas/maxSurge/etc. are sent.

@SpComb (Contributor, Author) replied:

Yes, those are simple object-level PATCHes, no array merge needed. Verified that the end result is as intended.

@jakolehm (Contributor) commented Aug 7, 2018

@SpComb I think we should merge this and fix any remaining issues in separate PRs.

Review thread on the podAntiAffinity selector:

    {
      key: "k8s-app",
      operator: "In",
      values: [name]

@SpComb (Contributor, Author) commented on this diff, Aug 7, 2018:

The podAntiAffinity is broken; the pods get scheduled on the same node:

NAME                                          READY     STATUS    RESTARTS   AGE       IP               NODE
coredns-5c7c9977c-7dww4                       1/1       Running   0          4m        10.32.0.216      terom-pharos-master
coredns-5c7c9977c-pn5k4                       1/1       Running   0          4m        10.32.0.215      terom-pharos-master

The coredns deployment uses a k8s-app: kube-dns label, so the value here shouldn't be coredns.
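
A quick check of both the label and the spread after fixing the selector (a sketch):

    # coredns pods carry the k8s-app=kube-dns label, so the anti-affinity selector must match that;
    # the two replicas should land on different nodes once the selector is fixed
    kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide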

@SpComb (Contributor, Author) replied:

Fixed:

NAME                                          READY     STATUS    RESTARTS   AGE       IP               NODE
coredns-dcb4c7ddd-pn96m                       1/1       Running   0          1m        10.32.1.8        terom-xenial-test
coredns-dcb4c7ddd-tk2dm                       1/1       Running   0          1m        10.32.2.29       terom-bionic-test

@SpComb (Contributor, Author) commented Aug 7, 2018

A quick smoke test of this is passing, so it should be good enough to merge; we can continue testing in master together with the other changes.

CI needs #504 to pass.

The CoreDNS image hack is unfortunate, but I don't see how to avoid it.

@jakolehm changed the title from "[WiP] Upgrade to kubernetes 1.11" to "Upgrade to kubernetes 1.11" on Aug 7, 2018
@jakolehm merged commit 71bb4de into master on Aug 7, 2018
@jakolehm deleted the feature/kubernetes-1.11 branch on August 7, 2018 at 13:23
@jakolehm mentioned this pull request on Aug 7, 2018
@jakolehm mentioned this pull request on Aug 30, 2018