Skip to content

Latest commit

 

History

History
763 lines (591 loc) · 27.2 KB

05_Installing_Kubernetes_Control_Plane.md

File metadata and controls

763 lines (591 loc) · 27.2 KB

[ macOS/ARM64 | Linux/AMD64 ]

Previous: Bootstrapping Kubernetes Security

Installing Kubernetes Control Plane

In this chapter we will install the Kubernetes control plane components, i.e. etcd, kube-apiserver, kube-scheduler and kube-controller-manager. As a result, for the first time we'll have a fully functioning Kubernetes API to talk to.

We will also set up a virtual IP based load balancer for the Kubernetes API on the gateway machine, making it possible to reach the API using a simple domain name kubernetes.kubenet (or just kubernetes).

On the way, we'll also learn/remind some basic Linux tools and concepts, e.g. systemd and IPVS.

Table of Contents generated with DocToc

Prerequisites

Make sure you have completed all the previous chapters, your VMs are running and have all the certificates, keys and kubeconfigs deployed.

Quick overview of systemd

Ubuntu uses systemd as the "init system", i.e. a software suite that manages services/daemons, starting them during system boot, making sure they run in the correct order, etc. We'll be using systemd throughout this chapter to run Kubernetes components. Because of that, let's have a quick theoretical introduction into systemd in order to make things less magic.

Unit files

In order to register a new service in the system and make it run on system boot, a unit file needs to be created, usually in the /etc/systemd/system directory. Typically, unit files are managed by a package manager like APT. However, since we are doing things the hard way, we will be writing them by hand.

A unit file has a type (corresponding to its file extension), that determines the type of entity it defines. In this guide, we are only interested in the .service type, which indicates a runnable service definition.

systemd also talks about targets, which are synchronization points, effectively used to define dependencies between units and force their initialization order.

A minimal service-type unit file could look like this:

[Unit]
Description=My custom service
After=network.target

[Service]
ExecStart=/usr/local/bin/myservice
Restart=always

[Install]
WantedBy=multi-user.target

which defines a service that requires the network target to be completed before running, and installs itself as a dependency of the multi-user target.

systemd is associated with a command line program, systemctl, which can be used to reload unit definitions, start, stop, restart, inspect services, etc.

Installing core components

Let's start installing control plane components. In order to do this simultaneously on all control nodes, you can use tmux with pane synchronization, as described elsewhere. Note that the way we have set up a tmuxsession with SSH connections to all VMs was designed specifically for that purpose.

Note

This guide suggests running all commands by hand (via tmux) so that you can see and verify every step. However, the guide repository also contains scripted version that you can reuse later.

Important

The guide assumes that all commands are run from default user (ubuntu) home directory, which contains all the uploaded certificates, keys and kubeconfigs.

Common variables

Let's define some reusable shell variables to use throughout this chapter:

arch=arm64

etcd_version=3.5.15
k8s_version=1.31.0

vmaddr=$(ip addr show enp0s1 | grep -Po 'inet \K192\.168\.1\.\d+')
vmname=$(hostname -s)

Installing etcd

Let's download the etcd binary, unpack it and copy into appropriate system directory:

etcd_archive=etcd-v${etcd_version}-linux-${arch}.tar.gz
wget -q --show-progress --https-only --timestamping \
  "https://github.com/etcd-io/etcd/releases/download/v${etcd_version}/$etcd_archive"
tar -xvf $etcd_archive
sudo cp etcd-v${etcd_version}-linux-${arch}/etcd* /usr/local/bin

Set up etcd data and configuration directories, then install all the necessary certificates and keys:

sudo mkdir -p /etc/etcd /var/lib/etcd
sudo chmod 700 /var/lib/etcd/
sudo cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/

Create a systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/coreos

[Service]
Type=notify
Environment=ETCD_UNSUPPORTED_ARCH=${arch}
ExecStart=/usr/local/bin/etcd \\
  --name $vmname \\
  --cert-file=/etc/etcd/kubernetes.pem \\
  --key-file=/etc/etcd/kubernetes-key.pem \\
  --peer-cert-file=/etc/etcd/kubernetes.pem \\
  --peer-key-file=/etc/etcd/kubernetes-key.pem \\
  --trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-client-cert-auth \\
  --client-cert-auth \\
  --initial-advertise-peer-urls https://${vmaddr}:2380 \\
  --listen-peer-urls https://${vmaddr}:2380 \\
  --listen-client-urls https://${vmaddr}:2379,https://127.0.0.1:2379 \\
  --advertise-client-urls https://${vmaddr}:2379 \\
  --initial-cluster-token etcd-cluster-0 \\
  --initial-cluster control0=https://192.168.1.11:2380,control1=https://192.168.1.12:2380,control2=https://192.168.1.13:2380 \\
  --initial-cluster-state new \\
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

It's not worth explaining in detail all the options from the above file. The security related ones are a direct consequence of the security assumptions from the previous chapter. The other ones simply tell the etcd cluster how it should initialize itself. Exhaustive reference can be found here.

Reload systemd unit definitions and start etcd service:

sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd

Verify if the service is running:

systemctl status etcd.service

If something is wrong, you can look up logs:

journalctl -u etcd.service

You can also verify if the cluster is running properly by listing cluster memebers with the following command:

sudo ETCDCTL_API=3 etcdctl member list \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem

The output should look similar to this:

91bdf612a6839630, started, control0, https://192.168.1.11:2380, https://192.168.1.11:2379, false
bb39bdb8c49d4b1b, started, control2, https://192.168.1.13:2380, https://192.168.1.13:2379, false
dc0336cac5c58d30, started, control1, https://192.168.1.12:2380, https://192.168.1.12:2379, false

Installing kube-apiserver

Download the binary and copy it to /usr/local/bin:

wget -q --show-progress --https-only --timestamping \
  "https://storage.googleapis.com/kubernetes-release/release/v${k8s_version}/bin/linux/${arch}/kube-apiserver"
chmod +x kube-apiserver
sudo cp kube-apiserver /usr/local/bin

Create a configuration directory for kube-apiserver and copy all the necessary security-related files into it:

sudo mkdir -p /var/lib/kubernetes/
sudo cp ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem \
  service-account-key.pem service-account.pem \
  encryption-config.yaml /var/lib/kubernetes/

Create a systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-apiserver \\
  --advertise-address=${vmaddr} \\
  --allow-privileged=true \\
  --apiserver-count=3 \\
  --audit-log-maxage=30 \\
  --audit-log-maxbackup=3 \\
  --audit-log-maxsize=100 \\
  --audit-log-path=/var/log/audit.log \\
  --authorization-mode=Node,RBAC \\
  --bind-address=0.0.0.0 \\
  --client-ca-file=/var/lib/kubernetes/ca.pem \\
  --enable-admission-plugins=NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota \\
  --etcd-cafile=/var/lib/kubernetes/ca.pem \\
  --etcd-certfile=/var/lib/kubernetes/kubernetes.pem \\
  --etcd-keyfile=/var/lib/kubernetes/kubernetes-key.pem \\
  --etcd-servers=https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379 \\
  --event-ttl=1h \\
  --encryption-provider-config=/var/lib/kubernetes/encryption-config.yaml \\
  --kubelet-certificate-authority=/var/lib/kubernetes/ca.pem \\
  --kubelet-client-certificate=/var/lib/kubernetes/kubernetes.pem \\
  --kubelet-client-key=/var/lib/kubernetes/kubernetes-key.pem \\
  --runtime-config='api/all=true' \\
  --service-account-key-file=/var/lib/kubernetes/service-account.pem \\
  --service-account-signing-key-file=/var/lib/kubernetes/service-account-key.pem \\
  --service-account-issuer=https://192.168.1.21:6443 \\
  --service-cluster-ip-range=10.32.0.0/16 \\
  --service-node-port-range=30000-32767 \\
  --tls-cert-file=/var/lib/kubernetes/kubernetes.pem \\
  --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Again, configuration options are not worth discussing in detail, but there are some interesting things to note:

  • the security related options (certs, etc.) simply reflect the assumptions made in the previous chapter.
  • the --service-cluster-ip-range specifies the range of IPs assigned to Kubernetes Services. These IPs will only be visible from within the cluster (i.e. pods).
  • the --service-node-port-range specifies the range of ports used for NodePort Services

Exhaustive option reference can be found here.

Enable and run it:

sudo systemctl daemon-reload
sudo systemctl enable kube-apiserver
sudo systemctl start kube-apiserver

You can verify if kube-apiserver is running correctly with systemctl status or by invoking its health-check API:

curl -v --cacert /var/lib/kubernetes/ca.pem https://127.0.0.1:6443/healthz

Note

curl may not be installed by default. You can install it manually with sudo apt install curl, but you can also make cloud-init do this automatically for you, as described previously.

Kubernetes API load balancer

The Kubernetes API server is now running, and we can try using it. Unfortunately, this would require referring to one of the control node IPs/addresses directly, rather than using a single, uniform IP and name for the entire API. We have configured all our kubeconfigs to use https://kubernetes:6443 as the API url. The name kubernetes is configured in the DNS server to resolve to a mysterious, unassigned address 192.168.1.21. This is a virtual IP, and it is now time to properly set it up.

What is a virtual IP?

A virtual IP address is an address within a local network that is not bound to a single machine but is rather recognized by multiple machines as their own. All the packets destined for the virtual IP must go through a load balancer (the gateway VM, in our case) which distributes them across machines that actually handle them.

Note

Only the incoming packets go through the load balancer, the returning packets go directly from destination to source.

This simple load balancing technique is implemented in the Linux kernel by the IPVS module, and has the advantage of not involving any address translation or tunnelling (although it can be configured to do so).

Virtual IP on control nodes

First, we need to make sure all the control nodes recognize the virtual IP 192.168.1.21 as their own. At first, this seems very easy to do: just assign this address statically to one of the network interfaces on the VM. For example, we could do something like this:

sudo ip addr add 192.168.1.21/32 dev enp0s1

However, we have a problem: an IP address conflict in the network. If anyone on the local network asks (via ARP) who has this address, all control nodes will respond. This is bad. We actually want only the load balancer machine to publicly admit the possession of this virtual IP. In order to make sure that control nodes never announce this IP as their own, we need to use some tricks:

First, assign the address on loopback interface rather than virtual ethernet:

sudo ip addr add 192.168.1.21/32 dev lo

This is not enough, though. By default, Linux considers all the addresses from all interfaces for ARP requests and responses. We need some more twiddling in kernel network options:

sudo sysctl net.ipv4.conf.all.arp_ignore=1
sudo sysctl net.ipv4.conf.all.arp_announce=2

Without going into too many details, the first option (arp_ignore) makes sure that the virtual IP never appears in ARP responses sent from control nodes, while the second option (arp_announce) ensures that it does not appear in ARP requests. For more details, see the Linux kernel documentation.

Note how these options are global - they are not bound to any specific IP or interface. They work for the virtual IP specifically because it is configured on a different interface (loopback) than the interface where all the ARP traffic happens (virtual ethernet).

Let's test this setup by pinging the virtual IP from the host machine:

$ ping 192.168.1.21
PING 192.168.1.21 (192.168.1.21): 56 data bytes
ping: sendto: Host is down
ping: sendto: Host is down
Request timeout for icmp_seq 0

If you see failures like the ones above, our setup worked.

Persisting the setup with cloud-init for control nodes

It would be nice for cloud-init to do all this setup for us. Otherwise, it will be lost upon every VM reboot.

In order to configure the virtual IP as a static one, we must use the network-config file for cloud-init. Edit the cloud-init/network-config.control template file that we have set up earlier and add the following content:

network:
  version: 2
  ethernets:
    lo:
      match:
        name: lo
      addresses: [192.168.1.21/32]
    eth:
      match:
        name: enp*
      dhcp4: true

Note

Even though we only want to modify the loopback interface, we must include a default entry for the virtual ethernet with DHCP enabled. Otherwise, it will not be configured.

Note

This is the same YAML format as the one used by Ubuntu's netplan utility.

In order to persist the ARP-related kernel options, add this to cloud-init/user-data.control:

write_files:
  - path: /etc/sysctl.d/50-vip-arp.conf
    content: |
      net.ipv4.conf.all.arp_announce = 2
      net.ipv4.conf.all.arp_ignore = 1
runcmd:
  - sysctl -p /etc/sysctl.d/50-vip-arp.conf

Setting up the load balancer machine

The control nodes are properly provisioned with the virtual IP, so now it's time to set up the load balancer itself.

First, make sure the following packages are installed on the gateway machine:

sudo apt install ipvsadm ldirectord

...or via cloud-init/user-data.gateway:

packages:
  - ipvsadm
  - ldirectord

Adding the virtual IP

Just like the control nodes, the gateway machine must recognize the virtual IP as its own. Unlike for control nodes, we want the gateway VM to publicly admit the ownership of this address with ARP. Therefore, there is no need to configure it on loopback (although it would work too) nor to change any kernel network options.

sudo ip addr add 192.168.1.21/32 dev <interface-name>

...or in cloud-init/network-config.gateway:

network:
  version: 2
  ethernets:
    eth:
      match:
        name: enp*
      addresses: [192.168.1.21/32]
      dhcp4: true

Playing with ipvsadm

ipvsadm is the utility that allows us to configure a load-balanced virtual IP within the Linux kernel. Ultimately, we won't be using it directly, and we'll allow this to be done by an userspace utility, ldirectord. However, just for educational purposes, let's try to do it by hand.

On the gateway machine, invoke:

sudo ipvsadm -A -t 192.168.1.21:6443 -s rr
sudo ipvsadm -a -t 192.168.1.21:6443 -r 192.168.1.11:6443 -g
sudo ipvsadm -a -t 192.168.1.21:6443 -r 192.168.1.12:6443 -g
sudo ipvsadm -a -t 192.168.1.21:6443 -r 192.168.1.13:6443 -g

The -s rr specifies load balancing strategy (round-robin) and the -g option indicates direct routing (i.e. no tunnelling or NAT).

You can now verify it using sudo ipvsadm -L:

$ sudo ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  gateway:6443 rr
  -> control0.kubevms:6443        Route   1      0          0
  -> control1.kubevms:6443        Route   1      0          0
  -> control2.kubevms:6443        Route   1      0          0

This should be enough for the load balancing to work. Let's try contacting the Kubernetes API via the virtual IP, from the host machine:

curl -v --cacert auth/ca.pem https://kubernetes:6443/healthz

You should get a successful (200 OK) response.

We can also try using kubectl for the first time to contact our nascent Kubernetes deployment:

kubectl get namespaces

You should see an output like this if everything works fine:

NAME              STATUS   AGE
default           Active   159m
kube-node-lease   Active   159m
kube-public       Active   159m
kube-system       Active   159m

Yay! This is the first time ever we have actually used the Kubernetes API!

Setting IPVS properly with ldirectord

Using ipvsadm directly works, but it has the following problems:

  • the configuration is not persistent, it will disappear after reboot
  • control nodes are not monitored, i.e. when a control node goes down, it will not be excluded from load balancing

The second problem is especially pressing and absolutely unacceptable if we want our deployment to be as close to a production one as possible. We need to make sure that when a control node goes down, the load balancer detects this and stops routing traffic to it.

Fortunately, there are many simple user-space tools that can do this for us. They use IPVS under the hood and additionally monitor target machines in userspace. If they detect that any of them is down, IPVS is dynamically reconfigured to exclude a faulty route.

The tool of our choice is ldirectord - an old and simple utility, but more than enough for our purposes. Instead of invoking ipvsadm manually, we define the load balanced service in a file:

cat <<EOF | sudo tee /etc/ha.d/ldirectord.cf
checktimeout=5
checkinterval=1
autoreload=yes
quiescent=yes

virtual=192.168.1.21:6443
    servicename=kubernetes
    real=192.168.1.11:6443 gate
    real=192.168.1.12:6443 gate
    real=192.168.1.13:6443 gate
    scheduler=wrr
    checktype=negotiate
    service=https
    request="healthz"
    receive="ok"
EOF

The three last lines of this configuration specify how target nodes are monitored: by issuing an HTTPS request on /healthz path and expecting an ok response.

There's one last problem: we are using HTTPS for health checks but this machine does not trust our Kubernetes API certificate, so health checks fail. Unfortunately, there is no way to configure a trusted CA within ldirectord configuration, so we have no choice but make it trusted in the whole system:

sudo cp ca.pem /usr/local/share/ca-certificates/kubernetes-ca.crt
sudo update-ca-certificates

We can also provision this certificate via cloud-init/user-data.gateway. Add the following section to it:

ca_certs:
  trusted:
    - |
$(sed "s/^/      /g" "$dir/auth/ca.pem")

Note

The ungodly sed incantation is responsible for adding indent to the contents of the ca.pem file being pasted, so that YAML's significant indentation rules are satisfied.

Make sure ldirectord is restarted after config changes:

sudo systemctl restart ldirectord

Then check sudo ipvsadm -L again. You should see something like this:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  gateway:6443 wrr
  -> control0.kubevms:6443        Route   1      0          0
  -> control1.kubevms:6443        Route   1      0          0
  -> control2.kubevms:6443        Route   1      0          0

The notable difference from the manual config is that now we are using the wrr strategy (weighted round-robin). Every target node has weight 1 assigned, meaning that they are treated equally. When ldirectord detects a node down, it sets its weight to 0. We can test this by stopping kube-apiserver on one of the control nodes, e.g. on control0:

sudo systemctl stop kube-apiserver

and you should see this reflected in the ipvsadm -L output:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  gateway:6443 wrr
  -> control0.kubevms:6443        Route   0      0          0
  -> control1.kubevms:6443        Route   1      0          0
  -> control2.kubevms:6443        Route   1      0          0

Great! This concludes the setup of the Kubernetes API server.

Installing the remaining control plane components

Let's go back to control nodes. We have two more things to install on them:

  • kube-controller-manager
  • kube-scheduler

Installing kube-controller-manager

Download the binary and install it in appropriate system dir:

wget -q --show-progress --https-only --timestamping \
  "https://storage.googleapis.com/kubernetes-release/release/v${k8s_version}/bin/linux/${arch}/kube-controller-manager"
chmod +x kube-controller-manager
sudo cp kube-controller-manager /usr/local/bin

Set up kube-controller-manager's kubeconfig:

sudo cp kube-controller-manager.kubeconfig /var/lib/kubernetes/

Create a systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/kube-controller-manager.service
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \\
  --bind-address=0.0.0.0 \\
  --cluster-cidr=10.0.0.0/12 \\
  --cluster-name=kubernetes \\
  --cluster-signing-cert-file=/var/lib/kubernetes/ca.pem \\
  --cluster-signing-key-file=/var/lib/kubernetes/ca-key.pem \\
  --kubeconfig=/var/lib/kubernetes/kube-controller-manager.kubeconfig \\
  --leader-elect=true \\
  --root-ca-file=/var/lib/kubernetes/ca.pem \\
  --service-account-private-key-file=/var/lib/kubernetes/service-account-key.pem \\
  --service-cluster-ip-range=10.32.0.0/16 \\
  --use-service-account-credentials=true \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Some things to note from the options:

  • As with all other components, security-related options reflect the assumptions made in the previous chapter
  • The --cluster-signing-cert-file and --cluster-signing-key-file are related to a feature that was not yet mentioned - an API to dynamically sign certificates
  • The --service-cluster-ip-range must be the same as in kube-apiserver
  • The --cluster-cidr specifies IP range for pods in the cluster. We will discuss this in more detail in the next chapter

Launch it:

sudo systemctl daemon-reload
sudo systemctl enable kube-controller-manager
sudo systemctl start kube-controller-manager

Installing kube-scheduler

Download the binary and install it in appropriate system dir:

wget -q --show-progress --https-only --timestamping \
  "https://storage.googleapis.com/kubernetes-release/release/v${k8s_version}/bin/linux/${arch}/kube-scheduler"
chmod +x kube-scheduler
sudo cp kube-scheduler /usr/local/bin

Set up kube-scheduler's configuration:

sudo cp kube-scheduler.kubeconfig /var/lib/kubernetes/
sudo mkdir -p /etc/kubernetes/config

cat <<EOF | sudo tee /etc/kubernetes/config/kube-scheduler.yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/var/lib/kubernetes/kube-scheduler.kubeconfig"
leaderElection:
  leaderElect: true
EOF

Create a systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/kube-scheduler.service
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-scheduler \\
  --config=/etc/kubernetes/config/kube-scheduler.yaml \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Launch it:

sudo systemctl daemon-reload
sudo systemctl enable kube-scheduler
sudo systemctl start kube-scheduler

Summary

In this chapter, we have:

  • installed all the control plane components of a proper Kubernetes deployment (except cloud-controller-manager)
  • set up an IPVS based load balancer for the Kubernetes API

At this point we have a fully functional Kubernetes API, but there aren't yet any worker nodes to schedule actual work.

Next: Spinning up Worker Nodes