oneke_ops

Pedro Ielpi edited this page Oct 3, 2024 · 8 revisions

Operating OneKE

Accessing the Kubernetes Cluster

The leader VNF node runs an HAProxy instance that by default exposes Kubernetes API port 6443 on the public VIP address over the HTTPS protocol, secured with two-way SSL/TLS certificates.

This HAProxy instance can be used in two ways:

  • As a stable Control Plane endpoint for the whole Kubernetes cluster.
  • As an external Kubernetes API endpoint that can be reached from outside the internal VNET.
graph LR;
    internet --- vnf;
    vnf --- master & worker & storage;
    internet((Internet));
    style vnf text-align:left
    style master text-align:left
    style worker text-align:left
    style storage text-align:left
    vnf[["vnf (NAT 🔀)"<br>haproxy - *:6443<br><hr>eth0:10.2.11.86<br><hr>eth1:172.20.0.86]];
    master[master<br>kube-apiserver - *:6443<br><hr>eth0:172.20.0.101<br><hr>GW:172.20.0.86<br>DNS:1.1.1.1];
    worker[worker<br><hr>eth0:172.20.0.102<br><hr>GW:172.20.0.86<br>DNS:1.1.1.1];
    storage[storage<br><hr>eth0:172.20.0.103<br><hr>GW:172.20.0.86<br>DNS:1.1.1.1];

To access the Kubernetes API you'll need a kubeconfig file. In the case of RKE2, you can copy the /etc/rancher/rke2/rke2.yaml file located on every master node. For example:

$ install -d ~/.kube/
$ scp -J root@10.2.11.86 root@172.20.0.101:/etc/rancher/rke2/rke2.yaml ~/.kube/config
Warning: Permanently added '10.2.11.86' (ED25519) to the list of known hosts.
Warning: Permanently added '172.20.0.101' (ED25519) to the list of known hosts.
rke2.yaml

Additionally, you must adjust the Control Plane endpoint in the file to point to the public VIP:

$ gawk -i inplace -f- ~/.kube/config <<'EOF'
/^    server: / { $0 = "    server: https://10.2.11.86:6443" }
{ print }
EOF
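On systems without gawk, the same edit can be sketched with plain awk and a temporary file. The helper name `set_kubeconfig_server` is hypothetical, and like the snippet above it rewrites every `server:` line in the file, so it assumes a single-cluster kubeconfig.

```shell
# Hypothetical helper: point the "server:" entry of a kubeconfig at a new endpoint.
# Usage: set_kubeconfig_server FILE URL
set_kubeconfig_server() {
  kc_file=$1
  kc_url=$2
  kc_tmp=$(mktemp) || return 1
  # Replace only the value; the original indentation is preserved.
  if awk -v url="$kc_url" '
       /^[ \t]*server:[ \t]/ { sub(/server:.*/, "server: " url) }
       { print }
     ' "$kc_file" > "$kc_tmp"; then
    # Write back through cat so the file keeps its permissions.
    cat "$kc_tmp" > "$kc_file"
    kc_rc=$?
  else
    kc_rc=1
  fi
  rm -f "$kc_tmp"
  return "$kc_rc"
}

# Example (address taken from the text above):
#   set_kubeconfig_server ~/.kube/config https://10.2.11.86:6443
```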

Since OneKE 1.29, it's also possible to extract the kubeconfig file from the user template of any VM in the master role. For example:

onevm show 'master_0_(service_1)' --json | jq -r '.VM.USER_TEMPLATE.ONEKE_KUBECONFIG|@base64d' | install -m u=rw,go= -D /dev/fd/0 ~/.kube/config
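The command above overwrites `~/.kube/config`. If you already manage other clusters from the same workstation, one possible alternative is to store the OneKE kubeconfig in its own file and select it through the standard `KUBECONFIG` environment variable. The sketch below assumes a hypothetical helper `save_kubeconfig` and a hypothetical `KUBE_DIR` override; neither is part of OneKE.

```shell
# Hypothetical helper: read a kubeconfig from stdin and store it with
# owner-only permissions as DIR/NAME.yaml, then print the resulting path.
# Usage: ... | save_kubeconfig NAME
save_kubeconfig() {
  sk_dir=${KUBE_DIR:-$HOME/.kube}   # KUBE_DIR is an assumed override, not a kubectl variable
  sk_path=$sk_dir/$1.yaml
  mkdir -p "$sk_dir" || return 1
  # Create the file with a restrictive umask so it is never world-readable.
  ( umask 077 && cat > "$sk_path" ) || return 1
  chmod 600 "$sk_path" || return 1
  printf '%s\n' "$sk_path"
}

# Example, reusing the pipeline from the text above:
#   onevm show 'master_0_(service_1)' --json \
#     | jq -r '.VM.USER_TEMPLATE.ONEKE_KUBECONFIG|@base64d' \
#     | save_kubeconfig oneke
#   export KUBECONFIG=~/.kube/oneke.yaml
```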

And then your local kubectl command should work just fine:

$ kubectl get nodes
NAME                    STATUS   ROLES                       AGE    VERSION
oneke-ip-172-20-0-101   Ready    control-plane,etcd,master   33m    v1.29.4+rke2r1
oneke-ip-172-20-0-102   Ready    <none>                      28m    v1.29.4+rke2r1
oneke-ip-172-20-0-103   Ready    <none>                      28m    v1.29.4+rke2r1
oneke-ip-172-20-0-104   Ready    control-plane,etcd,master   12m    v1.29.4+rke2r1
oneke-ip-172-20-0-105   Ready    control-plane,etcd,master   10m    v1.29.4+rke2r1

Important

If you'd like to use a custom domain name for the Control Plane endpoint instead of the direct public VIP address, you need to add the domain to the ONEAPP_K8S_EXTRA_SANS context parameter, for example localhost,127.0.0.1,k8s.yourdomain.it, and set the domain inside the ~/.kube/config file as well. You can set up your domain in a public/private DNS server or in your local /etc/hosts file.
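Because ONEAPP_K8S_EXTRA_SANS is a comma-separated list, appending a domain without duplicating an existing entry can be sketched as follows. The function name `add_san` is hypothetical; the domain is the example value from the note above.

```shell
# Hypothetical helper: append ENTRY to a comma-separated SAN list unless
# it is already present. Prints the resulting list.
# Usage: add_san LIST ENTRY
add_san() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;            # already listed, leave unchanged
    *)        printf '%s\n' "${1:+$1,}$2" ;;   # append, adding a comma only if LIST is non-empty
  esac
}

# Example:
#   add_san 'localhost,127.0.0.1' 'k8s.yourdomain.it'
```

The resulting value still has to be set as the context parameter of the service before the cluster is deployed or reconfigured.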

Accessing the K8s API via SSH Tunnels

By default, the Kubernetes API server's extra SANs are set to localhost,127.0.0.1, which allows you to access the Kubernetes API via SSH tunnels.

Note

We recommend using the ProxyCommand SSH feature.

Download the /etc/rancher/rke2/rke2.yaml kubeconfig file:

$ install -d ~/.kube/
$ scp -o ProxyCommand='ssh -A root@10.2.11.86 -W %h:%p' root@172.20.0.101:/etc/rancher/rke2/rke2.yaml ~/.kube/config

Note

10.2.11.86 is the public VIP address; 172.20.0.101 is the private address of a master node inside the private VNET.

Create an SSH tunnel that forwards TCP port 6443:

$ ssh -o ProxyCommand='ssh -A root@10.2.11.86 -W %h:%p' -L 6443:localhost:6443 root@172.20.0.101

and then run kubectl in another terminal:

$ kubectl get nodes
NAME                    STATUS   ROLES                       AGE    VERSION
oneke-ip-172-20-0-101   Ready    control-plane,etcd,master   58m    v1.29.4+rke2r1
oneke-ip-172-20-0-102   Ready    <none>                      52m    v1.29.4+rke2r1
oneke-ip-172-20-0-103   Ready    <none>                      52m    v1.29.4+rke2r1
oneke-ip-172-20-0-104   Ready    control-plane,etcd,master   31m    v1.29.4+rke2r1
oneke-ip-172-20-0-105   Ready    control-plane,etcd,master   29m    v1.29.4+rke2r1
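The tunnel only helps if the kubeconfig still points at a local endpoint, which is typically the case for an unmodified rke2.yaml. A minimal sketch for checking this before running kubectl, using the hypothetical helpers `kubeconfig_server` and `is_local_endpoint`:

```shell
# Hypothetical helper: print the first "server:" URL found in a kubeconfig.
kubeconfig_server() {
  awk '$1 == "server:" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed if that URL targets localhost or 127.0.0.1,
# the two names covered by the default extra SANs.
is_local_endpoint() {
  case $(kubeconfig_server "$1") in
    https://localhost:*|https://127.0.0.1:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   is_local_endpoint ~/.kube/config || echo "kubeconfig does not point at the tunnel"
```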

Usage Examples

Create a Longhorn Persistent Volume Claim (PVC)

To create a 4 GiB persistent volume apply the following manifest using kubectl:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 4Gi
  storageClassName: longhorn-retain
$ kubectl apply -f nginx-pvc.yaml
persistentvolumeclaim/nginx created
$ kubectl get pvc,pv
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/nginx   Bound    pvc-5b0f9618-b840-4544-bccc-6479c83b49d3   4Gi        RWO            longhorn-retain   78s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM           STORAGECLASS      REASON   AGE
persistentvolume/pvc-5b0f9618-b840-4544-bccc-6479c83b49d3   4Gi        RWO            Retain           Bound    default/nginx   longhorn-retain            76s

Important

The Retain reclaim policy may protect your persistent data from accidental removal. Always back up your data!
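If you need several claims of this shape, the manifest can be generated instead of copied. The following is only a sketch: `pvc_manifest` is a hypothetical function, and it defaults to the longhorn-retain storage class used in the example above.

```shell
# Hypothetical helper: print a PersistentVolumeClaim manifest.
# Usage: pvc_manifest NAME SIZE [STORAGE_CLASS]
pvc_manifest() {
  cat <<EOF
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: $2
  storageClassName: ${3:-longhorn-retain}
EOF
}

# Example, equivalent to the manifest above:
#   pvc_manifest nginx 4Gi | kubectl apply -f-
```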

Create an NGINX Deployment

To deploy an NGINX instance using the PVC created previously, apply the following manifest using kubectl:

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: http
        image: nginx:alpine
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80
        volumeMounts:
        - mountPath: "/persistent/"
          name: nginx
      volumes:
      - name: nginx
        persistentVolumeClaim:
          claimName: nginx
$ kubectl apply -f nginx-deployment.yaml
deployment.apps/nginx created
$ kubectl get deployments,pods
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   1/1     1            1           32s

NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6b5d47679b-sjd9p   1/1     Running   0          32s
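In scripts, polling `kubectl get` can be replaced by waiting for the rollout to finish. A minimal sketch, assuming a hypothetical wrapper `wait_rollout` with an arbitrary default timeout of 120 seconds:

```shell
# Hypothetical wrapper: block until a Deployment has finished rolling out.
# Usage: wait_rollout NAME [TIMEOUT]
wait_rollout() {
  kubectl rollout status "deployment/$1" --timeout="${2:-120s}"
}

# Example:
#   kubectl apply -f nginx-deployment.yaml && wait_rollout nginx
```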

Create a Traefik IngressRoute

To expose the running NGINX instance over HTTP, on port 80 of the public VNF VIP address, apply the following manifest using kubectl:

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  type: ClusterIP
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
---
# In Traefik < 3.0.0 it used to be "apiVersion: traefik.containo.us/v1alpha1".
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: nginx
spec:
  entryPoints: [web]
  routes:
    - kind: Rule
      match: Path(`/`)
      services:
        - kind: Service
          name: nginx
          port: 80
          scheme: http
$ kubectl apply -f nginx-svc-ingressroute.yaml
service/nginx created
ingressroute.traefik.containo.us/nginx created
$ kubectl get svc,ingressroute
NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.43.0.1     <none>        443/TCP   3h18m
service/nginx        ClusterIP   10.43.99.36   <none>        80/TCP    63s

NAME                                     AGE
ingressroute.traefik.containo.us/nginx   63s

Verify that the new IngressRoute CRD (Custom Resource Definition) object is operational:

$ curl -fsSL http://10.2.11.86/ | grep title
<title>Welcome to nginx!</title>
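A new route may take a few seconds to become reachable, so an automated check may want to retry. The sketch below wraps the same curl call in a hypothetical `wait_for_page` function; the retry count and delay are arbitrary defaults.

```shell
# Hypothetical helper: retry an HTTP GET until the body matches PATTERN.
# Usage: wait_for_page URL PATTERN [TRIES] [DELAY_SECONDS]
wait_for_page() {
  wp_tries=${3:-10}
  wp_delay=${4:-3}
  while [ "$wp_tries" -gt 0 ]; do
    if curl -fsSL "$1" 2>/dev/null | grep -q "$2"; then
      return 0
    fi
    wp_tries=$((wp_tries - 1))
    # Sleep only if another attempt follows.
    if [ "$wp_tries" -gt 0 ]; then sleep "$wp_delay"; fi
  done
  return 1
}

# Example:
#   wait_for_page http://10.2.11.86/ 'Welcome to nginx' || echo "route not reachable"
```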

Create a MetalLB LoadBalancer Service

To expose the running NGINX instance over HTTP on port 80 using a private LoadBalancer service provided by MetalLB, apply the following manifest using kubectl:

---
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
spec:
  selector:
    app: nginx
  type: LoadBalancer
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
$ kubectl apply -f nginx-loadbalancer.yaml
service/nginx-lb created
$ kubectl get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP      10.43.0.1       <none>        443/TCP        3h25m
nginx        ClusterIP      10.43.99.36     <none>        80/TCP         8m50s
nginx-lb     LoadBalancer   10.43.222.235   172.20.0.87   80:30050/TCP   73s

Verify that the new LoadBalancer service is operational:

$ curl -fsSL http://172.20.0.87/ | grep title
<title>Welcome to nginx!</title>
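The address assigned by MetalLB can also be read programmatically rather than copied from the table. A sketch with two hypothetical helpers, `lb_ip` and `wait_lb_ip`, built on the standard `-o jsonpath` output of kubectl:

```shell
# Hypothetical helper: print the first external IP of a LoadBalancer Service
# (prints nothing while no address has been assigned yet).
lb_ip() {
  kubectl get svc "$1" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Hypothetical helper: poll until an address is assigned.
# Usage: wait_lb_ip SERVICE [TRIES] [DELAY_SECONDS]
wait_lb_ip() {
  lb_tries=${2:-10}
  while [ "$lb_tries" -gt 0 ]; do
    lb_addr=$(lb_ip "$1") || lb_addr=
    if [ -n "$lb_addr" ]; then
      printf '%s\n' "$lb_addr"
      return 0
    fi
    lb_tries=$((lb_tries - 1))
    sleep "${3:-3}"
  done
  return 1
}

# Example:
#   curl -fsSL "http://$(wait_lb_ip nginx-lb)/" | grep title
```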

Upgrading

Important

When upgrading, enabling access to the public internet is recommended, since RKE2 will need to download various Docker images to complete the upgrade.

K8s clusters can be upgraded with the System Upgrade Controller provided by RKE2. Here's a handy bash snippet to illustrate the procedure:

#!/usr/bin/env bash

: "${SUC_VERSION:=0.13.4}"
: "${RKE2_VERSION:=v1.29.4+rke2r1}"

set -o errexit -o nounset

# Deploy CRDs.
kubectl apply -f "https://github.com/rancher/system-upgrade-controller/releases/download/v${SUC_VERSION}/crd.yaml"

# Deploy the System Upgrade Controller.
kubectl apply -f "https://github.com/rancher/system-upgrade-controller/releases/download/v${SUC_VERSION}/system-upgrade-controller.yaml"

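# Wait for required Custom Resource Definitions to appear (10 attempts, 5 seconds apart).
for RETRY in 9 8 7 6 5 4 3 2 1 0; do
  if kubectl get crd/plans.upgrade.cattle.io --no-headers; then break; fi
  # Give up after the last attempt instead of failing on a late success.
  [[ "$RETRY" -gt 0 ]] || { echo "CRD plans.upgrade.cattle.io did not appear" >&2; exit 1; }
  sleep 5
done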

# Plan the upgrade.
kubectl apply -f- <<EOF
---
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: server
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: rke2-upgrade, operator: Exists}
      - {key: rke2-upgrade, operator: NotIn, values: ["disabled", "false"]}
      # When using k8s version 1.19 or older, swap control-plane with master
      - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  cordon: true
#  drain:
#    force: true
  upgrade:
    image: rancher/rke2-upgrade
  version: "$RKE2_VERSION"
---
# Agent plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: agent
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: rke2-upgrade, operator: Exists}
      - {key: rke2-upgrade, operator: NotIn, values: ["disabled", "false"]}
      # When using k8s version 1.19 or older, swap control-plane with master
      - {key: node-role.kubernetes.io/control-plane, operator: NotIn, values: ["true"]}
  prepare:
    args:
    - prepare
    - server-plan
    image: rancher/rke2-upgrade
  serviceAccountName: system-upgrade
  tolerations:
    - key: node.longhorn.io/create-default-disk
      value: "true"
      operator: Equal
      effect: NoSchedule
  cordon: true
  drain:
    force: true
    ignoreDaemonSets: true
    timeout: 0
  upgrade:
    image: rancher/rke2-upgrade
  version: "$RKE2_VERSION"
EOF

# Enable/Start the upgrade process on all cluster nodes.
kubectl label nodes --all rke2-upgrade=true
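Once the nodes are labeled, progress can be followed by comparing the VERSION column of `kubectl get nodes` with the target version. A minimal sketch, assuming a hypothetical helper `pending_nodes` and the default column layout in which VERSION is the last field:

```shell
# Hypothetical helper: list nodes whose reported version differs from VERSION.
# Usage: pending_nodes VERSION
pending_nodes() {
  kubectl get nodes --no-headers | awk -v want="$1" '$NF != want { print $1 }'
}

# Example: print the nodes that have not reached the target version yet.
#   pending_nodes 'v1.29.4+rke2r1'
```

An empty result suggests that every node already reports the target version; the Plan and Job objects in the system-upgrade namespace remain the authoritative source for the upgrade status.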

Component Upgrade (INCOMPLETE!)

By default, OneKE deploys Longhorn, Traefik, and MetalLB during cluster bootstrap. All these apps are deployed as Addons using RKE2's Helm Integration and official Helm charts. To illustrate the process, let's upgrade the Traefik Helm chart from version 23.0.0 to 28.0.0, following four basic steps.

Important

When upgrading, enabling access to the public internet is recommended, since RKE2 will need to download various Docker images to complete the upgrade.

  1. To avoid downtime, ensure that the number of worker nodes is at least 2, so 2 (anti-affined) Traefik replicas are running.
$ oneflow scale 'Service OneKE 1.29' worker 2
$ oneflow show 'Service OneKE 1.29'
...
LOG MESSAGES
05/13/24 13:32 [I] New state: DEPLOYING_NETS
05/13/24 13:32 [I] New state: DEPLOYING
05/13/24 13:39 [I] New state: RUNNING
05/13/24 13:54 [I] Role worker scaling up from 1 to 2 nodes
05/13/24 13:54 [I] New state: SCALING
05/13/24 13:56 [I] New state: COOLDOWN
05/13/24 13:01 [I] New state: RUNNING
$ kubectl -n traefik-system get pods
NAME                           READY   STATUS    RESTARTS   AGE
one-traefik-6768f7bdf4-cvqn2   1/1     Running   0          23m
one-traefik-6768f7bdf4-qqfcl   1/1     Running   0          23m
$ kubectl -n traefik-system get pods -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}'
traefik:2.7.1
traefik:2.7.1
  2. To enable downloading Traefik Helm charts, update the Helm repositories.
$ helm repo add traefik https://helm.traefik.io/traefik
"traefik" has been added to your repositories

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "traefik" chart repository
Update Complete. ⎈Happy Helming!
  3. Patch the HelmChart/one-traefik CRD object.
#!/usr/bin/env bash

set -eo pipefail

helm pull traefik/traefik --version '28.0.0'

if ! test -f /opt/one-appliance/addons/one-traefik-backup.yaml; then
    cat /opt/one-appliance/addons/one-traefik.yaml | tee /opt/one-appliance/addons/one-traefik-backup.yaml
fi

install -m u=rw,go= -D /dev/fd/0 /opt/one-appliance/addons/kustomization.yaml <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - one-traefik-backup.yaml
patches:
  - target:
      kind: HelmChart
      group: helm.cattle.io
      version: v1
      name: one-traefik
    patch: |
      - op: replace
        path: /spec/chartContent
        value: >-
          $(base64 -w0 < ./traefik-28.0.0.tgz)
EOF

kubectl kustomize /opt/one-appliance/addons/ | tee /opt/one-appliance/addons/one-traefik.yaml
  4. Verify that Traefik's pods have been recreated.
$ kubectl -n traefik-system get pods
NAME                           READY   STATUS    RESTARTS   AGE
one-traefik-7c5875d657-9v5h2   1/1     Running   0          88s
one-traefik-7c5875d657-bsp4v   1/1     Running   0          88s
$ kubectl -n traefik-system get pods -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}'
docker.io/traefik:v3.0.0
docker.io/traefik:v3.0.0
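The precondition from the first step (at least two running Traefik replicas) can be checked in a script before patching. The sketch below uses a hypothetical helper `running_pods` and assumes the default `kubectl get pods` column layout, where STATUS is the third field.

```shell
# Hypothetical helper: count the pods in a namespace whose STATUS is Running.
# Usage: running_pods NAMESPACE
running_pods() {
  kubectl -n "$1" get pods --no-headers \
    | awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Example:
#   [ "$(running_pods traefik-system)" -ge 2 ] || echo "scale the worker role first"
```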

Warning

Since Traefik 3.0.0 the apiVersion: traefik.containo.us/v1alpha1 field in CRD objects must be replaced with apiVersion: traefik.io/v1alpha1. Please update/patch all your Traefik-specific CRD objects.
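For manifests kept as local files, the apiVersion change can be sketched with sed. The function `migrate_traefik_api` is hypothetical; it rewrites only lines that start with `apiVersion:` and leaves comments untouched. Objects already stored in the cluster still have to be re-applied from the updated files.

```shell
# Hypothetical helper: switch the Traefik API group in local manifest files.
# Usage: migrate_traefik_api FILE...
migrate_traefik_api() {
  for mt_file in "$@"; do
    mt_tmp=$(mktemp) || return 1
    # Match the old group only in apiVersion lines, keeping their indentation.
    if sed 's#^\([[:space:]]*apiVersion:[[:space:]]*\)traefik\.containo\.us/#\1traefik.io/#' \
         "$mt_file" > "$mt_tmp"; then
      cat "$mt_tmp" > "$mt_file"
    else
      rm -f "$mt_tmp"
      return 1
    fi
    rm -f "$mt_tmp"
  done
}

# Example:
#   migrate_traefik_api nginx-svc-ingressroute.yaml
#   kubectl apply -f nginx-svc-ingressroute.yaml
```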

Important

This example was a very simple and quick Helm chart upgrade, but in general config changes in the spec.valuesContent field may also be required. Please plan your upgrades ahead!
