checkpoint
is a utility application which will manage "checkpoints" of pods scheduled to the local node.
The purpose of a pod checkpoint is to ensure that existing local pod state can be recovered in the absence of an api-server.
The kubelet will already attempt to ensure that local pods will continue to run in the absence of an API server (based on restartPolicy). However, if all local runtime state has been lost (for example after a reboot), the checkpoint utility will ensure the local state can be recovered until an api-server is contacted.
This is accomplished by managing checkpoints as static pod manifests:
-
When the checkpointer sees that a "parent pod" (a pod which should be checkpointed), is successfully running, the checkpointer will save a local copy of the manifest.
-
If the parent pod is detected as no longer running, the checkpointer will "activate" the checkpoint manifest. It will allow the checkpoint to continue running until the parent-pod is restarted on the local node, or it is able to contact an api-server to determine that the parent pod is no longer scheduled to this node.
Any pod which contains the checkpointer.alpha.coreos.com/checkpoint=true
annotation will be considered a viable "parent pod" which should be checkpointed.
The parent pod cannot itself be a static pod, and is not a checkpoint itself. Affinity is not supported for a pod, and any pod labelled with the checkpoint annotation will be checkpointed.
Checkpoints are denoted by the checkpointer.alpha.coreos.com/checkpoint-of
annotation. This annotation will point to the parent of this checkpoint by pod name.
For example the pod:
apiVersion: v1
kind: Pod
metadata:
name: kube-apiserver
namespace: kube-system
annotations:
checkpointer.alpha.coreos.com/checkpoint=true
Will generate a checkpointed pod as:
apiVersion: v1
kind: Pod
metadata:
name: kube-apiserver
namespace: kube-system
annotations:
checkpointer.alpha.coreos.com/checkpoint-of=kube-apiserver
- Inactive checkpoint manifests: /etc/kubernetes/inactive-manifests
- Active checkpoint manifests: /etc/kubernetes/manifests
- Checkpointed secrets: /etc/kubernetes/checkpoint-secrets
- Config Maps: /etc/kubernetes/checkpoint-configmaps
Parts of the pod manifest will be scrubbed prior to being saved as checkpoints. This is to ensure that the pod does not interfere with the parent object, and is managed in isolation.
- All labels and non-checkpoint related annotations will be removed
- Service account details are removed
- Secrets are downloaded from the apiserver and converted to hostMounts
- ConfigMaps are downloaded from the apiserver and converted to hostMounts
- Pod status is cleared
Secrets are stored using a path of:
/etc/kubernetes/checkpoint-secrets/<namespace>/<pod-name>/<secret-name>
ConfigMaps are stored using a path of:
/etc/kubernetes/checkpoint-configmaps/<namespace>/<pod-name>/<configmap-name>
The pod checkpoint will also checkpoint itself to the disk to handle the absence of the API server. After a node reboot, the on-disk pod-checkpointer will take over the responsibility.
If the pod checkpointer reaches the API server and finds out that it's no longer being scheduled, it will remove all on-disk checkpoints before cleaning itself up.
By default, the pod checkpoint runs with service account credentials, checkpointing its own service account secret for reboots. That service account must be bound to a Role that lets the pod checkpoint watch for Pods with the checkpoint annotation, then save ConfigMaps and Secrets referenced by those Pods.
kind: Role
metadata:
name: pod-checkpointer
namespace: kube-system
rules:
- apiGroups: [""] # "" indicates the core API group
resources: ["pods"]
verbs: ["get", "watch", "list"]
- apiGroups: [""] # "" indicates the core API group
resources: ["secrets", "configmaps"]
verbs: ["get"]