Add disaster recovery documentation. #584

# Disaster Recovery

Self-hosted Kubernetes clusters are vulnerable to the following catastrophic
failure scenarios:

- Loss of all api-servers
- Loss of all schedulers
- Loss of all controller-managers
- Loss of all self-hosted etcd nodes

To minimize the likelihood of any of these scenarios, production
self-hosted clusters should always run in a [high-availability
configuration](https://kubernetes.io/docs/admin/high-availability/).

Nevertheless, in the event of a control plane loss, the bootkube project
provides limited disaster avoidance and recovery support through the
`pod-checkpointer` program and the `bootkube recover` subcommand.

## Pod Checkpointer

The Pod Checkpointer is a program that ensures that existing local pod state
can be recovered in the absence of an api-server.

This is accomplished by managing "checkpoints" of local pod state as static pod
manifests:

- When the checkpointer sees that a "parent pod" (a pod which should be
  checkpointed) is successfully running, the checkpointer saves a local
  copy of the pod's manifest.
- If the parent pod is detected as no longer running, the checkpointer
  "activates" the checkpoint manifest, allowing the checkpoint to continue
  running until the parent pod is restarted on the local node, or until the
  checkpointer is able to contact an api-server and determine that the parent
  pod is no longer scheduled to this node.
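
The activation step above can be sketched as a file move between two
directories. The path names used here (`inactive-manifests` for stored
checkpoints, `manifests` for the kubelet's static-pod directory) mirror the
checkpointer's usual on-disk layout but are assumptions for illustration, and
the whole sketch runs in a scratch directory rather than being the
checkpointer's actual code:

```shell
# Minimal simulation of checkpoint "activation" in a scratch directory.
# Assumed real-world layout: an inactive-manifests directory holds
# checkpoints; the kubelet's static-pod manifests directory holds active pods.
workdir=$(mktemp -d)
mkdir -p "$workdir/inactive-manifests" "$workdir/manifests"

# While the parent pod is running, the checkpointer keeps an inactive copy
# of its manifest.
cat > "$workdir/inactive-manifests/kube-apiserver.yaml" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
EOF

# Parent pod lost: "activate" the checkpoint by moving it into the
# static-pod directory so the kubelet restarts the pod even with no
# api-server reachable.
mv "$workdir/inactive-manifests/kube-apiserver.yaml" "$workdir/manifests/"

ls "$workdir/manifests"
```

Once the checkpointer can reach an api-server again, it uses the scheduling
state reported there to decide whether the activated checkpoint should keep
running on this node or be cleaned up.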

A Pod Checkpointer DaemonSet is deployed by default when using `bootkube
render` to create cluster manifests. Using the Pod Checkpointer is highly
recommended for all self-hosted clusters to ensure node reboot resiliency.

For more information, see the [Pod Checkpointer
README](https://github.com/kubernetes-incubator/bootkube/blob/master/cmd/checkpoint/README.md).

## Bootkube Recover

In the event of partial or total self-hosted control plane loss, `bootkube
recover` may be able to assist in re-bootstrapping the self-hosted control
plane. It is recommended to always use the latest bootkube release when
running `bootkube recover`, as newer releases may include relevant bug fixes.
The `bootkube recover` subcommand does not recover a cluster directly. Instead,
it extracts the control plane configuration from an available source and
renders manifests in a format that `bootkube start` can use to reboot the
cluster.

To see available options, run:

```
bootkube recover --help
```

To recover a cluster, first invoke `bootkube recover` with flags corresponding
to the current state of the cluster (supported states listed below). Then,
invoke `bootkube start` to reboot the cluster. For example:

```
scp bootkube user@master-node:
ssh user@master-node
./bootkube recover --asset-dir=recovered [scenario-specific options]
sudo ./bootkube start --asset-dir=recovered
```
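
Before running `bootkube start`, it can be worth sanity-checking that recovery
actually produced a usable asset directory. The subdirectory names below are
an assumption based on the usual bootkube asset layout (`manifests`, `tls`,
`auth`); verify them against the output of `bootkube recover` for your
version:

```shell
# check_assets ASSET_DIR: verify the recovered asset directory contains the
# expected subdirectories. The expected names are an assumption based on the
# usual bootkube asset layout; adjust for your bootkube version.
check_assets() {
  dir=$1
  for d in manifests tls auth; do
    if [ ! -d "$dir/$d" ]; then
      echo "missing: $dir/$d" >&2
      return 1
    fi
  done
  echo "assets look complete: $dir"
}

# Usage after recovery, before `bootkube start`:
#   check_assets recovered
```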

For complete recovery examples see the
[hack/multi-node/bootkube-test-recovery](https://github.com/kubernetes-incubator/bootkube/blob/master/hack/multi-node/bootkube-test-recovery)
and
[hack/multi-node/bootkube-test-recovery-self-hosted-etcd](https://github.com/kubernetes-incubator/bootkube/blob/master/hack/multi-node/bootkube-test-recovery-self-hosted-etcd)
scripts. The `bootkube-test-recovery` script is demoed below.

[![asciicast](https://asciinema.org/a/dsp43ziuuzwcztni94y8l25s5.png)](https://asciinema.org/a/dsp43ziuuzwcztni94y8l25s5)

### If an api-server is still running

If an api-server is still running but other control plane components are down,
preventing cluster functionality (e.g. the scheduler pods are all down), the
control plane can be extracted directly from the api-server:

```
bootkube recover --asset-dir=recovered --kubeconfig=/etc/kubernetes/kubeconfig
```

Note that `--asset-dir` specifies the output directory for the recovered
assets; do not pass the original cluster's asset directory here.

### If an external etcd cluster is still running

If using an external (non-self-hosted) etcd cluster, the control plane can be
extracted directly from etcd:

```
bootkube recover --asset-dir=recovered --etcd-servers=http://127.0.0.1:2379 --kubeconfig=/etc/kubernetes/kubeconfig
```

### If an etcd backup is available (non-self-hosted etcd)

First, recover the external etcd cluster from the backup. Then use the method
described in the previous section to recover the control plane manifests.
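
Restoring the external etcd cluster itself is done with etcd's own tooling,
not bootkube. As a hedged sketch only, assuming a v3 `etcdctl` and a
single-member cluster, where the member name, URLs, snapshot filename, and
data directory are all illustrative placeholders that must match your actual
etcd configuration:

```shell
# Restore an etcd member's data directory from a snapshot file (etcd v3).
# All flag values below are illustrative placeholders; substitute your
# cluster's real member name, peer URLs, and data directory.
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
  --name etcd-0 \
  --initial-cluster etcd-0=http://127.0.0.1:2380 \
  --initial-advertise-peer-urls http://127.0.0.1:2380 \
  --data-dir /var/lib/etcd

# Restart etcd against the restored data directory, then extract the control
# plane as shown in the previous section.
```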

### If an etcd backup is available (self-hosted etcd)

If using self-hosted etcd, recovery is supported via reading from an etcd
backup file:

```
bootkube recover --asset-dir=recovered --etcd-backup-file=backup --kubeconfig=/etc/kubernetes/kubeconfig
```

In addition to rebooting the control plane, this will also destroy and recreate
the self-hosted etcd cluster using the backup.