This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Add disaster recovery documentation. #584

Merged · 5 commits · Jun 19, 2017
112 changes: 112 additions & 0 deletions Documentation/disaster-recovery.md
@@ -0,0 +1,112 @@
# Disaster Recovery

Self-hosted Kubernetes clusters are vulnerable to the following catastrophic
failure scenarios:

- Loss of all api-servers
- Loss of all schedulers
- Loss of all controller-managers
- Loss of all self-hosted etcd nodes

To minimize the likelihood of any of these scenarios, production
self-hosted clusters should always run in a [high-availability
configuration](https://kubernetes.io/docs/admin/high-availability/).
> **Contributor:** I'm on the fence about linking to those docs -- as they're pretty different to how self-hosted HA works (which we need docs for: #311). It does touch on some important topics like leader-election, but even then we already have that and all we care about is scaling replica counts (for example).

> **Contributor (author):** Ok, added a TODO linking to the issue instead for now.


Nevertheless, in the event of a control plane loss, the bootkube project
provides limited disaster avoidance and recovery support through the
`pod-checkpointer` program and the `bootkube recover` subcommand.

## Pod Checkpointer

The Pod Checkpointer is a program that ensures that existing local pod state
can be recovered in the absence of an api-server.

This is accomplished by managing "checkpoints" of local pod state as static pod
manifests:

- When the checkpointer sees that a "parent pod" (a pod which should be
  checkpointed) is successfully running, the checkpointer saves a local
  copy of the pod's manifest.
- If the parent pod is detected as no longer running, the checkpointer
  "activates" the checkpoint manifest. It allows the checkpoint to continue
  running until the parent pod is restarted on the local node, or until it is
  able to contact an api-server and determine that the parent pod is no longer
  scheduled to this node.
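
The per-pod decision described above can be sketched as a small state
function. This is a hypothetical simplification for illustration only, not the
actual pod-checkpointer implementation:

```shell
#!/bin/sh
# Sketch of the checkpointer's per-pod decision (illustrative, not real code).
# decide <parent_running: yes|no> <api_answer: scheduled|unscheduled|unreachable>
decide() {
  parent_running=$1
  api_answer=$2
  if [ "$parent_running" = "yes" ]; then
    # Parent pod is healthy: (re)save a checkpoint of its manifest.
    echo "save-checkpoint"
  elif [ "$api_answer" = "unscheduled" ]; then
    # An api-server confirmed the pod no longer belongs on this node.
    echo "remove-checkpoint"
  else
    # Parent is down and no api-server says otherwise: run the checkpoint.
    echo "activate-checkpoint"
  fi
}

decide yes scheduled      # -> save-checkpoint
decide no unreachable     # -> activate-checkpoint
decide no unscheduled     # -> remove-checkpoint
```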

A Pod Checkpointer DaemonSet is deployed by default when using `bootkube
render` to create cluster manifests. Using the Pod Checkpointer is highly
recommended for all self-hosted clusters to ensure node reboot resiliency.

For more information, see the [Pod Checkpointer
README](https://github.com/kubernetes-incubator/bootkube/blob/master/cmd/checkpoint/README.md).

## Bootkube Recover
> **Contributor:** We may want to have some kind of versioning convention. I'm assuming right now it's: you should always use the latest bootkube release when running recover. This may not be a confusion point, but I wonder if users will try and use the same bootkube release that they installed with (which is probably fine in most cases, unless there are new bug fixes they should have).

> **Contributor (author):** Added note recommending to always use the latest version.

In the event of partial or total self-hosted control plane loss, `bootkube
recover` may be able to assist in re-bootstrapping the self-hosted control
plane.

The `bootkube recover` subcommand does not recover a cluster directly. Instead,
it extracts the control plane configuration from an available source and
renders manifests in a format that `bootkube start` can use to reboot the
cluster.
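
As a sketch, the recovered output is an asset directory similar to the one
`bootkube render` produces; the exact layout below is illustrative and may
differ between bootkube versions:

```
recovered/
├── auth/                  # kubeconfig for the recovered cluster
├── bootstrap-manifests/   # temporary control plane used by bootkube start
├── manifests/             # self-hosted control plane manifests
└── tls/                   # recovered certificates and keys
```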

To see available options, run:

```
bootkube recover --help
```

To recover a cluster, first invoke `bootkube recover` with flags corresponding
to the current state of the cluster (supported states listed below). Then,
invoke `bootkube start` to reboot the cluster. For example:

```
scp bootkube user@master-node:
ssh user@master-node
./bootkube recover --asset-dir=recovered [scenario-specific options]
sudo ./bootkube start --asset-dir=recovered
```
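
Once `bootkube start` returns, it may help to confirm that the control plane
pods are coming back up (an illustrative check; the kubeconfig path depends on
your installation):

```
kubectl --kubeconfig=/etc/kubernetes/kubeconfig get pods -n kube-system
```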

For complete recovery examples see the
[hack/multi-node/bootkube-test-recovery](https://github.com/kubernetes-incubator/bootkube/blob/master/hack/multi-node/bootkube-test-recovery)
and
[hack/multi-node/bootkube-test-recovery-self-hosted-etcd](https://github.com/kubernetes-incubator/bootkube/blob/master/hack/multi-node/bootkube-test-recovery-self-hosted-etcd)
scripts. The `bootkube-test-recovery` script is demoed below.

[![asciicast](https://asciinema.org/a/dsp43ziuuzwcztni94y8l25s5.png)](https://asciinema.org/a/dsp43ziuuzwcztni94y8l25s5)

### If an api-server is still running

If an api-server is still running but other control plane components are down,
preventing cluster functionality (i.e. the scheduler pods are all down), the
control plane can be extracted directly from the api-server:

```
bootkube recover --asset-dir=recovered --kubeconfig=/etc/kubernetes/kubeconfig
```

> **Contributor:** one relevant issue: field engs suggest to rename --asset-dir to output-asset-dir. when they first tried without our help, they tried to pass in the old asset-dir in here.

> **Contributor (author):** Thanks for the feedback! Created #589

### If an external etcd cluster is still running

If using an external (non-self-hosted) etcd cluster, the control plane can be
extracted directly from etcd:

```
bootkube recover --asset-dir=recovered --etcd-servers=http://127.0.0.1:2379 --kubeconfig=/etc/kubernetes/kubeconfig
```

### If an etcd backup is available (non-self-hosted etcd)

First, recover the external etcd cluster from the backup. Then use the method
described in the previous section to recover the control plane manifests.
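
How the external etcd cluster is restored depends on your deployment. With
etcd v3's `etcdctl`, a snapshot restore looks roughly like the following (a
sketch -- the backup filename, data directory, and any member/peer flags depend
on your topology):

```
ETCDCTL_API=3 etcdctl snapshot restore backup.db --data-dir=/var/lib/etcd-restored
```

After restarting etcd on the restored data directory, proceed with `bootkube
recover --etcd-servers=...` as shown above.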

### If an etcd backup is available (self-hosted etcd)

If using self-hosted etcd, recovery is supported via reading from an etcd
backup file:

```
bootkube recover --asset-dir=recovered --etcd-backup-file=backup --kubeconfig=/etc/kubernetes/kubeconfig
```

> **Contributor (@xiang90, Jun 16, 2017):** we tried this with our field engs yesterday.
>
> there are a few things we need to make sure before running this script:
>
> 1. kubelet is running on the machine
> 2. no related containers are running (old etcd, old api server, etc.. this also applies to other recovery cases i believe)
> 3. docker state is clean (`docker ps -a` does not contain old states of relevant containers). kubelet has bugs where it might incorrectly believe the static pod has died when old state exists.
> 4. /var/etcd dir is clean on ALL master nodes

> **Contributor Author (@diegs, Jun 16, 2017):** Cool, do you want me to add this directly to the documentation?
>
> Also this is not really true of the other recovery situations. This makes it sound like you should basically destroy and recreate all your master nodes before using this recovery approach.

> **Contributor:** @diegs just fyi. we can address them later.

In addition to rebooting the control plane, this will also destroy and recreate
the self-hosted etcd cluster using the backup.
26 changes: 1 addition & 25 deletions README.md
@@ -60,31 +60,7 @@ bootkube start --asset-dir=my-cluster

In the case of a partial or total control plane outage (i.e. due to lost master nodes), an experimental `recover` command can extract and write manifests from a backup location. These manifests can then be used by the `start` command to reboot the cluster. Currently, the supported recovery sources are a running apiserver, an external running etcd cluster, and an etcd backup taken from the self-hosted etcd cluster.

To see available options, run:

```
bootkube recover --help
```

Recover from an external running etcd cluster:

```
bootkube recover --asset-dir=recovered --etcd-servers=http://127.0.0.1:2379 --kubeconfig=/etc/kubernetes/kubeconfig
```

Recover from a running apiserver (i.e. if the scheduler pods are all down):

```
bootkube recover --asset-dir=recovered --kubeconfig=/etc/kubernetes/kubeconfig
```

Recover from an etcd backup when self hosted etcd is enabled:

```
bootkube recover --asset-dir=recovered --etcd-backup-file=backup --kubeconfig=/etc/kubernetes/kubeconfig
```

For a complete recovery example please see the [hack/multi-node/bootkube-test-recovery](hack/multi-node/bootkube-test-recovery) and the [hack/multi-node/bootkube-test-recovery-self-hosted-etcd](hack/multi-node/bootkube-test-recovery-self-hosted-etcd) scripts.
For more details and examples see [disaster recovery documentation](Documentation/disaster-recovery.md).

## Building
