
checkpointer: should GC itself if installer no longer scheduled #253

Closed

aaronlevy opened this issue Dec 30, 2016 · 7 comments
Labels: kind/feature, priority/P1

Comments

@aaronlevy
Contributor

The checkpointer is deployed via a daemonset which "installs" a static manifest to the host.

If the checkpoint-installer is no longer scheduled to a node, the checkpointer should know how to GC itself (and all checkpoints).
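As a sketch of what that self-GC could look like, assuming checkpoints live as *.json manifests under /etc/kubernetes/manifests (the directory layout, file extension, and function names here are assumptions for illustration, not the actual checkpointer code):

```go
// Hypothetical sketch: once the checkpointer decides its installer is no
// longer scheduled to this node, it removes every checkpoint manifest,
// deleting its own manifest last so it stays alive to finish the cleanup.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const manifestDir = "/etc/kubernetes/manifests"

func gcSelf(selfManifest string) error {
	checkpoints, err := filepath.Glob(filepath.Join(manifestDir, "*.json"))
	if err != nil {
		return err
	}
	for _, path := range checkpoints {
		if filepath.Base(path) == selfManifest {
			continue // remove our own manifest last
		}
		if err := os.Remove(path); err != nil {
			return fmt.Errorf("removing checkpoint %s: %v", path, err)
		}
	}
	// Deleting our own manifest tells the kubelet to stop this static pod.
	return os.Remove(filepath.Join(manifestDir, selfManifest))
}

func main() {
	if err := gcSelf("pod-checkpointer.json"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```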

@aaronlevy added the kind/feature and priority/P1 labels on Feb 14, 2017
@yifan-gu

yifan-gu commented Feb 28, 2017

Today I had an idea: use the checkpointer to checkpoint itself, so that we can get rid of the checkpointer installer.

Here is how it looks in several scenarios:

Scenario A: the checkpointer pod gets scheduled on node A as a daemonset

t0: The checkpointer (call it C1) starts running
t1: C1 checkpoints and activates itself by reading its own spec and saving it to /etc/kubernetes/manifests
t2: The static-pod version of the checkpointer (call it C2) starts; C2 waits on a flock held by C1 to avoid races (see the sketch below)
t3: C1 runs in a loop, continuing to create checkpoints for other pods as necessary
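As a concrete illustration, here is a minimal sketch of the flock handoff from t2, assuming a well-known lock path (the path and surrounding details are assumptions for the sketch, not the actual implementation):

```go
// The active checkpointer holds an exclusive flock on a well-known file;
// the standby copy blocks on the same lock until the holder exits.
package main

import (
	"log"
	"os"
	"syscall"
)

const lockPath = "/var/run/pod-checkpointer.lock" // hypothetical location

// acquireLock blocks until this process holds the exclusive lock, which is
// exactly what C2 does while C1 is alive.
func acquireLock() *os.File {
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0600)
	if err != nil {
		log.Fatalf("open lock file: %v", err)
	}
	// LOCK_EX without LOCK_NB blocks until the current holder releases it;
	// the kernel drops the lock automatically if the holder dies.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		log.Fatalf("flock: %v", err)
	}
	return f // keep the fd open for the life of the process
}

func main() {
	lock := acquireLock()
	defer lock.Close()
	log.Println("flock acquired; acting as the active checkpointer")
	// ... checkpointing loop runs here ...
}
```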

Scenario B: node A reboots but cannot reach the API server

t0: The kubelet starts
t1: The static version of the checkpointer (C2) starts and grabs the flock
t2: C2 runs the same loop as C1 and kicks off the other checkpointed pods

Scenario C: node A can reach the API server again

t0: The daemonset version of the checkpointer (C1) gets scheduled and started on node A
t1: C1 waits on the flock held by C2
t2: C2 keeps running [1]
t3: If C1 gets descheduled from node A, C2 will detect that and remove all checkpoints

During [1], if the checkpointer's spec changes on the API server, C2 checkpoints the new spec and gets restarted by the kubelet. So the running and on-disk checkpointer specs are always the latest.
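A sketch of that update step, assuming the checkpointer compares raw spec bytes (the helper and its shape are hypothetical):

```go
// C2 compares the pod spec fetched from the API server against its on-disk
// checkpoint and rewrites the manifest when they differ; the kubelet then
// restarts the static pod from the new spec.
package checkpointer

import (
	"bytes"
	"os"
)

// maybeRecheckpointSelf reports whether the on-disk checkpoint was replaced.
func maybeRecheckpointSelf(apiSpec, diskSpec []byte, manifestPath string) (bool, error) {
	if bytes.Equal(apiSpec, diskSpec) {
		return false, nil // already running the latest spec
	}
	// Rewriting the manifest makes the kubelet restart the static pod, so
	// the running copy always matches the spec on the API server.
	if err := os.WriteFile(manifestPath, apiSpec, 0644); err != nil {
		return false, err
	}
	return true, nil
}
```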

This is actually not very different from the checkpoint-installer + checkpointer model. But by using only one image/pod, we let the checkpointer get its own pod spec from the API server and get rid of the hidden manifest (see #206).

The imperfect part is that users will see two checkpointer pods running on each master node, one active and one standby.

@yifan-gu

/cc @aaronlevy @pbx0 @derekparker ^^

@aaronlevy
Contributor Author

I think this is a really interesting idea!

As far as the two checkpointer pods go, we really already have this problem: there is a "checkpoint-installer" and a "pod-checkpointer" on every node.

Another option to get rid of the two checkpointer pods might be to implement something like "exit on lock-contention" for the checkpointed copy. This is what we did for the self-hosted kubelet: if it saw anything attempting to acquire the lock, it would exit, allowing the copy sourced from the api-server (when available) to take over.

This more or less covers #206 - but regardless of this change, we would still need to add some logic so the checkpointer is aware that, if it is to be GC'd, it needs to clean up all other checkpoints before removing itself.
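For illustration, here is a rough sketch of one way to approximate exit-on-lock-contention (the sentinel-file mechanism is an assumption for this sketch; it is not necessarily how the self-hosted kubelet implemented it): the challenger announces itself before blocking on the flock, and the holder polls for the announcement and exits.

```go
// Assumed mechanism, for illustration only: the API-sourced copy creates a
// sentinel file before blocking on the flock; the current holder polls for
// the sentinel and exits so the challenger can take over.
package checkpointer

import (
	"log"
	"os"
	"time"
)

const contendPath = "/var/run/pod-checkpointer.contend" // hypothetical sentinel

// announceContention is run by the challenger just before it blocks on the flock.
func announceContention() error {
	f, err := os.Create(contendPath)
	if err != nil {
		return err
	}
	return f.Close()
}

// exitOnContention is run by the lock holder; it exits once a challenger appears.
func exitOnContention() {
	for range time.Tick(5 * time.Second) {
		if _, err := os.Stat(contendPath); err == nil {
			os.Remove(contendPath) // clear the signal for the next holder
			log.Println("lock contention detected; exiting to yield the flock")
			os.Exit(0)
		}
	}
}
```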

@yifan-gu

yifan-gu commented Mar 1, 2017

> Another option to get rid of the two checkpointer pods might be to implement something like "exit on lock-contention" for the checkpointed copy. This is what we did for the self-hosted kubelet: if it saw anything attempting to acquire the lock, it would exit, allowing the copy sourced from the api-server (when available) to take over.

I've had a little more thought on this: it implies the static checkpointer will have a slightly different config than the daemonset checkpointer, because only the static copy needs to exit on contention. That means the checkpointer needs to treat itself as a special case when checkpointing itself (e.g. modify the command in the spec before writing it to disk).

@aaronlevy
Contributor Author

Yeah, and I don't actually think exiting on lock contention is a valid solution: the checkpointer is special in that it always needs a running static copy (so it will come up on reboot without an api-server). So ignore my previous suggestion.

@aaronlevy
Contributor Author

aaronlevy commented Apr 3, 2017

@yifan-gu This is closed by #366, right?

@aaronlevy
Contributor Author

Yeah, it does (I think the PR needs to say "fixes" for GitHub to pick it up automatically). Closed in #366.
