
Better etcd snapshot/restore in the kubeadm upgrade logic #618

Closed
luxas opened this issue Dec 25, 2017 · 7 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@luxas
Member

luxas commented Dec 25, 2017

Right now we exec `cp -r`, but we should take a proper snapshot before upgrading and then actually be able to roll it back seamlessly if something fails. Today this is done in a cp-like manner on the filesystem; we should do it via the etcd Go client instead to make it more robust. This might require some re-vendoring/restructuring of the etcd Go client.

cc @xiang90 @hongchaodeng @jamiehannaford @sbezverk @timothysc @xiangpengzhao @ericchiang
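As a rough illustration of what "via the etcd Go client" could look like, here is a minimal sketch that streams a snapshot to a file with the clientv3 Maintenance API. The endpoint, the output path, and the exact import path (it differs across vendored etcd versions) are assumptions, and a kubeadm-managed etcd would also need the cluster's TLS configuration:

```go
package main

import (
	"context"
	"io"
	"log"
	"os"
	"time"

	"go.etcd.io/etcd/clientv3" // import path varies with the vendored etcd version
)

func main() {
	// Hypothetical endpoint; a real kubeadm cluster would also pass TLS certs here.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Maintenance.Snapshot streams a consistent copy of the backend database.
	rc, err := cli.Snapshot(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	// Hypothetical backup location.
	f, err := os.Create("/var/lib/etcd-backup/snapshot.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, rc); err != nil {
		log.Fatal(err)
	}
	log.Println("snapshot written")
}
```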

@luxas added the kind/enhancement and priority/important-longterm labels on Dec 25, 2017
@luxas added this to the v1.10 milestone on Dec 25, 2017
@timothysc
Member

imo this is a documented process external to kubeadm.

@timothysc self-assigned this on Jan 4, 2018
@jamiehannaford
Contributor

I'm more of the opinion that reliable rollbacks are an essential component of upgrading etcd, so the user might benefit from having this handled on their behalf. I'm not wedded to this idea, though. Maybe we can discuss it in the next implementation meeting?

@timothysc added the help wanted, triaged, and priority/needs-more-evidence labels and removed the priority/important-longterm label on Jan 29, 2018
@timothysc removed their assignment on Jan 29, 2018
@DGreenstein

DGreenstein commented Feb 14, 2018

@timothysc - so what was the verdict on this? Is the intent to vendor in the etcd client and use that for making the etcd dump? How does this align with the demo from last week, where we saw the external etcd manager? Should this just be a documented manual effort until etcdmgr is ready? Do we want to proceed with enhancing the functionality of the kubeadm upgrade command?

FWIW: I agree with Jamie - if kubeadm upgrade is going to touch etcd at all, a first-class way of handling the etcd lifecycle in the context of an upgrade should be provided. The high-level logic, I think, might be (see the sketch after this list):

if self-hosted etcd:

  • backup data
  • update deployment object(s)
  • restore data

if external etcd:

  • validate version

Then test some sane basic functionality, e.g. is the cluster reporting healthy?
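As a rough sketch of that flow in Go (all of these helper names are hypothetical placeholders, not existing kubeadm functions):

```go
package upgrade

import "errors"

// Hypothetical stubs standing in for the steps listed above.
func backupEtcdData() error              { return errors.New("not implemented") }
func restoreEtcdData() error             { return errors.New("not implemented") }
func updateEtcdDeploymentObjects() error { return errors.New("not implemented") }
func validateEtcdVersion() error         { return errors.New("not implemented") }
func checkClusterHealth() error          { return errors.New("not implemented") }

func upgradeEtcd(selfHosted bool) error {
	if selfHosted {
		// Take a snapshot first so a failed upgrade can be rolled back.
		if err := backupEtcdData(); err != nil {
			return err
		}
		if err := updateEtcdDeploymentObjects(); err != nil {
			// The deployment update failed: restore the snapshot taken above.
			return restoreEtcdData()
		}
	} else {
		// External etcd: only validate that the running version is supported.
		if err := validateEtcdVersion(); err != nil {
			return err
		}
	}
	// Finally, test some basic functionality, e.g. is the cluster reporting healthy?
	return checkClusterHealth()
}
```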

@timothysc
Member

There already exists logic right now that will snapshot and restore, but it's pretty brute-force at the moment.

@timothysc modified the milestones: v1.10, v1.11 on Mar 5, 2018
@timothysc removed the triaged label on Apr 7, 2018
@luxas modified the milestones: v1.11, v1.12 on May 14, 2018
@luxas
Member Author

luxas commented May 15, 2018

We now at least have the possibility (once we upgrade godeps in v1.12) to do the snapshotting properly: etcd-io/etcd#9118
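For reference, a minimal sketch of what that could look like with the snapshot manager added in that PR (the import paths, the endpoint, and the output path are assumptions and depend on the vendored etcd version; a kubeadm-managed etcd would also need TLS configured):

```go
package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
	"go.etcd.io/etcd/clientv3/snapshot"
	"go.uber.org/zap"
)

func main() {
	// snapshot.NewV3 wraps the clientv3 Maintenance API in a small save/restore manager.
	mgr := snapshot.NewV3(zap.NewExample())

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Hypothetical endpoint for a local etcd member.
	cfg := clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	}
	if err := mgr.Save(ctx, cfg, "/var/lib/etcd-backup/snapshot.db"); err != nil {
		log.Fatal(err)
	}
	log.Println("snapshot saved")
}
```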

@k8s-ci-robot added the kind/feature label and removed the kind/enhancement label on Jun 5, 2018
@timothysc removed this from the v1.12 milestone on Jul 3, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Oct 1, 2018
@timothysc added the lifecycle/frozen label and removed the lifecycle/stale label on Oct 11, 2018
@timothysc added the priority/backlog label and removed the priority/needs-more-evidence label on Oct 26, 2018
@timothysc
Member

I'm going to close this and wait to see whether the community asks for it again or reopens it. This seems to be a non-issue.
