
Better etcd snapshot/restore in the kubeadm upgrade logic #618

Closed
luxas opened this issue Dec 25, 2017 · 7 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@luxas
Member

luxas commented Dec 25, 2017

Right now we exec `cp -r`, but we should take a proper snapshot before upgrading and then actually be able to roll it back seamlessly if something fails. Today this is done in a cp-like manner on the filesystem; we should do it via the etcd Go client instead to make it more robust. This might require some re-vendoring/restructuring of the etcd Go client.

cc @xiang90 @hongchaodeng @jamiehannaford @sbezverk @timothysc @xiangpengzhao @ericchiang
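As a rough illustration of what "via the etcd Go client" could look like, here is a minimal sketch that streams a snapshot to a file with the clientv3 Maintenance API. The endpoint, the output path, and the exact import path (it differs across vendored etcd versions) are assumptions, and a kubeadm-managed etcd would also need the cluster's TLS configuration:

```go
package main

import (
	"context"
	"io"
	"log"
	"os"
	"time"

	"go.etcd.io/etcd/clientv3" // import path varies with the vendored etcd version
)

func main() {
	// Hypothetical endpoint; a real kubeadm cluster would also pass TLS certs here.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Maintenance.Snapshot streams a consistent copy of the backend database.
	rc, err := cli.Snapshot(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	// Hypothetical backup location.
	f, err := os.Create("/var/lib/etcd-backup/snapshot.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, rc); err != nil {
		log.Fatal(err)
	}
	log.Println("snapshot written")
}
```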

@luxas added the kind/enhancement and priority/important-longterm labels on Dec 25, 2017
@luxas added this to the v1.10 milestone on Dec 25, 2017
@timothysc
Member

imo this is a documented process external to kubeadm.

@timothysc self-assigned this on Jan 4, 2018
@jamiehannaford
Contributor

I'm more of the opinion that reliable rollbacks are an essential component of upgrading etcd, so the user might benefit from having this handled on their behalf. I'm not wedded to this idea, though. Maybe we can discuss it in the next implementation meeting?

@timothysc added the help wanted, triaged, and priority/needs-more-evidence labels and removed the priority/important-longterm label on Jan 29, 2018
@timothysc removed their assignment on Jan 29, 2018
@DGreenstein

DGreenstein commented Feb 14, 2018

@timothysc - so what was the verdict on this? Is the intent to vendor in the etcd client and use that for making the etcd dump? How does this align with the demo from last week, where we saw the external etcd manager? Should this just be a documented manual effort until etcdmgr is ready? Do we want to proceed with enhancing the functionality of the kubeadm upgrade command?

FWIW: I agree with Jamie - if kubeadm upgrade is going to touch etcd at all, a first-class way of handling the etcd lifecycle in the context of an upgrade should be provided. The high-level logic, I think, might be (see the sketch after this list):

if self-hosted etcd:

  • backup data
  • update deployment object(s)
  • restore data

if external etcd:

  • validate version

Then test some sane basic functionality, e.g. is the cluster reporting healthy?
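As a rough sketch of that flow in Go (all of these helper names are hypothetical placeholders, not existing kubeadm functions):

```go
package upgrade

import "errors"

// Hypothetical stubs standing in for the steps listed above.
func backupEtcdData() error              { return errors.New("not implemented") }
func restoreEtcdData() error             { return errors.New("not implemented") }
func updateEtcdDeploymentObjects() error { return errors.New("not implemented") }
func validateEtcdVersion() error         { return errors.New("not implemented") }
func checkClusterHealth() error          { return errors.New("not implemented") }

func upgradeEtcd(selfHosted bool) error {
	if selfHosted {
		// Take a snapshot first so a failed upgrade can be rolled back.
		if err := backupEtcdData(); err != nil {
			return err
		}
		if err := updateEtcdDeploymentObjects(); err != nil {
			// The deployment update failed: restore the snapshot taken above.
			return restoreEtcdData()
		}
	} else {
		// External etcd: only validate that the running version is supported.
		if err := validateEtcdVersion(); err != nil {
			return err
		}
	}
	// Finally, test some basic functionality, e.g. is the cluster reporting healthy?
	return checkClusterHealth()
}
```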

@timothysc
Member

There already exists logic right now that will snapshot and restore, but it's pretty brute-force at the moment.

@timothysc modified the milestones: v1.10, v1.11 on Mar 5, 2018
@timothysc removed the triaged label on Apr 7, 2018
@luxas modified the milestones: v1.11, v1.12 on May 14, 2018
@luxas
Member Author

luxas commented May 15, 2018

We now at least have the possibility (once we upgrade godeps in v1.12) to do the snapshotting properly: etcd-io/etcd#9118
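For reference, a minimal sketch of what that could look like with the snapshot manager added in that PR (the import paths, the endpoint, and the output path are assumptions and depend on the vendored etcd version; a kubeadm-managed etcd would also need TLS configured):

```go
package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
	"go.etcd.io/etcd/clientv3/snapshot"
	"go.uber.org/zap"
)

func main() {
	// snapshot.NewV3 wraps the clientv3 Maintenance API in a small save/restore manager.
	mgr := snapshot.NewV3(zap.NewExample())

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Hypothetical endpoint for a local etcd member.
	cfg := clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	}
	if err := mgr.Save(ctx, cfg, "/var/lib/etcd-backup/snapshot.db"); err != nil {
		log.Fatal(err)
	}
	log.Println("snapshot saved")
}
```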

@k8s-ci-robot added the kind/feature label and removed the kind/enhancement label on Jun 5, 2018
@timothysc removed this from the v1.12 milestone on Jul 3, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Oct 1, 2018
@timothysc added the lifecycle/frozen label and removed the lifecycle/stale label on Oct 11, 2018
@timothysc added the priority/backlog label and removed the priority/needs-more-evidence label on Oct 26, 2018
@timothysc
Member

I'm going to close this and wait to see whether the community asks for it again or reopens it. This seems to be a non-issue.
